git.ipfire.org Git - thirdparty/gcc.git/log
12 months ago[PR rtl-optimization/115877][2/n] Improve liveness computation for constant initializ...
Jeff Law [Sun, 21 Jul 2024 14:41:28 +0000 (08:41 -0600)] 
[PR rtl-optimization/115877][2/n] Improve liveness computation for constant initialization

While debugging pr115877, I noticed we were failing to remove the destination
register from the LIVENOW bitmap when it was set to a constant value, i.e.
(set (dest) (const_int)).  This was a trivial oversight in
safe_for_live_propagation.

I don't have an example of this affecting code generation, but it certainly
could.  More importantly, by making LIVENOW more accurate it's easier to debug
when LIVENOW differs from expectations.

As with the prior patch this has been tested as part of a larger patchset with
the crosses as well as individually on x86_64.

Pushing to the trunk.

PR rtl-optimization/115877
gcc/
* ext-dce.cc (safe_for_live_propagation): Handle RTX_CONST_OBJ.

12 months ago[PR rtl-optimization/115877] Fix livein computation for ext-dce
Jeff Law [Sun, 21 Jul 2024 13:36:37 +0000 (07:36 -0600)] 
[PR rtl-optimization/115877] Fix livein computation for ext-dce

So I'm not yet sure how I'm going to break everything down, but this is easy
enough to break out as 1/N of ext-dce fixes/improvements.

When handling uses in an insn, we first determine what bits are set in the
destination which is represented in DST_MASK.  Then we use that to refine what
bits are live in the source operands.

In the source operand handling section we *modify* DST_MASK if the source
operand is a SUBREG (ugh!).  So if the first operand is a SUBREG, then we can
incorrectly compute which bit groups are live in the second operand, especially
if it is a SUBREG as well.
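A minimal C++ sketch of the failure mode described above, not the actual ext-dce.cc code.  The names Operand, live_bits_buggy and live_bits_fixed are hypothetical, and the assumption that a SUBREG narrows the mask by ANDing in a sub-mask is only an illustration of "modifying DST_MASK":

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a source operand: when is_subreg is set, only the
// bits in subreg_mask of the destination mask are relevant to it.
struct Operand {
  bool is_subreg;
  std::uint64_t subreg_mask;
};

// Buggy shape: the shared mask is narrowed in place, so a SUBREG in an
// earlier operand leaks into the liveness computed for later operands.
std::vector<std::uint64_t> live_bits_buggy(std::uint64_t dst_mask,
                                           const std::vector<Operand>& ops) {
  std::vector<std::uint64_t> live;
  for (const Operand& op : ops) {
    if (op.is_subreg)
      dst_mask &= op.subreg_mask;  // modifies state shared across operands
    live.push_back(dst_mask);
  }
  return live;
}

// Fixed shape: every operand starts from the original destination mask.
std::vector<std::uint64_t> live_bits_fixed(std::uint64_t dst_mask,
                                           const std::vector<Operand>& ops) {
  std::vector<std::uint64_t> live;
  for (const Operand& op : ops) {
    std::uint64_t mask = dst_mask;  // restored for each operand
    if (op.is_subreg)
      mask &= op.subreg_mask;
    live.push_back(mask);
  }
  return live;
}
```

With a SUBREG first operand and a plain second operand, the buggy shape reports the narrowed mask for the second operand, while the fixed shape reports the full destination mask.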

This was seen when testing a larger set of patches on the rl78 port
(builtin-arith-overflow-p-7 & pr71631 execution failures), so no new test for
this bugfix.

Run through my tester (in conjunction with other ext-dce changes) on the
various cross targets.  Run individually through a bootstrap and regression
test cycle on x86_64 as well.

Pushing to the trunk.

PR rtl-optimization/115877
gcc/
* ext-dce.cc (ext_dce_process_uses): Restore the value of DST_MASK
for each operand.

12 months agogcc: stop adding -fno-common for checking builds
Sam James [Sat, 20 Jul 2024 00:21:59 +0000 (01:21 +0100)] 
gcc: stop adding -fno-common for checking builds

Originally added in r0-44646-g204250d2fcd084 and r0-44627-gfd350d241fecf6 which
moved -fno-common from all builds to just checking builds.

Since r10-4867-g6271dd984d7f92, GCC defaults to -fno-common. There's no need
to pass it specially for checking builds.

We could keep it for older bootstrap compilers with checking but I don't see
much value in that, it was already just a bonus before.

gcc/ChangeLog:
* Makefile.in (NOCOMMON_FLAG): Delete.
(GCC_WARN_CFLAGS): Drop NOCOMMON_FLAG.
(GCC_WARN_CXXFLAGS): Drop NOCOMMON_FLAG.
* configure.ac: Ditto.
* configure: Regenerate.

gcc/d/ChangeLog:
* Make-lang.in (WARN_DFLAGS): Drop NOCOMMON_FLAG.

12 months agoSH: Fix outage caused by recently added 2nd combine pass after reg alloc
Oleg Endo [Sun, 21 Jul 2024 05:11:21 +0000 (14:11 +0900)] 
SH: Fix outage caused by recently added 2nd combine pass after reg alloc

I've also confirmed on the CSiBE set that the secondary combine pass is
actually beneficial on SH.  It does result in some code size reductions.

gcc/ChangeLog:
* config/sh/sh.md (mov_neg_si_t): Allow insn and split after
register allocation.
(*treg_noop_move): New insn.

12 months agoDaily bump.
GCC Administrator [Sun, 21 Jul 2024 00:17:52 +0000 (00:17 +0000)] 
Daily bump.

12 months agoRequire bitint575 for pr116003.c
Andrew MacLeod [Sat, 20 Jul 2024 16:49:39 +0000 (12:49 -0400)] 
Require bitint575 for pr116003.c

Require a bitint target large enough.

gcc/testsuite/
* gcc.dg/pr116003.c: Require bitint575 target.

12 months agoRevert "Add documentation for musttail attribute"
Andi Kleen [Sat, 20 Jul 2024 23:09:41 +0000 (16:09 -0700)] 
Revert "Add documentation for musttail attribute"

This reverts commit 56f824cc206ff00d466aaeb11211d8005c4668bc.

12 months agoRevert "Add tests for C/C++ musttail attributes"
Andi Kleen [Sat, 20 Jul 2024 23:09:25 +0000 (16:09 -0700)] 
Revert "Add tests for C/C++ musttail attributes"

This reverts commit 37c4703ce84722b9c24db3e8e6d57ab6d3a7b5eb.

12 months agoRevert "C: Implement musttail attribute for returns"
Andi Kleen [Sat, 20 Jul 2024 23:09:07 +0000 (16:09 -0700)] 
Revert "C: Implement musttail attribute for returns"

This reverts commit 7db47f7b915c5f5d645fa536547e26b92290afe3.

12 months agoRevert "C++: Support clang compatible [[musttail]] (PR83324)"
Andi Kleen [Sat, 20 Jul 2024 23:07:41 +0000 (16:07 -0700)] 
Revert "C++: Support clang compatible [[musttail]] (PR83324)"

This reverts commit 59dd1d7ab21ad9a7ebf641ec9aeea609c003ad2f.

12 months agoOutput CodeView function information
Mark Harmstone [Thu, 27 Jun 2024 23:36:14 +0000 (00:36 +0100)] 
Output CodeView function information

Translate DW_TAG_subprogram DIEs into CodeView LF_FUNC_ID types and
S_GPROC32_ID / S_LPROC32_ID symbols.  ld will then transform these into
S_GPROC32 / S_LPROC32 symbols, which map addresses to unmangled function
names.

gcc/
* dwarf2codeview.cc (enum cv_sym_type): Add new values.
(struct codeview_symbol): Add function to union.
(struct codeview_custom_type): Add lf_func_id to union.
(write_function): New function.
(write_codeview_symbols): Call write_function.
(write_lf_func_id): New function.
(write_custom_types): Call write_lf_func_id.
(add_function): New function.
(codeview_debug_early_finish): Call add_function.

12 months agoAdd bitint to options for testcase
Andrew MacLeod [Sat, 20 Jul 2024 15:45:16 +0000 (11:45 -0400)] 
Add bitint to options for testcase

Testcase should only be for bitint targets

gcc/testsuite/
* gcc.dg/pr116003.c: Add target bitint.

12 months agodoc: Remove documentation of two obsolete spec strings
André Maroneze [Sat, 20 Jul 2024 14:42:47 +0000 (16:42 +0200)] 
doc: Remove documentation of two obsolete spec strings

gcc:
* doc/invoke.texi (Spec Files): Remove documentation of obsolete
spec strings "predefines" and "signed_char".

12 months agoAvoid undefined behaviour in build_option_suggestions
Siddhesh Poyarekar [Fri, 19 Jul 2024 16:44:32 +0000 (12:44 -0400)] 
Avoid undefined behaviour in build_option_suggestions

The inner loop in build_option_suggestions uses OPTION to take the
address of OPTB and use it across iterations, which is undefined
behaviour since OPTB is defined within the loop.  Pull it outside the
loop to make this defined.
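The lifetime issue can be sketched as follows.  This is an illustration under assumptions, not the code in opt-suggestions.cc: Opt, expand, current and variant are hypothetical names standing in for the OPTION/OPTB pattern described above.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Opt {
  std::string text;
};

// `current` points either at the caller's option or at a locally built
// variant, and is still read on later iterations.  Declaring `variant`
// outside the loop keeps the pointee alive for as long as `current` may
// refer to it.
std::vector<std::string> expand(const Opt& base, int n) {
  std::vector<std::string> out;
  const Opt* current = &base;
  Opt variant;  // hoisted: outlives every iteration
  for (int i = 0; i < n; ++i) {
    out.push_back(current->text);  // may read what the previous pass stored
    variant.text = base.text + "=" + std::to_string(i);
    current = &variant;  // would dangle if `variant` were loop-local
  }
  return out;
}
```

Had `variant` been declared inside the loop body, its lifetime would end at the close of each iteration and the read through `current` on the next one would be undefined.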

gcc/ChangeLog:

* opt-suggestions.cc
(option_proposer::build_option_suggestions): Pull OPTB
definition out of the innermost loop.

12 months agoAdd documentation for musttail attribute
Andi Kleen [Wed, 24 Jan 2024 07:38:23 +0000 (23:38 -0800)] 
Add documentation for musttail attribute

gcc/ChangeLog:

PR c/83324
* doc/extend.texi: Document [[musttail]]

12 months agoAdd tests for C/C++ musttail attributes
Andi Kleen [Wed, 24 Jan 2024 07:54:56 +0000 (23:54 -0800)] 
Add tests for C/C++ musttail attributes

Some adopted from the existing C musttail plugin tests.
Also extends the ability to query the sibcall capabilities of the
target.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp:
(check_effective_target_struct_tail_call): New function.
* c-c++-common/musttail1.c: New test.
* c-c++-common/musttail12.c: New test.
* c-c++-common/musttail13.c: New test.
* c-c++-common/musttail2.c: New test.
* c-c++-common/musttail3.c: New test.
* c-c++-common/musttail4.c: New test.
* c-c++-common/musttail5.c: New test.
* c-c++-common/musttail7.c: New test.
* c-c++-common/musttail8.c: New test.
* g++.dg/musttail10.C: New test.
* g++.dg/musttail11.C: New test.
* g++.dg/musttail6.C: New test.
* g++.dg/musttail9.C: New test.

12 months agoC: Implement musttail attribute for returns
Andi Kleen [Wed, 24 Jan 2024 15:44:23 +0000 (07:44 -0800)] 
C: Implement musttail attribute for returns

Implement a C23 clang compatible musttail attribute similar to the earlier
C++ implementation in the C parser.

gcc/c/ChangeLog:

PR c/83324
* c-parser.cc (struct attr_state): Define with musttail_p.
(c_parser_statement_after_labels): Handle [[musttail]].
(c_parser_std_attribute): Ditto.
(c_parser_handle_musttail): Ditto.
(c_parser_compound_statement_nostart): Ditto.
(c_parser_all_labels): Ditto.
(c_parser_statement): Ditto.
* c-tree.h (c_finish_return): Add musttail_p flag.
* c-typeck.cc (c_finish_return): Handle musttail_p flag.

12 months agoC++: Support clang compatible [[musttail]] (PR83324)
Andi Kleen [Wed, 24 Jan 2024 07:44:48 +0000 (23:44 -0800)] 
C++: Support clang compatible [[musttail]] (PR83324)

This patch implements a clang compatible [[musttail]] attribute for
returns.

musttail is useful as an alternative to computed goto for interpreters.
With computed goto the interpreter function usually ends up very big
which causes problems with register allocation and other per function
optimizations not scaling. With musttail the interpreter can be instead
written as a sequence of smaller functions that call each other. To
avoid unbounded stack growth this requires forcing a sibling call, which
this attribute does. It guarantees an error if the call cannot be tail
called which allows the programmer to fix it instead of risking a stack
overflow. Unlike computed goto it is also type-safe.
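As a rough sketch of the interpreter style described above: each opcode is a small function that tail-calls the next handler.  Everything here is hypothetical (the Insn layout, the opcode numbering, the SKETCH_MUSTTAIL macro).  The macro picks whichever attribute spelling the compiler reports through __has_cpp_attribute and otherwise expands to nothing, in which case the calls are ordinary calls and the stack grows with program length.

```cpp
#include <cassert>

#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::musttail)
#    define SKETCH_MUSTTAIL [[clang::musttail]]
#  elif __has_cpp_attribute(gnu::musttail)
#    define SKETCH_MUSTTAIL [[gnu::musttail]]
#  endif
#endif
#ifndef SKETCH_MUSTTAIL
#  define SKETCH_MUSTTAIL
#endif

struct Insn {
  int op;   // index into the handler table
  int arg;  // immediate operand
};

using Handler = int (*)(const Insn* pc, int acc);

int op_add(const Insn* pc, int acc);
int op_mul(const Insn* pc, int acc);
int op_halt(const Insn* pc, int acc);

// Hypothetical opcode numbering: 0 = add, 1 = mul, 2 = halt.
static const Handler handlers[] = {op_add, op_mul, op_halt};

// All handlers share one signature, so each dispatch is a direct
// "return call(...)" that is eligible for a sibling call.
int op_add(const Insn* pc, int acc) {
  acc += pc->arg;
  ++pc;
  SKETCH_MUSTTAIL return handlers[pc->op](pc, acc);
}

int op_mul(const Insn* pc, int acc) {
  acc *= pc->arg;
  ++pc;
  SKETCH_MUSTTAIL return handlers[pc->op](pc, acc);
}

int op_halt(const Insn*, int acc) { return acc; }

// Entry point: dispatch the first instruction.
int run(const Insn* program, int acc) {
  return handlers[program->op](program, acc);
}
```

Running the program add 5, mul 3, halt with an initial accumulator of 1 yields (1 + 5) * 3.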

It turns out that David Malcolm had already implemented middle/backend
support for a musttail attribute back in 2016, but it wasn't exposed
to any frontend other than a special plugin.

This patch adds a [[gnu::musttail]] attribute for C++ that can be added
to return statements. The return statement must be a direct call
(it does not follow dependencies), which is similar to what clang
implements. It then uses the existing must tail infrastructure.

For compatibility it also detects clang::musttail.

Passes bootstrap and full test

gcc/c-family/ChangeLog:

* c-attribs.cc (set_musttail_on_return): New function.
* c-common.h (set_musttail_on_return): Declare new function.

gcc/cp/ChangeLog:

PR c/83324
* cp-tree.h (AGGR_INIT_EXPR_MUST_TAIL): Add.
* parser.cc (cp_parser_statement): Handle musttail.
(cp_parser_jump_statement): Ditto.
* pt.cc (tsubst_expr): Copy CALL_EXPR_MUST_TAIL_CALL.
* semantics.cc (simplify_aggr_init_expr): Handle musttail.

12 months agoAdd a musttail generic attribute to the c-attribs table
Andi Kleen [Thu, 16 May 2024 02:38:43 +0000 (19:38 -0700)] 
Add a musttail generic attribute to the c-attribs table

The actual handling is directly in the parser since the
generic mechanism doesn't support statement attributes,
but this gives basic error checking/detection on the attribute.

gcc/c-family/ChangeLog:

PR c/83324
* c-attribs.cc (handle_musttail_attribute): Add.
* c-common.h (handle_musttail_attribute): Add.

12 months agoLoongArch: Organize the code related to split move and merge the same functions.
Lulu Cheng [Fri, 12 Jul 2024 01:57:40 +0000 (09:57 +0800)] 
LoongArch: Organize the code related to split move and merge the same functions.

gcc/ChangeLog:

* config/loongarch/loongarch-protos.h
(loongarch_split_128bit_move): Delete.
(loongarch_split_128bit_move_p): Delete.
(loongarch_split_256bit_move): Delete.
(loongarch_split_256bit_move_p): Delete.
(loongarch_split_vector_move): Add a function declaration.
* config/loongarch/loongarch.cc
(loongarch_vector_costs::finish_cost): Adjust the code
formatting.
(loongarch_split_vector_move_p): Merge
loongarch_split_128bit_move_p and loongarch_split_256bit_move_p.
(loongarch_split_move_p): Merge code.
(loongarch_split_move): Likewise.
(loongarch_split_128bit_move_p): Delete.
(loongarch_split_256bit_move_p): Delete.
(loongarch_split_128bit_move): Delete.
(loongarch_split_vector_move): Merge loongarch_split_128bit_move
and loongarch_split_256bit_move.
(loongarch_split_256bit_move): Delete.
(loongarch_global_init): Remove the extra semicolon at the
end of the function.
* config/loongarch/loongarch.md (*movdf_softfloat):  Added a new
condition TARGET_64BIT.

12 months agoDaily bump.
GCC Administrator [Sat, 20 Jul 2024 00:17:53 +0000 (00:17 +0000)] 
Daily bump.

12 months agoCheck for SSA_NAME not in the IL yet.
Andrew MacLeod [Fri, 19 Jul 2024 21:39:40 +0000 (17:39 -0400)] 
Check for SSA_NAME not in the IL yet.

Check for an SSA_NAME not in the CFG before trying to create an
equivalence record in the definition block.

PR tree-optimization/116003
gcc/
* value-relation.cc (equiv_oracle::register_initial_def): Check
if SSA_NAME is in the IL before registering.

gcc/testsuite/
* gcc.dg/pr116003.c: New.

12 months agolibgomp: Document 'GOMP_teams4'
Thomas Schwinge [Tue, 16 Jul 2024 15:09:38 +0000 (17:09 +0200)] 
libgomp: Document 'GOMP_teams4'

For reference:

  - <https://inbox.sourceware.org/20211111190313.GV2710@tucnak> "[PATCH] openmp: Honor OpenMP 5.1 num_teams lower bound"
  - <https://inbox.sourceware.org/20211112132023.GC2710@tucnak> "[PATCH] libgomp, nvptx: Honor OpenMP 5.1 num_teams lower bound"

libgomp/
* config/gcn/target.c (GOMP_teams4): Document.
* config/nvptx/target.c (GOMP_teams4): Likewise.
* target.c (GOMP_teams4): Likewise.

12 months agoGCN: Honor OpenMP 5.1 'num_teams' lower bound
Thomas Schwinge [Mon, 15 Jul 2024 09:19:28 +0000 (11:19 +0200)] 
GCN: Honor OpenMP 5.1 'num_teams' lower bound

Corresponding to commit 9fa72756d90e0d9edadf6e6f5f56476029925788
"libgomp, nvptx: Honor OpenMP 5.1 num_teams lower bound", these are the
GCN offloading changes to fix:

    PASS: libgomp.c/../libgomp.c-c++-common/teams-2.c (test for excess errors)
    [-FAIL:-]{+PASS:+} libgomp.c/../libgomp.c-c++-common/teams-2.c execution test

    PASS: libgomp.c++/../libgomp.c-c++-common/teams-2.c (test for excess errors)
    [-FAIL:-]{+PASS:+} libgomp.c++/../libgomp.c-c++-common/teams-2.c execution test

..., and omptests' 't-critical' test case.  I've cross checked that those test
cases are the ones that regress for nvptx offloading, if I locally revert the
"libgomp, nvptx: Honor OpenMP 5.1 num_teams lower bound" changes.

libgomp/
* config/gcn/libgomp-gcn.h (GOMP_TEAM_NUM): Inject.
* config/gcn/target.c (GOMP_teams4): Handle.
* config/gcn/team.c (gomp_gcn_enter_kernel): Initialize.
* config/gcn/teams.c (omp_get_team_num): Adjust.

12 months agoRewrite usage comment at the top of 'gcc/passes.def'
Thomas Schwinge [Fri, 28 Jun 2024 12:05:04 +0000 (14:05 +0200)] 
Rewrite usage comment at the top of 'gcc/passes.def'

Since Subversion r201359 (Git commit a167b052dfe9a8509bb23c374ffaeee953df0917)
"Introduce gen-pass-instances.awk and pass-instances.def", the usage comment at
the top of 'gcc/passes.def' no longer is accurate (even if that latter file
does continue to use the 'NEXT_PASS' form without 'NUM') -- and, worse, the
'NEXT_PASS' etc. in that usage comment are processed by the
'gcc/gen-pass-instances.awk' script:

    --- source-gcc/gcc/passes.def   2024-06-24 18:55:15.132561641 +0200
    +++ build-gcc/gcc/pass-instances.def    2024-06-24 18:55:27.768562714 +0200
    [...]
    @@ -20,546 +22,578 @@
     /*
      Macros that should be defined when using this file:
        INSERT_PASSES_AFTER (PASS)
        PUSH_INSERT_PASSES_WITHIN (PASS)
        POP_INSERT_PASSES ()
    -   NEXT_PASS (PASS)
    +   NEXT_PASS (PASS, 1)
        TERMINATE_PASS_LIST (PASS)
      */
    [...]

(That is, this is 'NEXT_PASS' for the first instance of pass 'PASS'.)
That's benign so far, but with another thing that I'll be extending, I'd
then run into an error while the script handles this comment block.  ;-\

gcc/
* passes.def: Rewrite usage comment at the top.

12 months agoTreat boolean vector elements as 0/-1 [PR115406]
Richard Sandiford [Fri, 19 Jul 2024 18:09:37 +0000 (19:09 +0100)] 
Treat boolean vector elements as 0/-1 [PR115406]

Previously we built vector boolean constants using 1 for true
elements and 0 for false elements.  This matches the predicates
produced by SVE's PTRUE instruction, but leads to a miscompilation
on AVX, where all bits of a boolean element should be set.

One option for RTL would be to make this target-configurable.
But that isn't really possible at the tree level, where vectors
should work in a more target-independent way.  (There is currently
no way to create a "generic" packed boolean vector, but never say
never :))  And, if we were going to pick a generic behaviour,
it would make sense to use 0/-1 rather than 0/1, for consistency
with integer vectors.
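To make the 0/-1 convention concrete, here is a small sketch of encoding a boolean vector into a byte image.  The function name encode_bool_vector and its parameters are hypothetical and do not mirror native_encode_vector_part; the sketch only shows "nonzero element sets all of its bits".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode a boolean vector into a byte image in which every element occupies
// elt_bytes bytes.  A true element sets all of its bits (0/-1 convention)
// rather than only the lowest one (0/1 convention).
std::vector<std::uint8_t> encode_bool_vector(const std::vector<bool>& elts,
                                             std::size_t elt_bytes) {
  std::vector<std::uint8_t> image(elts.size() * elt_bytes, 0x00);
  for (std::size_t i = 0; i < elts.size(); ++i)
    if (elts[i])
      for (std::size_t b = 0; b < elt_bytes; ++b)
        image[i * elt_bytes + b] = 0xff;
  return image;
}
```

For two-byte elements, {true, false, true} becomes ff ff 00 00 ff ff, which is the same image an integer vector {-1, 0, -1} would produce.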

Both behaviours should work with SVE on read, since SVE ignores
the upper bits in each predicate element.  And the choice shouldn't
make much difference for RTL, since all SVE predicate modes are
expressed as vectors of BI, rather than of multi-bit booleans.

I suspect there might be some fallout from this change on SVE.
But I think we should at least give it a go, and see whether any
fallout provides a strong counterargument against the approach.

gcc/
PR middle-end/115406
* fold-const.cc (native_encode_vector_part): For vector booleans,
check whether an element is nonzero and, if so, set all of the
corresponding bits in the target image.
* simplify-rtx.cc (native_encode_rtx): Likewise.

gcc/testsuite/
PR middle-end/115406
* gcc.dg/torture/pr115406.c: New test.

12 months agoarm: Update fp16-aapcs-[24].c after insn_propagation patch
Richard Sandiford [Fri, 19 Jul 2024 18:09:37 +0000 (19:09 +0100)] 
arm: Update fp16-aapcs-[24].c after insn_propagation patch

These tests used to generate:

        bl      swap
        ldr     r2, [sp, #4]
        mov     r0, r2  @ __fp16

but g:9d20529d94b23275885f380d155fe8671ab5353a means that we can
load directly into r0:

        bl      swap
        ldrh    r0, [sp, #4]    @ __fp16

This patch updates the tests to "defend" this change.

While there, the scans include:

mov\tr1, r[03]}

But if the spill of r2 occurs first, there's no real reason why
r2 couldn't be used as the temporary, instead of r3.

The patch tries to update the scans while preserving the spirit
of the originals.

gcc/testsuite/
* gcc.target/arm/fp16-aapcs-2.c: Expect the return value to be
loaded directly from the stack.  Test that the swap generates
two moves out of r0/r1 and two moves in.
* gcc.target/arm/fp16-aapcs-4.c: Likewise.

12 months agoc++: xobj fn call without obj [PR115783]
Patrick Palka [Fri, 19 Jul 2024 17:48:12 +0000 (13:48 -0400)] 
c++: xobj fn call without obj [PR115783]

The code path for rejecting an object-less call to a non-static member
function should also consider xobj member functions (so that we correctly
reject the below calls with a "cannot call member function without object"
diagnostic).

PR c++/115783

gcc/cp/ChangeLog:

* call.cc (build_new_method_call): Generalize METHOD_TYPE
check to DECL_OBJECT_MEMBER_FUNCTION_P.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/explicit-obj-diagnostics11.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
12 months agoAVR: Support new built-in function __builtin_avr_mask1.
Georg-Johann Lay [Fri, 19 Jul 2024 16:22:26 +0000 (18:22 +0200)] 
AVR: Support new built-in function __builtin_avr_mask1.

gcc/
* config/avr/builtins.def (MASK1): New DEF_BUILTIN.
* config/avr/avr.cc (avr_rtx_costs_1): Handle rtx costs for
expressions like __builtin_avr_mask1.
(avr_init_builtins) <uintQI_ftype_uintQI_uintQI>: New tree type.
(avr_expand_builtin) [AVR_BUILTIN_MASK1]: Diagnose unexpected forms.
(avr_fold_builtin) [AVR_BUILTIN_MASK1]: Handle case.
* config/avr/avr.md (gen_mask1): New expand helper.
(mask1_0x01_split, mask1_0x80_split, mask1_0xfe_split): New
insn-and-split.
(*mask1_0x01, *mask1_0x80, *mask1_0xfe): New insns.
* doc/extend.texi (AVR Built-in Functions) <__builtin_avr_mask1>:
Document new built-in function.
gcc/testsuite/
* gcc.target/avr/torture/builtin-mask1.c: New test.

12 months agolibgomp: Remove bogus warnings from privatized-ref-2.f90.
Paul Thomas [Fri, 19 Jul 2024 15:58:33 +0000 (16:58 +0100)] 
libgomp: Remove bogus warnings from privatized-ref-2.f90.

2024-07-19  Paul Thomas  <pault@gcc.gnu.org>

libgomp/ChangeLog

* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Cut
dg-note about 'a' and remove bogus warnings about its array
descriptor components being used uninitialized.

12 months agoFortran: character array constructor with >= 4 constant elements [PR103115]
Harald Anlauf [Thu, 18 Jul 2024 19:15:48 +0000 (21:15 +0200)] 
Fortran: character array constructor with >= 4 constant elements [PR103115]

gcc/fortran/ChangeLog:

PR fortran/103115
* trans-array.cc (gfc_trans_array_constructor_value): If the first
element of an array constructor is deferred-length character and
therefore does not have an element size known at compile time, do
not try to collect subsequent constant elements into a constructor
for optimization.

gcc/testsuite/ChangeLog:

PR fortran/103115
* gfortran.dg/string_array_constructor_4.f90: New test.

12 months agors6000: Catch unsupported ABI errors when using -mrop-protect [PR114759,PR115988]
Peter Bergner [Thu, 18 Jul 2024 23:01:46 +0000 (18:01 -0500)] 
rs6000: Catch unsupported ABI errors when using -mrop-protect [PR114759,PR115988]

2024-07-18  Peter Bergner  <bergner@linux.ibm.com>

gcc/testsuite/
PR target/114759
PR target/115988
* gcc.target/powerpc/pr114759-3.c: Catch unsupported ABI errors.

12 months agoc++: add fixed testcase [PR109464]
Patrick Palka [Fri, 19 Jul 2024 15:08:09 +0000 (11:08 -0400)] 
c++: add fixed testcase [PR109464]

Seems to be fixed by r15-521-g6ad7ca1bb90573.

PR c++/109464

gcc/testsuite/ChangeLog:

* g++.dg/template/explicit-instantiation8.C: New test.

12 months agobpf: create modifier for mem operand for xchg and cmpxchg
Cupertino Miranda [Thu, 11 Jul 2024 14:28:09 +0000 (15:28 +0100)] 
bpf: create modifier for mem operand for xchg and cmpxchg

Both xchg and cmpxchg instructions, in the pseudo-C dialect, do not
expect their memory address operand to be surrounded by parentheses.
For example, it should be output as "w0 =cmpxchg32_32(r8+8,w0,w2)"
instead of "w0 =cmpxchg32_32((r8+8),w0,w2)".

This patch implements an operand modifier 'M' which marks the
instruction templates that do not expect the parentheses, and adds it to
the xchg and cmpxchg templates.
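The effect of such a modifier can be sketched with a plain string formatter.  This is not the bpf_print_operand code; format_mem_operand and its no_parentheses flag are hypothetical stand-ins for the role the commit gives to %M.

```cpp
#include <cassert>
#include <string>

// Prints a base+offset address, wrapped in parentheses unless the caller
// asks for the bare form.
std::string format_mem_operand(const std::string& base, int offset,
                               bool no_parentheses) {
  std::string addr = base + (offset >= 0 ? "+" : "") + std::to_string(offset);
  return no_parentheses ? addr : "(" + addr + ")";
}
```

With the flag set, an operand built from r8 and offset 8 prints as r8+8, so it can be spliced into "w0 =cmpxchg32_32(r8+8,w0,w2)" without doubled parentheses.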

gcc/ChangeLog:
* config/bpf/atomic.md (atomic_compare_and_swap,
atomic_exchange): Add operand modifier %M to the first
operand.
* config/bpf/bpf.cc (no_parentheses_mem_operand): Create
variable.
(bpf_print_operand): Set no_parentheses_mem_operand variable if
%M operand is used.
(bpf_print_operand_address): Conditionally output parentheses.

gcc/testsuite/ChangeLog:
* gcc.target/bpf/pseudoc-atomic-memaddr-op.c: Add test.

12 months agoc++: Add [dcl.init.aggr] examples to testsuite
Jakub Jelinek [Fri, 19 Jul 2024 06:53:47 +0000 (08:53 +0200)] 
c++: Add [dcl.init.aggr] examples to testsuite

When working on the #embed optimization support, I recently went through
all of reshape_init_r* and today I read in detail all the P3106R1 changes,
and I believe we have implemented it that way for years.
To double check that, I've added tests with the current [dcl.init.aggr]
examples but tested in all the languages from C++98 to C++26, of course
guarded as needed for constructs which require newer versions of C++.
The examples come in two tests, one is a runtime test for the non-erroneous
examples, the other is a compile time test for the diagnostics.
The former one includes mostly intact examples with runtime checking (both
to test what is written in the section exactly and to test at least
something with C++98) and then when useful also adds constexpr tests with
static_asserts for C++11 and later.

Tested on x86_64-linux and i686-linux with
GXX_TESTSUITE_STDS=98,11,14,17,20,23,26 make check-g++ RUNTESTFLAGS='dg.exp=aggr-init*.C'

Also tested on GCC 11 branch with
GXX_TESTSUITE_STDS=98,11,14,17,20,2b make check-g++ RUNTESTFLAGS='dg.exp=aggr-init*.C'
where just the " is a GCC extension" part of one error is left out,
otherwise it passes the same, ditto with clang 14 (of course with different
diagnostics, but verified it emits diagnostics on the right lines), so I
believe we can claim implementation of this DR paper, either in all versions
or at least in GCC 11+.

2024-07-19  Jakub Jelinek  <jakub@redhat.com>

PR c++/114460
* g++.dg/cpp26/aggr-init1.C: New test.
* g++.dg/cpp26/aggr-init2.C: New test.

12 months agoClose GCC 11 branch
Richard Biener [Fri, 19 Jul 2024 05:58:28 +0000 (07:58 +0200)] 
Close GCC 11 branch

Remove gcc-11 branch from updating and snapshot generating

contrib/
* gcc-changelog/git_update_version.py: Remove gcc-11 branch.

maintainer-scripts/
* crontab: Remove entry for gcc-11 branch.

12 months agoc++: Hash placeholder constraint in ctp_hasher
Seyed Sajad Kahani [Thu, 18 Jul 2024 15:01:32 +0000 (16:01 +0100)] 
c++: Hash placeholder constraint in ctp_hasher

This patch addresses a difference between the hash function and the equality
function for canonical types of template parameters (ctp_hasher). The equality
function uses comptypes (typeck.cc) (with COMPARE_STRUCTURAL) and checks
constraint equality for two auto nodes (typeck.cc:1586), while the hash
function ignores it (pt.cc:4528). This leads to hash collisions that can be
avoided by using `hash_placeholder_constraint` (constraint.cc:1150).

Note that due to the proper handling of hash collisions (hash-table.h:1059),
there is no test case that can distinguish the current implementation from the
proposed one.
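The relationship between the two functions can be illustrated with a toy type.  All names below (Param, same_param, combine, hash_ignoring_constraint, hash_with_constraint) are hypothetical; the sketch only shows that a hash which skips a field used by equality is still correct but collides for every pair differing in that field.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Stand-in for a template parameter with an optional constraint.
struct Param {
  int level;
  int index;
  std::string constraint;  // empty when unconstrained
};

// Equality looks at every field, including the constraint.
inline bool same_param(const Param& a, const Param& b) {
  return a.level == b.level && a.index == b.index &&
         a.constraint == b.constraint;
}

// Simple seed-mixing step.
inline std::size_t combine(std::size_t seed, std::size_t v) {
  return seed ^ (v + 0x9e3779b9u + (seed << 6) + (seed >> 2));
}

// Valid but weak: parameters that differ only in constraint always collide.
inline std::size_t hash_ignoring_constraint(const Param& p) {
  return combine(combine(0, static_cast<std::size_t>(p.level)),
                 static_cast<std::size_t>(p.index));
}

// Folds the constraint into the running hash, in the style of an
// "iterative" hash that takes the value so far as its seed.
inline std::size_t hash_with_constraint(const Param& p) {
  std::size_t h = hash_ignoring_constraint(p);
  if (!p.constraint.empty())
    h = combine(h, std::hash<std::string>()(p.constraint));
  return h;
}
```

Both hashes keep the required property that equal parameters hash equally; only the second one gives differently constrained parameters a chance to land in different buckets.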

* constraint.cc (hash_placeholder_constraint): Rename to
iterative_hash_placeholder_constraint.
(iterative_hash_placeholder_constraint): Rename from
hash_placeholder_constraint and add the initial val argument.
* cp-tree.h (hash_placeholder_constraint): Rename to
iterative_hash_placeholder_constraint.
(iterative_hash_placeholder_constraint): Renamed from
hash_placeholder_constraint and add the initial val argument.
* pt.cc (struct ctp_hasher): Updated to use
iterative_hash_placeholder_constraint in the case of a valid placeholder
constraint.
(auto_hash::hash): Reflect the renaming of hash_placeholder_constraint to
iterative_hash_placeholder_constraint.

12 months agoMatch: Only allow single use of MIN_EXPR for SAT_TRUNC form 2 [PR115863]
Pan Li [Thu, 18 Jul 2024 12:16:34 +0000 (20:16 +0800)] 
Match: Only allow single use of MIN_EXPR for SAT_TRUNC form 2 [PR115863]

The SAT_TRUNC form 2 has below pattern matching.
From:
  _18 = MIN_EXPR <left_8, 4294967295>;
  iftmp.0_11 = (unsigned int) _18;

To:
  _18 = MIN_EXPR <left_8, 4294967295>;
  iftmp.0_11 = .SAT_TRUNC (left_8);

But if there is another use of _18 like below, the transform to
.SAT_TRUNC may bring no benefit.  For example:

From:
  _18 = MIN_EXPR <left_8, 4294967295>; // op_0 def
  iftmp.0_11 = (unsigned int) _18;     // op_0
  stream.avail_out = iftmp.0_11;
  left_37 = left_8 - _18;              // op_0 use

To:
  _18 = MIN_EXPR <left_8, 4294967295>; // op_0 def
  iftmp.0_11 = .SAT_TRUNC (left_8);
  stream.avail_out = iftmp.0_11;
  left_37 = left_8 - _18;              // op_0 use

Pattern recognition to .SAT_TRUNC cannot eliminate the MIN_EXPR in the case
above.  The backend (for example x86/riscv) will then emit an additional 2-3
insns after pattern recognition, on top of the MIN_EXPR.  Thus, keeping the
normal truncation as is should be the better choice.
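For readers less familiar with the GIMPLE above, a source-level sketch of the two shapes might look like this.  The function and struct names are hypothetical; the constant 4294967295 and the avail_out/left roles are taken from the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Single use: MIN followed by a narrowing cast.  The clamped value has no
// other consumer, so the whole computation is a saturating truncation.
inline std::uint32_t sat_trunc_u64_to_u32(std::uint64_t x) {
  std::uint64_t clamped = std::min<std::uint64_t>(x, 4294967295u);
  return static_cast<std::uint32_t>(clamped);
}

struct Chunk {
  std::uint32_t avail_out;
  std::uint64_t left;
};

// Second use: the clamped value also feeds the subtraction, so the MIN has
// to be computed regardless of how the truncation is expressed.
inline Chunk take_chunk(std::uint64_t left) {
  std::uint64_t clamped = std::min<std::uint64_t>(left, 4294967295u);
  return Chunk{static_cast<std::uint32_t>(clamped), left - clamped};
}
```

In take_chunk the minimum survives either way, which is the situation the single_use check is meant to leave alone.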

The following test suites pass with this patch:
1. The rv64gcv full regression tests.
2. The x86 bootstrap tests.
3. The x86 full regression tests.

PR target/115863

gcc/ChangeLog:

* match.pd: Add single_use check for .SAT_TRUNC form 2.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr115863-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
12 months agoDaily bump.
GCC Administrator [Fri, 19 Jul 2024 00:18:20 +0000 (00:18 +0000)] 
Daily bump.

12 months agolibatomic: Handle AVX+CX16 ZHAOXIN like Intel for 16b atomic [PR104688]
mayshao [Thu, 18 Jul 2024 20:43:00 +0000 (22:43 +0200)] 
libatomic: Handle AVX+CX16 ZHAOXIN like Intel for 16b atomic [PR104688]

PR target/104688

libatomic/ChangeLog:

* config/x86/init.c (__libat_feat1_init): Don't clear
bit_AVX on ZHAOXIN CPUs.

12 months agoc++: implement DR1363 and DR1496 for __is_trivial [PR85723]
Marek Polacek [Tue, 18 Jun 2024 20:49:24 +0000 (16:49 -0400)] 
c++: implement DR1363 and DR1496 for __is_trivial [PR85723]

is_trivial was introduced in
<https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2230.html>
which split POD into is_trivial and is_standard_layout.

Later came CWG 1363.  Since

  struct A {
    A() = default;
    A(int = 42) {}
  };

cannot be default-initialized, it should not be trivial, so the definition
of what is a trivial class changed.

Similarly, CWG 1496 concluded that

  struct B {
    B() = delete;
  };

should not be trivial either.
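The premise of both CWG issues, that neither class can be default-initialized, can be checked without relying on any particular compiler's __is_trivial result.  The following sketch uses std::is_default_constructible for that; the Plain struct is a hypothetical control case added for contrast.

```cpp
#include <cassert>
#include <type_traits>

struct A {
  A() = default;
  A(int = 42) {}  // makes A() ambiguous with the defaulted constructor
};

struct B {
  B() = delete;
};

struct Plain {
  int x;
};

// Neither A nor B can be default-initialized, which is why treating them
// as trivial classes was considered wrong.
static_assert(!std::is_default_constructible<A>::value,
              "A() is ambiguous");
static_assert(!std::is_default_constructible<B>::value,
              "B() is deleted");
static_assert(std::is_default_constructible<Plain>::value,
              "control case");
```

What std::is_trivial reports for A and B depends on whether the compiler implements the two DRs, so the sketch deliberately does not assert on it.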

P0848 adjusted the definition further to say "eligible".  That means
that

  template<typename T>
  struct C {
    C() requires false = default;
  };

should not be trivial, either, since C::C() is not eligible.

Bug 85723 reports that we implement none of the CWGs.

I chose to fix this by using type_has_non_deleted_trivial_default_ctor
which uses locate_ctor which uses build_new_method_call, which would
be used by default-initialization as well.  With that, all __is_trivial
problems I could find in the Bugzilla are fixed, except for PR96288,
which may need changes to trivially-copyable, so I'm not messing with
that now.

I hope this has no ABI implications.  There's an effort under way to
remove "trivial class" from the core language as it's not really
meaningful.  So the impact of this change should be pretty low except
to fix a few libstdc++ problems.

PR c++/108769
PR c++/58074
PR c++/115522
PR c++/85723

gcc/cp/ChangeLog:

* class.cc (type_has_non_deleted_trivial_default_ctor): Fix formatting.
* tree.cc (trivial_type_p): Instead of TYPE_HAS_TRIVIAL_DFLT, use
type_has_non_deleted_trivial_default_ctor.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wclass-memaccess.C: Add dg-warning.
* g++.dg/ext/is_trivial1.C: New test.
* g++.dg/ext/is_trivial2.C: New test.
* g++.dg/ext/is_trivial3.C: New test.
* g++.dg/ext/is_trivial4.C: New test.
* g++.dg/ext/is_trivial5.C: New test.
* g++.dg/ext/is_trivial6.C: New test.

12 months agolibbacktrace: use __has_attribute for fallthrough
Ian Lance Taylor [Thu, 18 Jul 2024 18:34:09 +0000 (11:34 -0700)] 
libbacktrace: use __has_attribute for fallthrough

Also convert some FALLTHROUGH comments to ATTRIBUTE_FALLTHROUGH.

* internal.h: Use __has_attribute to check for fallthrough
attribute.
* elf.c (elf_zstd_decompress): Use ATTRIBUTE_FALLTHROUGH rather
than a FALLTHROUGH comment.
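A sketch of the general technique, assuming nothing about the exact text of internal.h: the macro name SKETCH_FALLTHROUGH and the function bytes_to_consume are hypothetical.  The attribute is only used when __has_attribute exists and reports it; otherwise the macro degrades to a no-op statement.

```cpp
#include <cassert>

#ifdef __has_attribute
#  if __has_attribute(fallthrough)
#    define SKETCH_FALLTHROUGH __attribute__((fallthrough))
#  endif
#endif
#ifndef SKETCH_FALLTHROUGH
#  define SKETCH_FALLTHROUGH ((void) 0)
#endif

// Each case intentionally falls into the next one; the macro documents
// that intent to the compiler instead of relying on a comment.
int bytes_to_consume(int state) {
  int n = 0;
  switch (state) {
    case 2:
      n += 2;
      SKETCH_FALLTHROUGH;
    case 1:
      n += 1;
      SKETCH_FALLTHROUGH;
    default:
      break;
  }
  return n;
}
```

Unlike a comment, the attribute form does not depend on the warning's comment-matching heuristics, which vary between compilers.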

12 months agors6000: Fix .machine cpu selection w/ altivec [PR97367]
René Rebe [Fri, 12 Jul 2024 21:17:08 +0000 (21:17 +0000)] 
rs6000: Fix .machine cpu selection w/ altivec [PR97367]

There are various non-IBM CPUs with altivec, so we cannot use that
flag to determine which .machine cpu to use, so ignore it.
Emit an additional ".machine altivec" if Altivec is enabled so
that the assembler doesn't require an explicit -maltivec option
to assemble any Altivec instructions for those targets where
the ".machine cpu" is insufficient to enable Altivec.  For example,
-mcpu=G5 emits a ".machine power4".

2024-07-18  René Rebe  <rene@exactcode.de>
    Peter Bergner  <bergner@linux.ibm.com>

gcc/
PR target/97367
* config/rs6000/rs6000.cc (rs6000_machine_from_flags): Do not consider
OPTION_MASK_ALTIVEC.
(emit_asm_machine): For Altivec compiles, emit a ".machine altivec".

gcc/testsuite/
PR target/97367
* gcc.target/powerpc/pr97367.c: New test.

Signed-off-by: René Rebe <rene@exactcode.de>
12 months agors6000, update effective target for tests builtins-10*.c and vec_perm-runnable-i128.c
Carl Love [Fri, 12 Jul 2024 18:37:36 +0000 (13:37 -0500)] 
rs6000, update effective target for tests builtins-10*.c and vec_perm-runnable-i128.c

The tests:

  builtins-10-runnable.c
  builtins-10.c
  vec_perm-runnable-i128.c

use __int128 types that are not supported on all platforms.  Update the
tests to check int128 effective target to avoid unsupported type errors
on unsupported platforms.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/builtins-10-runnable.c: Add
target int128.
* gcc.target/powerpc/builtins-10.c: Add
target int128.
* gcc.target/powerpc/vec_perm-runnable-i128.c: Add
target int128.

12 months agolibatomic: Improve cpuid usage in __libat_feat1_init
Uros Bizjak [Thu, 18 Jul 2024 14:58:09 +0000 (16:58 +0200)] 
libatomic: Improve cpuid usage in __libat_feat1_init

Check the result of __get_cpuid and process FEAT1_REGISTER only when
__get_cpuid returns success.  Use __cpuid instead of nested __get_cpuid.
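
The pattern described here can be sketched as follows, assuming the __get_cpuid helper from GCC's <cpuid.h> on x86 targets.  demo_leaf1_ecx is a hypothetical function for illustration, not the libatomic code, and it simply reports 0 when the query is unavailable.

```cpp
#if defined(__x86_64__) || defined(__i386__)
# include <cpuid.h>
#endif

// Hypothetical helper: returns the ECX feature word of CPUID leaf 1,
// or 0 when the leaf cannot be queried (or the target is not x86).
unsigned int
demo_leaf1_ecx ()
{
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
  // __get_cpuid returns nonzero on success; only then are the
  // output registers meaningful.
  if (__get_cpuid (1, &eax, &ebx, &ecx, &edx))
    return ecx;
#endif
  return 0;
}
```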

libatomic/ChangeLog:

* config/x86/init.c (__libat_feat1_init): Check the result of
__get_cpuid and process FEAT1_REGISTER only when __get_cpuid
returns success.  Use __cpuid instead of nested __get_cpuid.

12 months agoeh: ICE with std::initializer_list and ASan [PR115865]
Marek Polacek [Thu, 11 Jul 2024 19:57:43 +0000 (15:57 -0400)] 
eh: ICE with std::initializer_list and ASan [PR115865]

Here we ICE with -fsanitize=address on

  std::initializer_list x = { 1, 2, 3 };

since r14-8681, which removed .ASAN_MARK calls on TREE_STATIC variables.
That means that lower_try_finally now instead of

  try
    {
      .ASAN_MARK (UNPOISON, &C.0, 12);
      x = {};
      x._M_len = 3;
      x._M_array = &C.0;
    }
  finally
    {
      .ASAN_MARK (POISON, &C.0, 12);
    }

gets:

  try
    {
      x = {};
      x._M_len = 3;
      x._M_array = &C.0;
    }
  finally
    {

    }

and we ICE on the empty finally in lower_try_finally_onedest while
calling get_eh_else.

PR c++/115865

gcc/ChangeLog:

* tree-eh.cc (get_eh_else): Check that the result of
gimple_seq_first_stmt is non-null.

gcc/testsuite/ChangeLog:

* g++.dg/asan/initlist2.C: New test.

Co-authored-by: Jakub Jelinek <jakub@redhat.com>
12 months agoDo not use caller-saved registers for COMDAT functions
LIU Hao [Mon, 15 Jul 2024 08:55:52 +0000 (16:55 +0800)] 
Do not use caller-saved registers for COMDAT functions

A reference to a COMDAT function may be resolved to another definition
outside the current translation unit, so it's not eligible for `-fipa-ra`.

In `decl_binds_to_current_def_p()` there is already a check for weak
symbols. This commit checks for COMDAT functions that are not implemented
as weak symbols, for example, on *-*-mingw32.

gcc/ChangeLog:

PR rtl-optimization/115049
* varasm.cc (decl_binds_to_current_def_p): Add a check for COMDAT
declarations too, like weak ones.

12 months agomiddle-end/115641 - invalid address construction
Richard Biener [Thu, 18 Jul 2024 11:35:33 +0000 (13:35 +0200)] 
middle-end/115641 - invalid address construction

fold_truth_andor_1 via make_bit_field_ref builds an address of
a CALL_EXPR which isn't valid GENERIC and later causes an ICE.
The following simply avoids the folding for f ().a != 1 || f ().b != 2
as it is a premature optimization anyway.  The alternative would
have been to build a TARGET_EXPR around the call.  To get this far
f () has to be const as otherwise the two calls are not semantically
equivalent for the optimization.
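
For orientation, an expression of the shape mentioned above might look like the sketch below.  DemoPair, demo_f and demo_fields_differ are hypothetical names, and this is not the pr115641.c testcase; it only shows a const function whose result is accessed by field in both operands of the ||.

```cpp
struct DemoPair
{
  int a;
  int b;
};

// The 'const' attribute promises that the result depends only on the
// arguments, so the two calls below are semantically equivalent.
__attribute__ ((const)) DemoPair
demo_f ()
{
  DemoPair p = { 1, 2 };
  return p;
}

// Same shape as f ().a != 1 || f ().b != 2 in the text above.
bool
demo_fields_differ ()
{
  return demo_f ().a != 1 || demo_f ().b != 2;
}
```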

PR middle-end/115641
* fold-const.cc (decode_field_reference): If the inner
reference isn't something we can take the address of, fail.

* gcc.dg/torture/pr115641.c: New testcase.

12 months agoDoc: Add Standard-Names ustrunc and sstrunc for integer modes
Pan Li [Thu, 18 Jul 2024 03:30:38 +0000 (11:30 +0800)] 
Doc: Add Standard-Names ustrunc and sstrunc for integer modes

This patch adds documentation for the Standard-Names ustrunc and
sstrunc, covering both the scalar and vector integer modes.

gcc/ChangeLog:

* doc/md.texi: Add Standard-Names ustrunc and sstrunc.

Signed-off-by: Pan Li <pan2.li@intel.com>
12 months agoFortran: Fix Explicit cobounds of a procedures parameter not respected [PR78466]
Andre Vehreschild [Thu, 31 Dec 2020 09:40:30 +0000 (10:40 +0100)] 
Fortran: Fix Explicit cobounds of a procedures parameter not respected [PR78466]

Explicit cobounds of class array procedure parameters were not taken
into account.  Furthermore, different cobounds in distinct procedure
parameter lists were mixed up, i.e. the last definition was used for
all.  The bounds are now regenerated when the tree's and expr's bounds
do not match.

PR fortran/78466
PR fortran/80774

gcc/fortran/ChangeLog:

* array.cc (gfc_compare_array_spec): Take cotype into account.
* class.cc (gfc_build_class_symbol): Coarrays are also arrays.
* gfortran.h (IS_CLASS_COARRAY_OR_ARRAY): New macro to detect
regular and coarray class arrays.
* interface.cc (compare_components): Take codimension into
account.
* resolve.cc (resolve_symbol): Improve error message.
* simplify.cc (simplify_bound_dim): Remove duplicate.
* trans-array.cc (gfc_trans_array_cobounds): Coarrays are also
arrays.
(gfc_trans_array_bounds): Same.
(gfc_trans_dummy_array_bias): Same.
(get_coarray_as): Get the as having a non-zero codim.
(is_explicit_coarray): Detect explicit coarrays.
(gfc_conv_expr_descriptor): Create a new descriptor for explicit
coarrays.
* trans-decl.cc (gfc_build_qualified_array): Coarrays are also
arrays.
(gfc_build_dummy_array_decl): Same.
(gfc_get_symbol_decl): Same.
(gfc_trans_deferred_vars): Same.
* trans-expr.cc (class_scalar_coarray_to_class): Get the
descriptor from the correct location.
(gfc_conv_variable): Pick up the descriptor when needed.
* trans-types.cc (gfc_is_nodesc_array): Coarrays are also
arrays.
(gfc_get_nodesc_array_type): Indentation fix only.
(cobounds_match_decl): Match a tree's bounds to the expr's
bounds and return true, when they match.
(gfc_get_derived_type): Create a new type tree/descriptor, when
the cobounds of the existing declaration and expr do not
match.  This happens for class arrays in parameter lists, when
there are different cobound declarations.

gcc/testsuite/ChangeLog:

* gfortran.dg/coarray/poly_run_1.f90: Activate old test code.
* gfortran.dg/coarray/poly_run_2.f90: Activate test.  It was
stopping before and passing without an error.

12 months agotestsuite: Add dg-do run to more tests
Sam James [Thu, 18 Jul 2024 08:00:17 +0000 (10:00 +0200)] 
testsuite: Add dg-do run to more tests

All of these are for wrong-code bugs.  Confirmed that the tests were
already in use, but without being executed.

2024-07-18  Sam James  <sam@gentoo.org>

PR c++/53288
PR c++/57437
PR c/65345
PR libstdc++/88101
PR tree-optimization/96369
PR tree-optimization/102124
PR tree-optimization/108692
* c-c++-common/pr96369.c: Add dg-do run directive.
* gcc.dg/torture/pr102124.c: Ditto.
* gcc.dg/pr108692.c: Ditto.
* gcc.dg/atomic/pr65345-4.c: Ditto.
* g++.dg/cpp0x/lambda/lambda-return1.C: Ditto.
* g++.dg/init/lifetime4.C: Ditto.
* g++.dg/torture/builtin-clear-padding-1.C: Ditto.
* g++.dg/torture/builtin-clear-padding-2.C: Ditto.
* g++.dg/torture/builtin-clear-padding-3.C: Ditto.
* g++.dg/torture/builtin-clear-padding-4.C: Ditto.
* g++.dg/torture/builtin-clear-padding-5.C: Ditto.

12 months agoFortran: Suppress bogus used uninitialized warnings [PR108889].
Paul Thomas [Thu, 18 Jul 2024 07:51:35 +0000 (08:51 +0100)] 
Fortran: Suppress bogus used uninitialized warnings [PR108889].

2024-07-18  Paul Thomas  <pault@gcc.gnu.org>

gcc/fortran
PR fortran/108889
* gfortran.h: Add bit field 'allocated_in_scope' to gfc_symbol.
* trans-array.cc (gfc_array_allocate): Set 'allocated_in_scope'
after allocation if not a component reference.
(gfc_alloc_allocatable_for_assignment): If 'allocated_in_scope'
not set, not a component ref and not allocated, set the array
bounds and offset to give zero length in all dimensions. Then
set allocated_in_scope.

gcc/testsuite/
PR fortran/108889
* gfortran.dg/pr108889.f90: New test.

12 months agogimple-fold: consistent dump of builtin call simplifications
Rubin Gerritsen [Tue, 16 Jul 2024 19:11:24 +0000 (21:11 +0200)] 
gimple-fold: consistent dump of builtin call simplifications

Previously only simplifications of the `__st[xrp]cpy_chk`
were dumped. Now all call replacement simplifications are
dumped.

Examples of statements with corresponding dumpfile entries:

`printf("mystr\n");`:
  optimized: simplified printf to __builtin_puts
`printf("%c", 'a');`:
  optimized: simplified printf to __builtin_putchar
`printf("%s\n", "mystr");`:
  optimized: simplified printf to __builtin_puts

The below test suites passed for this patch
* The x86 bootstrap test.
* Manual testing with some small example code manually
  examining dump logs, outputting the lines mentioned above.

gcc/ChangeLog:

* gimple-fold.cc (dump_transformation): Moved definition.
(replace_call_with_call_and_fold): Calls dump_transformation.
(gimple_fold_builtin_stxcpy_chk): Removes call to
dump_transformation, now in replace_call_with_call_and_fold.
(gimple_fold_builtin_stxncpy_chk): Removes call to
dump_transformation, now in replace_call_with_call_and_fold.

Signed-off-by: Rubin Gerritsen <rubin.gerritsen@gmail.com>
12 months agotree-optimization/104515 - store motion and clobbers
Richard Biener [Wed, 17 Jul 2024 08:22:47 +0000 (10:22 +0200)] 
tree-optimization/104515 - store motion and clobbers

The following addresses an old regression when end-of-object/storage
clobbers were introduced.  In particular when there's an end-of-object
clobber in a loop but no corresponding begin-of-object we can still
perform store motion of may-aliased refs when we re-issue the
end-of-object/storage on the exits but elide it from the loop.  This
should be the safest way to deal with this considering stack-slot
sharing and it should not cause missed dead store eliminations given
DSE can now follow multiple paths in case there are multiple exits.

Note when the clobber is re-materialized only on one exit but not
on another we are erring on the side of removing the clobber on
such path.  This should be OK (removing clobbers is always OK).

Note there's no corresponding code to handle begin-of-object/storage
during the hoisting part of loads that are part of a store motion
optimization, so this only enables stored-only store motion or cases
without such clobber inside the loop.

PR tree-optimization/104515
* tree-ssa-loop-im.cc (execute_sm_exit): Add clobbers_to_prune
parameter and handle re-materializing of clobbers.
(sm_seq_valid_bb): end-of-storage/object clobbers are OK inside
an ordered sequence of stores.
(sm_seq_push_down): Refuse to push down clobbers.
(hoist_memory_references): Prune clobbers from the loop body
we re-materialized on an exit.

* g++.dg/opt/pr104515.C: New testcase.

12 months agoImplement a -ftrapping-math/-fsignaling-nans TODO in match.pd.
Roger Sayle [Thu, 18 Jul 2024 07:27:36 +0000 (08:27 +0100)] 
Implement a -ftrapping-math/-fsignaling-nans TODO in match.pd.

I've been investigating some (float)i == CST optimizations for match.pd,
and noticed there's already a TODO comment in match.pd that's relatively
easy to implement.  When CST is a NaN, we only need to worry about
exceptions with flag_trapping_math, and equality/inequality tests for
sNaN only behave differently to qNaN with -fsignaling-nans.  These
issues are related to PR 57371 and PR 106805 in bugzilla.
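
The property being relied on can be seen in a small sketch, assuming IEEE semantics (no -ffast-math): a converted integer is never a NaN, so equality against a NaN constant is always false and inequality always true.  The function names are hypothetical, and the example uses a quiet NaN only.

```cpp
#include <limits>

// (float) i == NaN can never hold, whatever the value of i.
bool
demo_int_eq_nan (int i)
{
  return static_cast<float> (i) == std::numeric_limits<float>::quiet_NaN ();
}

// (float) i != NaN always holds.
bool
demo_int_ne_nan (int i)
{
  return static_cast<float> (i) != std::numeric_limits<float>::quiet_NaN ();
}
```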

2024-07-18  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* match.pd ((FTYPE) N CMP CST): Only worry about exceptions with
flag_trapping_math, and about signaling NaNs with HONOR_SNANS.

gcc/testsuite/ChangeLog
* c-c++-common/pr57371-4.c: Update comment.
* c-c++-common/pr57371-5.c: Add missing testcases from pr57371-4.c
and update for -fno-signaling-nans -fno-trapping-math.

12 months agoFortran: Use char* for deferred length character arrays [PR82904]
Andre Vehreschild [Wed, 10 Jul 2024 12:37:37 +0000 (14:37 +0200)] 
Fortran: Use char* for deferred length character arrays [PR82904]

During compilation, the IPA inline pass would randomly ICE.  This was
caused by a saved deferred length string: the length variable was not
set, but it was used in the array's declaration.  A character pointer
is now used to prevent this.

PR fortran/82904

gcc/fortran/ChangeLog:

* trans-types.cc (gfc_sym_type): Use type `char*` for saved
deferred length char arrays.
* trans.cc (get_array_span): Get `.span` also for `char*` typed
arrays, i.e. for those that have INTEGER_TYPE instead of
ARRAY_TYPE.

gcc/testsuite/ChangeLog:

* gfortran.dg/deferred_character_38.f90: New test.

12 months agotestsuite: Fix up builtin-clear-padding-3.c for -funsigned-char
Jakub Jelinek [Thu, 18 Jul 2024 07:22:10 +0000 (09:22 +0200)] 
testsuite: Fix up builtin-clear-padding-3.c for -funsigned-char

As reported on gcc-regression, this test FAILs on aarch64, but my
r15-2090 change didn't change anything on the generated assembly,
just added the forgotten dg-do run directive to the test, so the
test has always been failing; we just didn't know it.

I can actually reproduce it on x86_64 with -funsigned-char too,
s2.b.a has int type and -1 is stored to it, so we should compare
it against -1 rather than (char) -1; the latter is appropriate for
testing char fields into which we've stored -1.
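
The difference between the two comparisons can be sketched as follows.  DemoInner and the function names are hypothetical and only mirror the int field described above; they are not taken from the test.

```cpp
#include <limits>

struct DemoInner
{
  int a;
};

// True when plain char is signed on this target (false with
// -funsigned-char, or on targets where char is unsigned by default).
bool
demo_char_is_signed ()
{
  return std::numeric_limits<char>::is_signed;
}

// Correct check for an int field that had -1 stored into it.
bool
demo_matches_int (const DemoInner &s)
{
  return s.a == -1;
}

// (char) -1 promotes to -1 only when char is signed; with unsigned
// char it promotes to 255 and the comparison fails.
bool
demo_matches_char_cast (const DemoInner &s)
{
  return s.a == (char) -1;
}
```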

2024-07-18  Jakub Jelinek  <jakub@redhat.com>

* c-c++-common/torture/builtin-clear-padding-3.c (main): Compare
s2.b.a against -1 rather than (char) -1.

12 months agoi386: Fix testcases generating invalid asm
Haochen Jiang [Wed, 17 Jul 2024 08:26:35 +0000 (16:26 +0800)] 
i386: Fix testcases generating invalid asm

For compile tests, we should generate valid asm except for special purposes.
Fix the compile tests that generate invalid asm.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-egprs-names.c: Use ax for short and
al for char instead of eax.
* gcc.target/i386/avx512bw-kandnq-1.c: Do not run the test
under -m32 since kmovq with register is invalid. Use long
long to use 64 bit register instead of 32 bit register for
kmovq.
* gcc.target/i386/avx512bw-kandq-1.c: Ditto.
* gcc.target/i386/avx512bw-knotq-1.c: Ditto.
* gcc.target/i386/avx512bw-korq-1.c: Ditto.
* gcc.target/i386/avx512bw-kshiftlq-1.c: Ditto.
* gcc.target/i386/avx512bw-kshiftrq-1.c: Ditto.
* gcc.target/i386/avx512bw-kxnorq-1.c: Ditto.
* gcc.target/i386/avx512bw-kxorq-1.c: Ditto.

12 months ago[aarch64] Document rewriting of -march=native to -mcpu=native
Kyrylo Tkachov [Tue, 16 Jul 2024 11:29:42 +0000 (16:59 +0530)] 
[aarch64] Document rewriting of -march=native to -mcpu=native

Commit dd9e5f4db2debf1429feab7f785962ccef6e0dbd changed -march=native to
treat it as -mcpu=native if no other mcpu or mtune option was given.
It would make sense to document this, especially if we try to persuade
compilers like LLVM to take the same approach.
This patch documents that behaviour.

Bootstrapped and tested on aarch64-none-linux-gnu.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/ChangeLog:

* doc/invoke.texi (AArch64 Options): Document rewriting of
-march=native to -mcpu=native.

12 months agoOptimize maskstore when mask is 0 or -1 in UNSPEC_MASKMOV
liuhongt [Tue, 16 Jul 2024 07:29:01 +0000 (15:29 +0800)] 
Optimize maskstore when mask is 0 or -1 in UNSPEC_MASKMOV

gcc/ChangeLog:

PR target/115843
* config/i386/predicates.md (const0_or_m1_operand): New
predicate.
* config/i386/sse.md (*<avx512>_store<mode>_mask_1): New
pre_reload define_insn_and_split.
(V): Add V32BF,V16BF,V8BF.
(V4SF_V8BF): Rename to ..
(V24F_128): .. this.
(*vec_concat<mode>): Adjust with V24F_128.
(*vec_concat<mode>_0): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr115843.c: New test.

12 months agoMark expand musttail error messages for translation
Andi Kleen [Fri, 21 Jun 2024 18:19:12 +0000 (11:19 -0700)] 
Mark expand musttail error messages for translation

The musttail error messages are reported to the user, so must be
translated.

gcc/ChangeLog:

PR c/83324
* calls.cc (initialize_argument_information): Mark messages
for translation.
(can_implement_as_sibling_call_p): Ditto.
(expand_call): Ditto.

12 months agoGive better error messages for musttail
Andi Kleen [Tue, 21 May 2024 14:01:57 +0000 (07:01 -0700)] 
Give better error messages for musttail

When musttail is set, make tree-tailcall give error messages
when it cannot handle a call. This avoids vague "other reasons"
error messages later at expand time when it sees a musttail
function not marked tail call.

In various cases this requires delaying the error until
the call is discovered.

Also print more information on the failure to the dump file.

gcc/ChangeLog:

PR c/83324
* tree-tailcall.cc (maybe_error_musttail): New function.
(suitable_for_tail_opt_p): Report error reason.
(suitable_for_tail_call_opt_p): Report error reason.
(find_tail_calls): Accept basic blocks with abnormal edges.
Delay reporting of errors until the call is discovered.
Move top level suitability checks to here.
(tree_optimize_tail_calls_1): Remove top level checks.

12 months agoEnable musttail tail conversion even when not optimizing
Andi Kleen [Thu, 16 May 2024 02:57:22 +0000 (19:57 -0700)] 
Enable musttail tail conversion even when not optimizing

Enable the tailcall optimization for non-optimizing builds,
but in this case only check calls that have the musttail attribute set.
This makes musttail work without optimization.

This is done with a new late musttail pass that is only active when
not optimizing. The new pass relies on tree-cfg to discover musttails.
This avoids a ~0.8% compiler run time penalty at -O0.
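
A minimal sketch of a musttail call is shown below.  It assumes the attribute is spelled [[gnu::musttail]] or [[clang::musttail]] and probes for both with __has_cpp_attribute, falling back to a plain return when neither is reported; DEMO_MUSTTAIL and demo_sum_down are hypothetical names.

```cpp
// Pick whichever musttail spelling the compiler reports, if any.
#if defined(__has_cpp_attribute)
# if __has_cpp_attribute(gnu::musttail)
#  define DEMO_MUSTTAIL [[gnu::musttail]]
# elif __has_cpp_attribute(clang::musttail)
#  define DEMO_MUSTTAIL [[clang::musttail]]
# endif
#endif
#ifndef DEMO_MUSTTAIL
# define DEMO_MUSTTAIL
#endif

// Accumulates n + (n - 1) + ... + 1 into acc.  The recursive call is
// in tail position and has the same signature as the caller, so it
// is a candidate for a guaranteed tail call.
long
demo_sum_down (long n, long acc)
{
  if (n == 0)
    return acc;
  DEMO_MUSTTAIL return demo_sum_down (n - 1, acc + n);
}
```

With the attribute honoured, the recursion should not grow the stack even in a build without optimization, which is the case this change is about.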

gcc/ChangeLog:

PR c/83324
* function.h (struct function): Add has_musttail.
* lto-streamer-in.cc (input_struct_function_base): Stream
has_musttail.
* lto-streamer-out.cc (output_struct_function_base): Ditto.
* passes.def (pass_musttail): Add.
* tree-cfg.cc (notice_special_calls): Record has_musttail.
(clear_special_calls): Clear has_musttail.
* tree-pass.h (make_pass_musttail): Add.
* tree-tailcall.cc (find_tail_calls): Handle only_musttail
argument.
(tree_optimize_tail_calls_1): Pass on only_musttail.
(execute_tail_calls): Pass only_musttail as false.
(class pass_musttail): Add.
(make_pass_musttail): Add.

12 months agoFix pro_and_epilogue for sibcalls at -O0 (PR115255)
Andi Kleen [Sun, 2 Jun 2024 05:04:41 +0000 (22:04 -0700)] 
Fix pro_and_epilogue for sibcalls at -O0 (PR115255)

Some of the cfg fixups in pro_and_epilogue for sibcalls were dependent on "optimize".
Make them check cfun->tail_call_marked instead to handle the -O0 musttail
case. This fixes the musttail test cases on arm targets.

gcc/ChangeLog:

PR target/115255
* function.cc (thread_prologue_and_epilogue_insns): Check
cfun->tail_call_marked for sibcalls too.
(rest_of_handle_thread_prologue_and_epilogue): Ditto.

12 months agoImprove must tail in RTL backend
Andi Kleen [Wed, 24 Jan 2024 07:42:08 +0000 (23:42 -0800)] 
Improve must tail in RTL backend

- Give error messages for all causes of non sibling call generation
- When giving error messages clear the musttail flag to avoid ICEs
- Error out when tree-tailcall failed to mark a must-tail call
sibcall. In this case it doesn't know the true reason and only gives
a vague message.

gcc/ChangeLog:

PR c/83324
* calls.cc (maybe_complain_about_tail_call): Clear must tail
flag on error.
(expand_call): Give error messages for all musttail failures.

12 months agoc++/modules: Conditionally start timer during lazy load [PR115165]
Nathaniel Shead [Sun, 7 Jul 2024 13:19:52 +0000 (23:19 +1000)] 
c++/modules: Conditionally start timer during lazy load [PR115165]

While lazy loading, instantiation of pendings can sometimes recursively
perform name lookup and begin further lazy loading.  When using the
'-ftime-report' functionality this causes ICEs as we could start an
already-running timer for the importing.

This patch fixes the issue by using the 'timevar_cond*' API instead to
support such recursive calls.
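
The idea of a conditional timer start can be sketched independently of GCC's timevar machinery.  DemoTimer and demo_nested_load below are hypothetical and only model the behaviour described: a start that does nothing if the timer is already running, and a stop that only stops what the matching start actually started.

```cpp
// Hypothetical re-entrancy-safe timer; it counts starts instead of
// measuring time so that the behaviour is easy to observe.
class DemoTimer
{
public:
  // Starts the timer if needed.  Returns true when it was already
  // running, in which case the caller must not stop it.
  bool cond_start ()
  {
    bool was_running = running_;
    if (!running_)
      {
	running_ = true;
	++starts_;
      }
    return was_running;
  }

  // Stops the timer only if the matching cond_start started it.
  void cond_stop (bool was_running)
  {
    if (!was_running)
      running_ = false;
  }

  bool running () const { return running_; }
  int starts () const { return starts_; }

private:
  bool running_ = false;
  int starts_ = 0;
};

// Models a lazy load that recursively triggers further lazy loads.
// Returns the number of nested invocations.
int
demo_nested_load (DemoTimer &timer, int depth)
{
  bool was_running = timer.cond_start ();
  int visited = 1;
  if (depth > 0)
    visited += demo_nested_load (timer, depth - 1);
  timer.cond_stop (was_running);
  return visited;
}
```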

PR c++/115165

gcc/cp/ChangeLog:

* module.cc (lazy_load_binding): Use 'timevar_cond*' APIs.
(lazy_load_pendings): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/modules/timevar-1_a.H: New test.
* g++.dg/modules/timevar-1_b.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
12 months agoc++: prev declared hidden tmpl friend inst [PR112288]
Patrick Palka [Thu, 18 Jul 2024 01:02:52 +0000 (21:02 -0400)] 
c++: prev declared hidden tmpl friend inst [PR112288]

When partially instantiating a previously declared hidden template
friend definition (at class template scope) such as slot_allocated in
the first testcase below, tsubst_friend_function needs to go through
all existing specializations thereof and make them point to the new
definition.

But when the previous declaration was also at class template scope,
old_decl is not the most general template, instead it's the partial
instantiation, and since instantiations are relative to the most general
template, old_decl's DECL_TEMPLATE_INSTANTIATIONS is empty.  So we
need to consistently use the most general template here.  And when adjusting
DECL_TI_ARGS to match, only the innermost template arguments should be
preserved; the outer ones should correspond to the new definition.

Otherwise we fail a checking-only sanity check in instantiate_decl in
the first testcase, and in the second/third we end up emitting multiple
definitions of the template friend instantiation, resulting in a link
failure.

PR c++/112288

gcc/cp/ChangeLog:

* pt.cc (tsubst_friend_function): When adjusting existing
specializations after defining a previously declared template
friend, consider the most general template and correct
DECL_TI_ARGS adjustment.

gcc/testsuite/ChangeLog:

* g++.dg/template/friend80.C: New test.
* g++.dg/template/friend81.C: New test.
* g++.dg/template/friend81a.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
12 months agolibbacktrace: add cast to avoid warning
Ian Lance Taylor [Thu, 18 Jul 2024 00:58:56 +0000 (17:58 -0700)] 
libbacktrace: add cast to avoid warning

* print.c (print_syminfo_callback): Add cast to avoid warning.

12 months agoc++: missing -Wunused-value for !<expr> [PR114104]
Patrick Palka [Thu, 18 Jul 2024 00:57:54 +0000 (20:57 -0400)] 
c++: missing -Wunused-value for !<expr> [PR114104]

Here we're neglecting to issue a -Wunused-value warning for suitable !
operator expressions, and in turn for != operator expressions that are
rewritten as !(x == y), only because we don't call warn_if_unused_value
on TRUTH_NOT_EXPR since its class is tcc_expression.  This patch makes
us also consider warning for TRUTH_NOT_EXPR and also for ADDR_EXPR.

PR c++/114104

gcc/cp/ChangeLog:

* cvt.cc (convert_to_void): Call warn_if_unused_value for
TRUTH_NOT_EXPR and ADDR_EXPR as well.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wunused-20.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
12 months agoc++: diagnose failed qualified lookup into current inst
Patrick Palka [Thu, 18 Jul 2024 00:54:14 +0000 (20:54 -0400)] 
c++: diagnose failed qualified lookup into current inst

When the scope of a qualified name is the current instantiation, and
qualified lookup finds nothing at template definition time, then we
know it'll find nothing at instantiation time (unless the current
instantiation has dependent bases).  So such qualified name lookup
failure can be diagnosed ahead of time as per [temp.res.general]/6.

This patch implements that, for qualified names of the form (where
the current instantiation is A<T>):

  this->non_existent
  a.non_existent
  A::non_existent
  typename A::non_existent
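
The two situations can be contrasted in a small valid sketch (all names are hypothetical).  In DemoDerived the member is only found through a dependent base, so lookup has to wait for instantiation; in DemoPlain there is no dependent base, so a misspelled member after this-> could be rejected when the template is defined.

```cpp
template <typename T>
struct DemoBase
{
  int value () const { return 42; }
};

template <typename T>
struct DemoDerived : DemoBase<T>
{
  // 'value' is not a member of DemoDerived itself, but DemoBase<T> is
  // a dependent base, so a lookup failure cannot be diagnosed before
  // instantiation.
  int get () const { return this->value (); }
};

template <typename T>
struct DemoPlain
{
  int member () const { return 7; }

  // No dependent bases here: writing this->non_existent instead
  // would be a candidate for an error at definition time.
  int get () const { return this->member (); }
};
```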

It turns out we already optimistically attempt qualified lookup of
seemingly every qualified name, even when it's dependently scoped, and
then suppress issuing a lookup failure diagnostic after the fact.
So implementing this is mostly a matter of restricting the diagnostic
suppression to "dependentish" scopes (i.e. dependent scopes or the
current instantiation with dependent bases), rather than suppressing
for any dependently-typed scope as we currently do.

The cp_parser_conversion_function_id change is needed to avoid regressing
lookup/using8.C:

  using A<T>::operator typename A<T>::Nested*;

When looking up A<T>::Nested we consider it not dependently scoped since
we entered A<T> from cp_parser_conversion_function_id earlier.   But this
A<T> is the implicit instantiation A<T> not the primary template type A<T>,
and so the lookup fails which we now diagnose.  This patch works around
this by not entering the template scope of a qualified conversion
function-id in this case, i.e. if we're in an expression vs declaration
context, by seeing if the type already went through finish_template_type
with entering_scope=true.

gcc/cp/ChangeLog:

* decl.cc (make_typename_type): Restrict name lookup failure
punting to dependentish_scope_p instead of dependent_type_p.
* error.cc (qualified_name_lookup_error): Improve diagnostic
when the scope is the current instantiation.
* parser.cc (cp_parser_diagnose_invalid_type_name): Likewise.
(cp_parser_conversion_function_id): Don't call push_scope on
a template scope unless we're in a declaration context.
(cp_parser_lookup_name): Restrict name lookup failure
punting to dependentish_scope_p instead of dependent_type_p.
* semantics.cc (finish_id_expression_1): Likewise.
* typeck.cc (finish_class_member_access_expr): Likewise.

libstdc++-v3/ChangeLog:

* include/experimental/socket
(basic_socket_iostream::basic_socket_iostream): Fix typo.
* include/tr2/dynamic_bitset
(__dynamic_bitset_base::_M_is_proper_subset_of): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/alignas18.C: Expect name lookup error for U::X.
* g++.dg/cpp0x/forw_enum13.C: Expect name lookup error for
D3::A and D4<T>::A.
* g++.dg/parse/access13.C: Declare A::E::V to avoid name lookup
failure and preserve intent of the test.
* g++.dg/parse/enum11.C: Expect extra errors, matching the
non-template case.
* g++.dg/template/crash123.C: Avoid name lookup failure to
preserve intent of the test.
* g++.dg/template/crash124.C: Likewise.
* g++.dg/template/crash7.C: Adjust expected diagnostics.
* g++.dg/template/dtor6.C: Declare A::~A() to avoid name lookup
failure and preserve intent of the test.
* g++.dg/template/error22.C: Adjust expected diagnostics.
* g++.dg/template/static30.C: Avoid name lookup failure to
preserve intent of the test.
* g++.old-deja/g++.other/decl5.C: Adjust expected diagnostics.
* g++.dg/template/non-dependent34.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
12 months agolibbacktrace: better backtrace_print when no debug info
Ian Lance Taylor [Thu, 18 Jul 2024 00:36:25 +0000 (17:36 -0700)] 
libbacktrace: better backtrace_print when no debug info

Fixes https://github.com/ianlancetaylor/libbacktrace/issues/59

* print.c (print_syminfo_callback): New static function.
(print_callback): Call backtrace_syminfo if there is no function
or file name.

12 months agoDaily bump.
GCC Administrator [Thu, 18 Jul 2024 00:18:58 +0000 (00:18 +0000)] 
Daily bump.

12 months agolibbacktrace: add notes about dl_iterate_phdr to README
Ian Lance Taylor [Thu, 18 Jul 2024 00:02:56 +0000 (17:02 -0700)] 
libbacktrace: add notes about dl_iterate_phdr to README

* README: Add notes about dl_iterate_phdr.

12 months agotestsuite: Fix up pr111150* tests on i686-linux [PR111150]
Jakub Jelinek [Wed, 17 Jul 2024 21:47:17 +0000 (23:47 +0200)] 
testsuite: Fix up pr111150* tests on i686-linux [PR111150]

The tests FAIL on i686-linux due to unexpected -Wpsabi diagnostics.
Fixed as usually by adding -Wno-psabi to dg-options.

2024-07-17  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/111150
* gcc.dg/tree-ssa/pr111150.c: Add -Wno-psabi to dg-options.
* g++.dg/tree-ssa/pr111150.C: Likewise.

12 months agoUse foreach, not lmap, for tcl <= 8.5 compat
Jørgen Kvalsvik [Sun, 14 Jul 2024 19:39:44 +0000 (21:39 +0200)] 
Use foreach, not lmap, for tcl <= 8.5 compat

lmap was introduced in tcl 8.6, and while that was released in 2012, lmap
does not make enough of a difference to warrant the friction on
conservative (and relevant) systems.

gcc/testsuite/ChangeLog:

* lib/gcov.exp: Use foreach, not lmap, for tcl <= 8.5 compat.

12 months agortl-ssa: Fix move range canonicalisation [PR115929]
Richard Sandiford [Wed, 17 Jul 2024 18:38:12 +0000 (19:38 +0100)] 
rtl-ssa: Fix move range canonicalisation [PR115929]

In this PR, canonicalize_move_range walked off the end of a list
and triggered a null dereference.  There are multiple ways of fixing
that, but I think the approach taken in the patch should be
relatively efficient.

gcc/
PR rtl-optimization/115929
* rtl-ssa/movement.h (canonicalize_move_range): Check for null prev
and next insns and create an invalid move range for them.

gcc/testsuite/
PR rtl-optimization/115929
* gcc.dg/torture/pr115929-2.c: New test.

12 months agortl-ssa: Fix split_clobber_group [PR115928]
Richard Sandiford [Wed, 17 Jul 2024 18:38:11 +0000 (19:38 +0100)] 
rtl-ssa: Fix split_clobber_group [PR115928]

One of the goals of the rtl-ssa representation was to allow a
group of consecutive clobbers to be skipped in constant time,
with amortised sublinear insertion and deletion.  This involves
putting consecutive clobbers in groups.  Splitting or joining
groups would be linear if we had to update every clobber on
each update, so the operation to query a clobber's group is
lazy and (again) amortised sublinear.

This means that, when splitting a group into two, we cannot
reuse the old group for one side.  We have to invalidate it,
so that the lazy clobber_info::group query can tell that something
has changed.  The ICE in the PR came from failing to do that.

gcc/
PR rtl-optimization/115928
* rtl-ssa/accesses.h (clobber_group): Add a new constructor that
takes the first, last and root clobbers.
* rtl-ssa/internals.inl (clobber_group::clobber_group): Define it.
* rtl-ssa/accesses.cc (function_info::split_clobber_group): Use it.
Allocate a new group for both sides and invalidate the previous group.
(function_info::add_def): After calling split_clobber_group,
remove the old group from the splay tree.

gcc/testsuite/
PR rtl-optimization/115928
* gcc.dg/torture/pr115928.c: New test.

12 months agogenattrtab: Drop enum tags, consolidate type names
Richard Sandiford [Wed, 17 Jul 2024 18:34:46 +0000 (19:34 +0100)] 
genattrtab: Drop enum tags, consolidate type names

genattrtab printed an "enum" tag before references to attribute
enums, but that's redundant in C++.  Removing it means that each
attribute type becomes a single token and can be easily stored
in the attr_desc structure.
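
The redundancy is easy to see in a short sketch (the enum and function names are hypothetical, not generated genattrtab output): in C++ the enum name alone denotes the type, so both declarations below have the same return type.

```cpp
enum demo_attr_type
{
  DEMO_ATTR_NO,
  DEMO_ATTR_YES
};

// C-style spelling with the "enum" tag.
enum demo_attr_type
demo_with_tag ()
{
  return DEMO_ATTR_YES;
}

// C++ spelling: the type name is a single token.
demo_attr_type
demo_without_tag ()
{
  return DEMO_ATTR_YES;
}
```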

gcc/
* genattrtab.cc (attr_desc::cxx_type): New field.
(write_attr_get, write_attr_value): Use it.
(gen_attr, find_attr, make_internal_attr): Initialize it,
dropping enum tags.

12 months agoc++: wrong error initializing empty class [PR115900]
Marek Polacek [Wed, 17 Jul 2024 15:19:32 +0000 (11:19 -0400)] 
c++: wrong error initializing empty class [PR115900]

In r14-409, we started handling empty bases first in cxx_fold_indirect_ref_1
so that we don't need to recurse and waste time.

This caused a bogus "modifying a const object" error.  I'm appending my
analysis from the PR, but basically, cxx_fold_indirect_ref now returns
a different object than before, and we mark the wrong thing as const,
but since we're initializing an empty object, we should avoid setting
the object constness.

~~
Pre-r14-409: we're evaluating the call to C::C(), which is in the body of
B::B(), which is the body of D::D(&d):

  C::C ((struct C *) this, NON_LVALUE_EXPR <0>)

It's a ctor so we get here:

 3118   /* Remember the object we are constructing or destructing.  */
 3119   tree new_obj = NULL_TREE;
 3120   if (DECL_CONSTRUCTOR_P (fun) || DECL_DESTRUCTOR_P (fun))
 3121     {
 3122       /* In a cdtor, it should be the first `this' argument.
 3123          At this point it has already been evaluated in the call
 3124          to cxx_bind_parameters_in_call.  */
 3125       new_obj = TREE_VEC_ELT (new_call.bindings, 0);

new_obj=(struct C *) &d.D.2656

 3126       new_obj = cxx_fold_indirect_ref (ctx, loc, DECL_CONTEXT (fun), new_obj);

new_obj=d.D.2656.D.2597

We proceed to evaluate the call, then we get here:

 3317           /* At this point, the object's constructor will have run, so
 3318              the object is no longer under construction, and its possible
 3319              'const' semantics now apply.  Make a note of this fact by
 3320              marking the CONSTRUCTOR TREE_READONLY.  */
 3321           if (new_obj && DECL_CONSTRUCTOR_P (fun))
 3322             cxx_set_object_constness (ctx, new_obj, /*readonly_p=*/true,
 3323                                       non_constant_p, overflow_p);

new_obj is still d.D.2656.D.2597, its type is "C", cxx_set_object_constness
doesn't set anything as const.  This is fine.

After r14-409: on line 3125, new_obj is (struct C *) &d.D.2656 as before,
but we go to cxx_fold_indirect_ref_1:

 5739       if (is_empty_class (type)
 5740           && CLASS_TYPE_P (optype)
 5741           && lookup_base (optype, type, ba_any, NULL, tf_none, off))
 5742         {
 5743           if (empty_base)
 5744             *empty_base = true;
 5745           return op;

type is C, which is an empty class; optype is "const D", and C is a base of D.
So we return the VAR_DECL 'd'.  Then we get to cxx_set_object_constness with
object=d, which is const, so we mark the constructor READONLY.

Then we're evaluating A::A() which has

  ((A*)this)->data = 0;

we evaluate the LHS to d.D.2656.a, for which the initializer is
{.D.2656={.a={.data=}}} which is TREE_READONLY and 'd' is const, so we think
we're modifying a const object and fail the constexpr evaluation.
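
A class hierarchy with the shape described in the analysis can be sketched as
follows.  This is a hypothetical reconstruction from the names used above (A,
B, C, D, data, a, d); the actual constexpr-init23.C testcase may differ.  With
the fix applied the static assertions are expected to hold:

```cpp
#include <type_traits>

// Hypothetical reconstruction from the names in the analysis;
// not the testcase that was committed.
struct A {
  char data = 1;
  constexpr A() { data = 0; }   // the store that was wrongly rejected
};

struct C {                      // empty class
  constexpr C(int) {}
};

struct B : C {
  A a;
  constexpr B() : C(0) {}       // C::C runs before A::A
};

struct D : B {
  constexpr D() {}
};

static_assert(std::is_empty<C>::value, "C must be an empty class");

constexpr D d;                  // const object, constant-initialized
static_assert(d.a.data == 0, "A::A() stored into d.a.data");
```

The empty base C is constructed first, so marking 'd' read-only at that point
would make the later store in A::A() look like a write to a const object.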

PR c++/115900

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_call_expression): Set new_obj to NULL_TREE
if cxx_fold_indirect_ref set empty_base to true.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/constexpr-init23.C: New test.

12 months agoRISC-V: Fix testcase missing arch attribute
Edwin Lu [Wed, 17 Jul 2024 00:43:45 +0000 (17:43 -0700)] 
RISC-V: Fix testcase missing arch attribute

The C + F extension implies the zcf extension on rv32. Add missing zcf
extension for the rv32 target.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/target-attr-16.c: Update expected assembly

Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
12 months agoMATCH: Simplify (a ? x : y) eq/ne (b ? x : y) [PR111150]
Eikansh Gupta [Wed, 22 May 2024 17:58:48 +0000 (23:28 +0530)] 
MATCH: Simplify (a ? x : y) eq/ne (b ? x : y) [PR111150]

This patch adds match pattern for `(a ? x : y) eq/ne (b ? x : y)`.
In forwprop1 pass, depending on the type of `a` and `b`, GCC produces
`vec_cond` or `cond_expr`. Based on the observation that `(x != y)` is
TRUE, the pattern can be optimized to produce `(a^b ? TRUE : FALSE)`.

The patch adds match pattern for a, b:
(a ? x : y) != (b ? x : y) --> (a^b) ? TRUE  : FALSE
(a ? x : y) == (b ? x : y) --> (a^b) ? FALSE : TRUE
(a ? x : y) != (b ? y : x) --> (a^b) ? FALSE : TRUE
(a ? x : y) == (b ? y : x) --> (a^b) ? TRUE  : FALSE
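
A small scalar sketch of these identities, assuming x != y as the observation
above requires.  The function names are hypothetical and this is plain C++, not
match.pd syntax; it only checks that each folded form agrees with the original
expression for every combination of a and b.  Note that the forms with the arms
swapped on the right-hand side have the opposite polarity:

```cpp
// Original expressions, written out for scalar bool conditions.
bool ne_same(bool a, bool b, int x, int y)    { return (a ? x : y) != (b ? x : y); }
bool eq_same(bool a, bool b, int x, int y)    { return (a ? x : y) == (b ? x : y); }
bool ne_swapped(bool a, bool b, int x, int y) { return (a ? x : y) != (b ? y : x); }
bool eq_swapped(bool a, bool b, int x, int y) { return (a ? x : y) == (b ? y : x); }

// Folded forms; these are only valid when x != y.
bool ne_same_folded(bool a, bool b)    { return a ^ b; }
bool eq_same_folded(bool a, bool b)    { return !(a ^ b); }
bool ne_swapped_folded(bool a, bool b) { return !(a ^ b); }
bool eq_swapped_folded(bool a, bool b) { return a ^ b; }

// Exhaustively compare original and folded forms for one (x, y) pair.
bool folds_agree(int x, int y) {
  for (int ai = 0; ai < 2; ++ai)
    for (int bi = 0; bi < 2; ++bi) {
      bool a = ai != 0, b = bi != 0;
      if (ne_same(a, b, x, y) != ne_same_folded(a, b)) return false;
      if (eq_same(a, b, x, y) != eq_same_folded(a, b)) return false;
      if (ne_swapped(a, b, x, y) != ne_swapped_folded(a, b)) return false;
      if (eq_swapped(a, b, x, y) != eq_swapped_folded(a, b)) return false;
    }
  return true;
}
```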

PR tree-optimization/111150

gcc/ChangeLog:

* match.pd (`(a ? x : y) eq/ne (b ? x : y)`): New pattern.
(`(a ? x : y) eq/ne (b ? y : x)`): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr111150.c: New test.
* gcc.dg/tree-ssa/pr111150-1.c: New test.
* g++.dg/tree-ssa/pr111150.C: New test.

Signed-off-by: Eikansh Gupta <quic_eikagupt@quicinc.com>
12 months agoAdd debug counter for ext_dce
Andrew Pinski [Tue, 16 Jul 2024 16:53:20 +0000 (09:53 -0700)] 
Add debug counter for ext_dce

Like r15-1610-gb6215065a5b143 (which adds one for late_combine),
adding one for ext_dce is useful to debug some issues with this pass.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* dbgcnt.def (ext_dce): New debug counter.
* ext-dce.cc (ext_dce_try_optimize_insn): Reject the insn
if the debug counter says so.
(ext_dce): Rename to ...
(ext_dce_execute): This.
(pass_ext_dce::execute): Update for the name of ext_dce.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
12 months agoalpha: Fix duplicate !tlsgd!62 assemble error [PR115526]
Uros Bizjak [Wed, 17 Jul 2024 16:11:26 +0000 (18:11 +0200)] 
alpha: Fix duplicate !tlsgd!62 assemble error [PR115526]

Add missing "cannot_copy" attribute to instructions that have to
stay in 1-1 correspondence with another insn.

PR target/115526

gcc/ChangeLog:

* config/alpha/alpha.md (movdi_er_high_g): Add cannot_copy attribute.
(movdi_er_tlsgd): Ditto.
(movdi_er_tlsldm): Ditto.
(call_value_osf_<tls>): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/alpha/pr115526.c: New test.

12 months agoRegenerate c.opt.urls
Mark Wielaard [Wed, 17 Jul 2024 15:58:14 +0000 (17:58 +0200)] 
Regenerate c.opt.urls

The addition of -Wunterminated-string-initialization should have
regenerated the c.opt.urls file.

Fixes: 44c9403ed183 ("c, objc: Add -Wunterminated-string-initialization")
gcc/c-family/ChangeLog:

* c.opt.urls: Regenerate.

12 months agoAVR: target/90616 - Improve adding constants that are 0 mod 256.
Georg-Johann Lay [Thu, 4 Jul 2024 10:08:34 +0000 (12:08 +0200)] 
AVR: target/90616 - Improve adding constants that are 0 mod 256.

This patch introduces a new insn that works as an insn combine
pattern for

   (plus:HI (zero_extend:HI (reg:QI))
            (const_0mod256_operand:HI))

which requires at most 2 instructions.  When the input register operand
is already in HImode, the addhi3 printer only adds the hi8 part when
it sees a SYMBOL_REF or CONST aligned to at least 256 bytes.
(The CONST_INT case was already handled).
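
The arithmetic behind this can be sketched in plain C++ (hypothetical helper
names, not AVR code): when the constant is 0 mod 256, the low byte of the sum
is just the 8-bit input and only the high byte comes from the constant.

```cpp
#include <cstdint>

// Reference: zero-extend an 8-bit value and add a 16-bit constant.
std::uint16_t add_full(std::uint8_t r, std::uint16_t k) {
  return static_cast<std::uint16_t>(r + k);
}

// Shortcut that assumes k % 256 == 0: the low byte is r unchanged,
// the high byte is the high byte of k, and no carry can occur.
std::uint16_t add_hi8_only(std::uint8_t r, std::uint16_t k) {
  std::uint8_t lo = r;
  std::uint8_t hi = static_cast<std::uint8_t>(k >> 8);
  return static_cast<std::uint16_t>((hi << 8) | lo);
}

// Check that both agree for every 8-bit input, given an aligned constant.
bool shortcut_matches(std::uint16_t k) {
  for (int r = 0; r < 256; ++r)
    if (add_full(static_cast<std::uint8_t>(r), k)
        != add_hi8_only(static_cast<std::uint8_t>(r), k))
      return false;
  return true;
}
```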

gcc/
PR target/90616
* config/avr/predicates.md (const_0mod256_operand): New predicate.
* config/avr/constraints.md (Cp8): New constraint.
* config/avr/avr.md (*aligned_add_symbol): New insn.
* config/avr/avr.cc (avr_out_plus_symbol) [HImode]:
When op2 is a multiple of 256, there is no need to add / subtract
the lo8 part.
(avr_rtx_costs_1) [PLUS && HImode]: Return expected costs for
new insn *aligned_add_symbol as it applies.

12 months agobitint: Use gsi_insert_on_edge rather than gsi_insert_on_edge_immediate [PR115887]
Jakub Jelinek [Wed, 17 Jul 2024 15:32:21 +0000 (17:32 +0200)] 
bitint: Use gsi_insert_on_edge rather than gsi_insert_on_edge_immediate [PR115887]

The following testcase ICEs on x86_64-linux, because we try to
gsi_insert_on_edge_immediate a statement on an edge which already has
statements queued with gsi_insert_on_edge, and the deferral has been
intentional so that we don't need to deal with cfg changes in between.

The following patch uses the delayed insertion as well.

2024-07-17  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/115887
* gimple-lower-bitint.cc (gimple_lower_bitint): Use gsi_insert_on_edge
instead of gsi_insert_on_edge_immediate and set edge_insertions to
true.

* gcc.dg/bitint-108.c: New test.

12 months agovarasm: Shorten assembly of strings with larger zero regions
Jakub Jelinek [Wed, 17 Jul 2024 15:30:24 +0000 (17:30 +0200)] 
varasm: Shorten assembly of strings with larger zero regions

When not using .base64 directive, we emit for long sequences of zeros
        .string "foobarbaz"
        .string ""
        .string ""
        .string ""
        .string ""
        .string ""
        .string ""
        .string ""
        .string ""
        .string ""
        .string ""
        .string ""
        .string ""
The following patch changes that to
        .string "foobarbaz"
        .zero   12
It keeps emitting .string "" if there is just one zero or two zeros where
the first one is preceded by non-zeros, so we can have
        .string "foobarbaz"
        .string ""
or
        .base64 "VG8gYmUgb3Igbm90IHRvIGJlLCB0aGF0IGlzIHRoZSBxdWVzdGlvbg=="
        .string ""
but not 2 .string "" in a row.
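
The rule can be modelled with a short sketch.  This is a simplified model, not
the code in default_elf_asm_output_ascii: emit_directives is a hypothetical
name, the input is assumed to contain only printable bytes and NULs, and the
.base64 path and string length limits are ignored.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified model: split a byte buffer into assembler directives.
// A .string covers its text plus one terminating NUL; any further run of
// NULs becomes a single .string "" (one NUL) or .zero N (two or more).
std::vector<std::string> emit_directives(const std::string &bytes) {
  std::vector<std::string> out;
  std::size_t i = 0;
  while (i < bytes.size()) {
    std::size_t j = i;
    if (bytes[i] != '\0') {
      while (j < bytes.size() && bytes[j] != '\0') ++j;
      if (j < bytes.size()) {
        out.push_back(".string \"" + bytes.substr(i, j - i) + "\"");
        i = j + 1;  // the terminating NUL is part of the .string
      } else {
        out.push_back(".ascii \"" + bytes.substr(i, j - i) + "\"");
        i = j;      // no NUL follows, so nothing more to consume
      }
    } else {
      while (j < bytes.size() && bytes[j] == '\0') ++j;
      std::size_t n = j - i;
      if (n == 1)
        out.push_back(".string \"\"");
      else
        out.push_back(".zero " + std::to_string(n));
      i = j;
    }
  }
  return out;
}
```

For "foobarbaz" followed by 13 NULs this yields .string "foobarbaz" and
.zero 12; with only 2 trailing NULs it yields .string "foobarbaz" and
.string "".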

On a testcase I have with around 310440 0-255 unsigned char character
constants mostly derived from cc1plus start but with too long sequences of
0s which broke transformation to STRING_CST adjusted to have at most 126
consecutive 0s, I see:
1504498 bytes long assembly without this patch on i686-linux (without
.base64 support in binutils)
1155071 bytes long assembly with this patch on i686-linux (without .base64
support in binutils)
431390 bytes long assembly without this patch on x86_64-linux (with
.base64 support in binutils)
427593 bytes long assembly with this patch on x86_64-linux (with .base64
support in binutils)
All 4 assemble to identical *.o file when using x86_64-linux .base64
supporting gas, and the former 2 when using older x86_64-linux gas assemble
to identical content as well.

2024-07-17  Jakub Jelinek  <jakub@redhat.com>

* varasm.cc (default_elf_asm_output_ascii): Use ASM_OUTPUT_SKIP instead
of 2 or more default_elf_asm_output_limited_string (f, "") calls and
adjust base64 heuristics correspondingly.

12 months agomiddle-end: fix 0 offset creation and folding [PR115936]
Tamar Christina [Wed, 17 Jul 2024 15:22:14 +0000 (16:22 +0100)] 
middle-end: fix 0 offset creation and folding [PR115936]

As shown in PR115936, SCEV and IVOPTS create an invalid IV when the IV is
a pointer type:

ivtmp.39_65 = ivtmp.39_59 + 0B;

where the IVs are DI mode and the offset is a pointer.
This comes from this weird candidate:

Candidate 8:
  Var befor: ivtmp.39_59
  Var after: ivtmp.39_65
  Incr POS: before exit test
  IV struct:
    Type:       sizetype
    Base:       0
    Step:       0B
    Biv:        N
    Overflowness wrto loop niter:       No-overflow

This IV was always created; it just ended up not being used.

This is created by SCEV.

simple_iv_with_niters in the case where no CHREC is found creates an IV with
base == ev, offset == 0;

However, in this case EV is a POINTER_PLUS_EXPR and so the type is a pointer,
which ends up creating an unusable expression.

gcc/ChangeLog:

PR tree-optimization/115936
* tree-scalar-evolution.cc (simple_iv_with_niters): Use sizetype for
pointers.

12 months agoc++: constrained partial spec type context [PR111890]
Patrick Palka [Wed, 17 Jul 2024 15:08:35 +0000 (11:08 -0400)] 
c++: constrained partial spec type context [PR111890]

maybe_new_partial_specialization wasn't propagating TYPE_CONTEXT when
creating a new class type corresponding to a constrained partial spec,
which do_friend relies on via template_class_depth to distinguish a
template friend from a non-template friend, and so in the below testcase
we were incorrectly instantiating the non-template operator+ as if it
were a template leading to an ICE.

PR c++/111890

gcc/cp/ChangeLog:

* pt.cc (maybe_new_partial_specialization): Propagate TYPE_CONTEXT
to the newly created partial specialization.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-partial-spec15.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
12 months agovect: Optimize order of lane-reducing operations in loop def-use cycles
Feng Xue [Wed, 29 May 2024 09:28:14 +0000 (17:28 +0800)] 
vect: Optimize order of lane-reducing operations in loop def-use cycles

When transforming multiple lane-reducing operations in a loop reduction chain,
originally, corresponding vectorized statements are generated into def-use
cycles starting from 0. The def-use cycle with a smaller index would contain
more statements, which means more instruction dependencies. For example:

   int sum = 1;
   for (i)
     {
       sum += d0[i] * d1[i];      // dot-prod <vector(16) char>
       sum += w[i];               // widen-sum <vector(16) char>
       sum += abs(s0[i] - s1[i]); // sad <vector(8) short>
       sum += n[i];               // normal <vector(4) int>
     }

Original transformation result:

   for (i / 16)
     {
       sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
       sum_v1 = sum_v1;  // copy
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy

       sum_v0 = WIDEN_SUM (w_v0[i: 0 ~ 15], sum_v0);
       sum_v1 = sum_v1;  // copy
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy

       sum_v0 = SAD (s0_v0[i: 0 ~ 7 ], s1_v0[i: 0 ~ 7 ], sum_v0);
       sum_v1 = SAD (s0_v1[i: 8 ~ 15], s1_v1[i: 8 ~ 15], sum_v1);
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy

       ...
     }

For higher instruction parallelism in the final vectorized loop, it is better
to distribute the effective vector lane-reducing ops evenly among all def-use
cycles. Transformed as below, DOT_PROD, WIDEN_SUM and the SADs are generated
into separate cycles, so the instruction dependencies among them are
eliminated.

   for (i / 16)
     {
       sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
       sum_v1 = sum_v1;  // copy
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy

       sum_v0 = sum_v0;  // copy
       sum_v1 = WIDEN_SUM (w_v1[i: 0 ~ 15], sum_v1);
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy

       sum_v0 = sum_v0;  // copy
       sum_v1 = sum_v1;  // copy
       sum_v2 = SAD (s0_v2[i: 0 ~ 7 ], s1_v2[i: 0 ~ 7 ], sum_v2);
       sum_v3 = SAD (s0_v3[i: 8 ~ 15], s1_v3[i: 8 ~ 15], sum_v3);

       ...
     }
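
The effect on dependency chains can be illustrated with a scalar analogy.
This is not vectorizer output: the Inputs struct and both function names are
hypothetical, and the four accumulators merely stand in for the def-use cycles
sum_v0 to sum_v3.  The single-accumulator loop serializes every update, while
the split version gives each operation kind its own chain and produces the
same total.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical input bundle mirroring the arrays in the example loop.
struct Inputs {
  std::vector<signed char> d0, d1, w;
  std::vector<short> s0, s1;
  std::vector<int> n;
};

// All updates go through one accumulator: one long dependency chain.
int reduce_single(const Inputs &in) {
  int sum = 1;
  for (std::size_t i = 0; i < in.n.size(); ++i) {
    sum += in.d0[i] * in.d1[i];
    sum += in.w[i];
    sum += std::abs(in.s0[i] - in.s1[i]);
    sum += in.n[i];
  }
  return sum;
}

// Each operation kind gets its own accumulator: four independent chains.
int reduce_split(const Inputs &in) {
  int acc[4] = {1, 0, 0, 0};
  for (std::size_t i = 0; i < in.n.size(); ++i) {
    acc[0] += in.d0[i] * in.d1[i];
    acc[1] += in.w[i];
    acc[2] += std::abs(in.s0[i] - in.s1[i]);
    acc[3] += in.n[i];
  }
  return acc[0] + acc[1] + acc[2] + acc[3];
}
```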

2024-03-22 Feng Xue <fxue@os.amperecomputing.com>

gcc/
PR tree-optimization/114440
* tree-vectorizer.h (struct _stmt_vec_info): Add a new field
reduc_result_pos.
* tree-vect-loop.cc (vect_transform_reduction): Generate lane-reducing
statements in an optimized order.

12 months agovect: Support multiple lane-reducing operations for loop reduction [PR114440]
Feng Xue [Wed, 29 May 2024 09:22:36 +0000 (17:22 +0800)] 
vect: Support multiple lane-reducing operations for loop reduction [PR114440]

For a lane-reducing operation (dot-prod/widen-sum/sad) in a loop reduction, the
current vectorizer could only handle the pattern if the reduction chain did not
contain any other operation, whether normal or lane-reducing.

This patch removes some constraints in reduction analysis to allow multiple
arbitrary lane-reducing operations with mixed input vectypes in a loop
reduction chain. For example:

   int sum = 1;
   for (i)
     {
       sum += d0[i] * d1[i];      // dot-prod <vector(16) char>
       sum += w[i];               // widen-sum <vector(16) char>
       sum += abs(s0[i] - s1[i]); // sad <vector(8) short>
     }

The vector size is 128-bit, vectorization factor is 16. Reduction statements
would be transformed as:

   vector<4> int sum_v0 = { 0, 0, 0, 1 };
   vector<4> int sum_v1 = { 0, 0, 0, 0 };
   vector<4> int sum_v2 = { 0, 0, 0, 0 };
   vector<4> int sum_v3 = { 0, 0, 0, 0 };

   for (i / 16)
     {
       sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
       sum_v1 = sum_v1;  // copy
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy

       sum_v0 = WIDEN_SUM (w_v0[i: 0 ~ 15], sum_v0);
       sum_v1 = sum_v1;  // copy
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy

       sum_v0 = SAD (s0_v0[i: 0 ~ 7 ], s1_v0[i: 0 ~ 7 ], sum_v0);
       sum_v1 = SAD (s0_v1[i: 8 ~ 15], s1_v1[i: 8 ~ 15], sum_v1);
       sum_v2 = sum_v2;  // copy
       sum_v3 = sum_v3;  // copy
     }

    sum_v = sum_v0 + sum_v1 + sum_v2 + sum_v3;   // = sum_v0 + sum_v1

2024-03-22 Feng Xue <fxue@os.amperecomputing.com>

gcc/
PR tree-optimization/114440
* tree-vectorizer.h (vectorizable_lane_reducing): New function
declaration.
* tree-vect-stmts.cc (vect_analyze_stmt): Call new function
vectorizable_lane_reducing to analyze lane-reducing operation.
* tree-vect-loop.cc (vect_model_reduction_cost): Remove cost computation
code related to emulated_mixed_dot_prod.
(vectorizable_lane_reducing): New function.
(vectorizable_reduction): Allow multiple lane-reducing operations in
loop reduction. Move some original lane-reducing related code to
vectorizable_lane_reducing.
(vect_transform_reduction): Adjust comments with updated example.

gcc/testsuite/
PR tree-optimization/114440
* gcc.dg/vect/vect-reduc-chain-1.c
* gcc.dg/vect/vect-reduc-chain-2.c
* gcc.dg/vect/vect-reduc-chain-3.c
* gcc.dg/vect/vect-reduc-chain-dot-slp-1.c
* gcc.dg/vect/vect-reduc-chain-dot-slp-2.c
* gcc.dg/vect/vect-reduc-chain-dot-slp-3.c
* gcc.dg/vect/vect-reduc-chain-dot-slp-4.c
* gcc.dg/vect/vect-reduc-dot-slp-1.c

12 months agovect: Refit lane-reducing to be normal operation
Feng Xue [Tue, 2 Jul 2024 09:12:00 +0000 (17:12 +0800)] 
vect: Refit lane-reducing to be normal operation

The number of vector stmts for an operation is calculated from the output
vectype.  This over-estimates the count for a lane-reducing operation, which
would cause a vector def/use mismatch when a loop reduction mixes
lane-reducing and normal operations.  One solution is to refit the
lane-reducing operation so that it behaves like a normal one, adding
pass-through copies to fill any def/use gap.  The resulting superfluous
statements can be optimized away after vectorization.  For example:

  int sum = 1;
  for (i)
    {
      sum += d0[i] * d1[i];      // dot-prod <vector(16) char>
    }

  The vector size is 128-bit, vectorization factor is 16.  Reduction
  statements would be transformed as:

  vector<4> int sum_v0 = { 0, 0, 0, 1 };
  vector<4> int sum_v1 = { 0, 0, 0, 0 };
  vector<4> int sum_v2 = { 0, 0, 0, 0 };
  vector<4> int sum_v3 = { 0, 0, 0, 0 };

  for (i / 16)
    {
      sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
      sum_v1 = sum_v1;  // copy
      sum_v2 = sum_v2;  // copy
      sum_v3 = sum_v3;  // copy
    }

  sum_v = sum_v0 + sum_v1 + sum_v2 + sum_v3;   // = sum_v0
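
A scalar model of the transformed loop, as a sketch only.  The lane mapping in
dot_prod (four adjacent products per int lane) is an assumption made for
illustration; real targets may distribute products differently.  V4, dot_prod
and reduce_dot are hypothetical names, and the input length is assumed to be a
multiple of 16.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using V4 = std::array<int, 4>;

// Assumed lane model: 16 char products are folded into 4 int lanes.
V4 dot_prod(const signed char *a, const signed char *b, V4 acc) {
  for (std::size_t i = 0; i < 16; ++i)
    acc[i / 4] += a[i] * b[i];
  return acc;
}

// Mirrors the example: only sum_v0 does work, sum_v1..sum_v3 are
// pass-through copies that stay zero and drop out of the final sum.
int reduce_dot(const std::vector<signed char> &d0,
               const std::vector<signed char> &d1) {
  V4 sum_v0 = {0, 0, 0, 1};  // initial value of 'sum' lives in one lane
  V4 sum_v1 = {0, 0, 0, 0};
  V4 sum_v2 = {0, 0, 0, 0};
  V4 sum_v3 = {0, 0, 0, 0};
  for (std::size_t i = 0; i + 16 <= d0.size(); i += 16)
    sum_v0 = dot_prod(&d0[i], &d1[i], sum_v0);
  int sum = 0;
  for (std::size_t l = 0; l < 4; ++l)
    sum += sum_v0[l] + sum_v1[l] + sum_v2[l] + sum_v3[l];
  return sum;
}
```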

2024-07-02 Feng Xue <fxue@os.amperecomputing.com>

gcc/
* tree-vect-loop.cc (vect_reduction_update_partial_vector_usage):
Calculate effective vector stmts number with generic
vect_get_num_copies.
(vect_transform_reduction): Insert copies for lane-reducing so as to
fix over-estimated vector stmts number.
(vect_transform_cycle_phi): Calculate vector PHI number only based on
output vectype.
* tree-vect-slp.cc (vect_slp_analyze_node_operations_1): Remove
adjustment on vector stmts number specific to slp reduction.

12 months agovect: Add a unified vect_get_num_copies for slp and non-slp
Feng Xue [Fri, 12 Jul 2024 08:38:28 +0000 (16:38 +0800)] 
vect: Add a unified vect_get_num_copies for slp and non-slp

Extend original vect_get_num_copies (pure loop-based) to calculate number of
vector stmts for slp node regarding a generic vect region.

2024-07-12 Feng Xue <fxue@os.amperecomputing.com>

gcc/
* tree-vectorizer.h (vect_get_num_copies): New overload function.
* tree-vect-slp.cc (vect_slp_analyze_node_operations_1): Calculate
number of vector stmts for slp node with vect_get_num_copies.
(vect_slp_analyze_node_operations): Calculate number of vector elements
for constant/external slp node with vect_get_num_copies.

12 months agotree-optimization/115959 - ICE with SLP condition reduction
Richard Biener [Wed, 17 Jul 2024 09:42:13 +0000 (11:42 +0200)] 
tree-optimization/115959 - ICE with SLP condition reduction

The following fixes how during reduction epilogue generation we
gather conditional compares for condition reductions, thereby
following the reduction chain via STMT_VINFO_REDUC_IDX.  The issue
is that SLP nodes for COND_EXPRs can have either three or four
children dependent on whether we have legacy GENERIC expressions
in the transitional pattern GIMPLE for the COND_EXPR condition.

PR tree-optimization/115959
* tree-vect-loop.cc (vect_create_epilog_for_reduction):
Get at the REDUC_IDX child in a safer way for COND_EXPR
nodes.

* gcc.dg/vect/pr115959.c: New testcase.

12 months agotestsuite: Add dg-do run to another test
Jakub Jelinek [Wed, 17 Jul 2024 09:40:58 +0000 (11:40 +0200)] 
testsuite: Add dg-do run to another test

This is another test which clearly has been written with the assumption that
it will be executed, but it isn't.
It works fine when it is executed on both x86_64-linux and i686-linux.

2024-07-17  Jakub Jelinek  <jakub@redhat.com>

* c-c++-common/torture/builtin-convertvector-1.c: Add dg-do run
directive.

12 months agovarasm: Fix bootstrap after the .base64 changes [PR115958]
Jakub Jelinek [Wed, 17 Jul 2024 09:40:03 +0000 (11:40 +0200)] 
varasm: Fix bootstrap after the .base64 changes [PR115958]

Apparently there is a -Wsign-compare warning if ptrdiff_t has precision of
int, then (t - s + 1 + 2) / 3 * 4 has int type while cnt unsigned int.
This doesn't warn if ptrdiff_t has larger precision, say on x86_64
it is 64-bit and so (t - s + 1 + 2) / 3 * 4 has long type and cnt unsigned
int.  And it doesn't warn when using older binutils (in my tests I've
used new binutils on x86_64 and old binutils on i686).
Anyway, earlier condition guarantees that t - s is at most 256-ish and
t >= s by construction, so we can just cast it to (unsigned) to avoid
the warning.
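
A minimal sketch of the cast, assuming t >= s and a small t - s as stated
above.  The helper names and the direction of the comparison against cnt are
hypothetical; only the (t - s + 1 + 2) / 3 * 4 expression and the cast come
from the description.

```cpp
// Hypothetical helper: with the cast, the whole expression has type
// unsigned regardless of the width of ptrdiff_t.
unsigned encoded_len(const char *s, const char *t) {
  return ((unsigned) (t - s) + 1 + 2) / 3 * 4;
}

// Both operands are unsigned int here, so -Wsign-compare has nothing
// to report.  The comparison itself is an assumed shape.
bool fits(const char *s, const char *t, unsigned cnt) {
  return encoded_len(s, t) <= cnt;
}
```

Without the cast, the left-hand side would be int where ptrdiff_t has the
precision of int and long where it is 64-bit, which is why the warning only
showed up on some hosts.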

2024-07-17  Jakub Jelinek  <jakub@redhat.com>

PR other/115958
* varasm.cc (default_elf_asm_output_ascii): Cast t - s to unsigned
to avoid -Wsign-compare warnings.

12 months agogimple-fold: Fix up __builtin_clear_padding lowering [PR115527]
Jakub Jelinek [Wed, 17 Jul 2024 09:38:33 +0000 (11:38 +0200)] 
gimple-fold: Fix up __builtin_clear_padding lowering [PR115527]

The builtin-clear-padding-6.c testcase fails as clear_padding_type
doesn't correctly recompute the buf->size and buf->off members after
expanding clearing of an array using a runtime loop.
buf->size should be in that case the offset after which it should continue
with next members or padding before them modulo UNITS_PER_WORD and
buf->off that offset minus buf->size.  That is what the code was doing,
but with off being the start of the loop cleared array, not its end.
So, the last hunk in gimple-fold.cc fixes that.
When adding the testcase, I've noticed that the
c-c++-common/torture/builtin-clear-padding-* tests, although clearly
written as runtime tests to test the builtins at runtime, didn't have
{ dg-do run } directive and were just compile tests because of that.
When adding that to the tests, builtin-clear-padding-1.c was already
failing without that clear_padding_type hunk too, but
builtin-clear-padding-5.c was still failing even after the change.
That is due to a bug in clear_padding_flush which the patch fixes as
well - when clear_padding_flush is called with full=true (that happens
at the end of the whole __builtin_clear_padding or on those array
padding clears done by a runtime loop), it wants to flush all the pending
padding clearings rather than just some.  If it is at the end of the whole
object, it decreases wordsize when needed to make sure the code never writes
including RMW cycles to something outside of the object:
      if ((unsigned HOST_WIDE_INT) (buf->off + i + wordsize)
          > (unsigned HOST_WIDE_INT) buf->sz)
        {
          gcc_assert (wordsize > 1);
          wordsize /= 2;
          i -= wordsize;
          continue;
        }
but if it is full==true flush in the middle, this doesn't happen, but we
still process just the buffer bytes before the current end.  If that end
is not on a wordsize boundary, e.g. on the builtin-clear-padding-5.c test
the last chunk is 2 bytes, '\0', '\xff', i is 16 and end is 18,
nonzero_last might be equal to the end - i, i.e. 2 here, but still all_ones
might be true, so in some spots we just didn't emit any clearing in that
last chunk.

2024-07-17  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/115527
* gimple-fold.cc (clear_padding_flush): Introduce endsize
variable and use it instead of wordsize when comparing it against
nonzero_last.
(clear_padding_type): Increment off by sz.

* c-c++-common/torture/builtin-clear-padding-1.c: Add dg-do run
directive.
* c-c++-common/torture/builtin-clear-padding-2.c: Likewise.
* c-c++-common/torture/builtin-clear-padding-3.c: Likewise.
* c-c++-common/torture/builtin-clear-padding-4.c: Likewise.
* c-c++-common/torture/builtin-clear-padding-5.c: Likewise.
* c-c++-common/torture/builtin-clear-padding-6.c: New test.

12 months agors6000: Remove redundant guard for float128 mode pattern
Haochen Gui [Wed, 17 Jul 2024 06:47:36 +0000 (14:47 +0800)] 
rs6000: Remove redundant guard for float128 mode pattern

gcc/
* config/rs6000/rs6000.md (mov<mode>cc, *mov<mode>cc_p10,
*mov<mode>cc_invert_p10, *fpmask<mode>, *xxsel<mode>,
@ieee_128bit_vsx_abs<mode>2, *ieee_128bit_vsx_nabs<mode>2,
add<mode>3, sub<mode>3, mul<mode>3, div<mode>3, sqrt<mode>2,
copysign<mode>3, copysign<mode>3_hard, copysign<mode>3_soft,
@neg<mode>2_hw, @abs<mode>2_hw, *nabs<mode>2_hw, fma<mode>4_hw,
*fms<mode>4_hw, *nfma<mode>4_hw, *nfms<mode>4_hw,
extend<SFDF:mode><IEEE128:mode>2_hw, trunc<mode>df2_hw,
trunc<mode>sf2_hw, fix<uns>_<IEEE128:mode><SDI:mode>2_hw,
fix<uns>_trunc<IEEE128:mode><QHI:mode>2,
*fix<uns>_trunc<IEEE128:mode><QHSI:mode>2_mem,
float_<mode>di2_hw, float_<mode>si2_hw,
float<QHI:mode><IEEE128:mode>2, floatuns_<mode>di2_hw,
floatuns_<mode>si2_hw, floatuns<QHI:mode><IEEE128:mode>2,
floor<mode>2, ceil<mode>2, btrunc<mode>2, round<mode>2,
add<mode>3_odd, sub<mode>3_odd, mul<mode>3_odd, div<mode>3_odd,
sqrt<mode>2_odd, fma<mode>4_odd, *fms<mode>4_odd, *nfma<mode>4_odd,
*nfms<mode>4_odd, trunc<mode>df2_odd, *cmp<mode>_hw for IEEE128):
Remove guard FLOAT128_IEEE_P.
(@extenddf<mode>2_fprs, @extenddf<mode>2_vsx,
trunc<mode>df2_internal1, trunc<mode>df2_internal2,
fix_trunc_helper<mode>, neg<mode>2, *cmp<mode>_internal1,
*cmp<IBM128:mode>_internal2 for IBM128): Remove guard FLOAT128_IBM_P.

12 months agors6000: Change optab for ibm128 and ieee128 conversion
Kewen Lin [Wed, 17 Jul 2024 05:19:30 +0000 (00:19 -0500)] 
rs6000: Change optab for ibm128 and ieee128 conversion

Currently for 128 bit floating-point ibm128 and ieee128
formats conversion, the corresponding libcalls are:
  ibm128 -> ieee128 "__trunctfkf2"
  ieee128 -> ibm128 "__extendkftf2"
Generic code handling (like convert_mode_scalar) also
adopts sext_optab for ieee128 -> ibm128 while trunc_optab
for ibm128 -> ieee128.  But in rs6000 port as function
rs6000_expand_float128_convert and init_float128_ieee show,
we adopt sext_optab for ibm128 -> ieee128 with "__trunctfkf2"
while trunc_optab for ieee128 -> ibm128 with "__extendkftf2".

To make them consistent and avoid some surprises, this patch
is to adjust rs6000 internal handlings by adopting trunc_optab
for ibm128 -> ieee128 with "__trunctfkf2" while sext_optab for
ieee128 -> ibm128 with "__extendkftf2".

gcc/ChangeLog:

* config/rs6000/rs6000.cc (init_float128_ieee): Use trunc_optab rather
than sext_optab for converting FLOAT128_IBM_P mode to FLOAT128_IEEE_P
mode, and use sext_optab rather than trunc_optab for converting
FLOAT128_IEEE_P mode to FLOAT128_IBM_P mode.
(rs6000_expand_float128_convert): Likewise.

12 months agotree: Remove KFmode workaround [PR112993]
Kewen Lin [Wed, 17 Jul 2024 05:19:00 +0000 (00:19 -0500)] 
tree: Remove KFmode workaround [PR112993]

The fix for PR112993 makes KFmode have 128 bit mode precision,
we don't need this workaround to fix up the type precision any
more, and just go with mode precision.  So this patch is to
remove KFmode workaround.

PR target/112993

gcc/ChangeLog:

* tree.cc (build_common_tree_nodes): Drop the workaround for rs6000
KFmode precision adjustment.