Andrew Pinski [Mon, 11 Aug 2025 20:47:30 +0000 (20:47 +0000)]
forwprop: Fix non-call exceptions some more with copy prop for aggregates [PR121494]
Note this conflicts with my not yet approved patch for copy prop for aggregates into
function arguments (I will get back to that soon).
So the problem here is that I assumed if:
*a = decl1;
would not cause an exception that:
decl2 = *a;
would cause not cause one too.
I was wrong, in some cases where the Ada front-end marks `*a` in the store
as TREE_THIS_NOTRAP (due to knowing never be null or some other cases).
So that means when we prop decl1 into the statement storing decl2, we need to
mark that statement as possible to cleanup for eh.
Bootstraped and tested on x86_64-linux-gnu.
Also tested on x86_64-linux-gnu with a hack to force generate LC constant decls in the gimplifier.
PR tree-optimization/121494
gcc/ChangeLog:
* tree-ssa-forwprop.cc (optimize_agr_copyprop): Mark the bb of the use
stmt if needed for eh cleanup.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Richard Biener [Mon, 11 Aug 2025 09:22:47 +0000 (11:22 +0200)]
Do not set STMT_VINFO_VECTYPE for non-dataref stmts
Now that all STMT_VINFO_VECTYPE uses from vectorizable_* have been
pruged there's no longer a need to have STMT_VINFO_VECTYPE set.
We still rely on it being present on data-ref stmts and there it
can differ between different SLP instances when doing BB vectorization.
The following removes the setting from vect_analyze_stmt and
vect_transform_stmt.
Note the following clears STMT_VINFO_VECTYPE from pattern stmts (the
vector type should have moved to the SLP tree by this time).
* tree-vect-stmts.cc (vect_analyze_stmt): Only set
STMT_VINFO_VECTYPE for dataref SLP representatives.
Clear it for others and do not restore the original value.
(vect_transform_stmt): Likewise.
Richard Biener [Mon, 11 Aug 2025 13:38:41 +0000 (15:38 +0200)]
Pass down vector type to avoid STMT_VINFO_VECTYPE on reduc-info
The following passes down the vector type to functions instead of
querying it from the reduc-info stmt-info.
* tree-vect-loop.cc (get_initial_defs_for_reduction):
Get vector type as argument.
(vect_find_reusable_accumulator): Likewise.
(vect_transform_cycle_phi): Adjust.
Richard Biener [Mon, 11 Aug 2025 09:20:41 +0000 (11:20 +0200)]
Do not use STMT_VINFO_VECTYPE in vectorizable_reduction
There's one use of STMT_VINFO_VECTYPE in vectorizable_reduction
where I'm only 99% sure which SLP_TREE_VECTYPE to replace it with
(vectorizable_reduction needs a lot of post-only-SLP TLC). The
following replaces it with the hopefully appropriate one.
* tree-vect-loop.cc (vectorizable_reduction): Replace
STMT_VINFO_VECTYPE use with SLP_TREE_VECTYPE.
Richard Biener [Mon, 11 Aug 2025 08:42:47 +0000 (10:42 +0200)]
tree-optimization/121493 - another missed VN with aggregate copy
This is another case where opportunistically handling a first
aggregate copy where we failed to match up the refs exactly
(as we don't insert missing handling components) yields to a
failure in the second aggregate copy that we visit. Add another
fixup to deal with such situations, in-line with that present
opportunistic handling.
PR tree-optimization/121493
* tree-ssa-sccvn.cc (vn_reference_lookup_3): Opportunistically
strip components with known offset.
Yuao Ma [Thu, 7 Aug 2025 14:35:17 +0000 (22:35 +0800)]
fortran: add optional lower arg to c_f_pointer
This patch adds support for the optional lower argument in intrinsic
c_f_pointer specified in Fortran 2023. Test cases and documentation have also
been updated.
* gfortran.dg/c_f_pointer_shape_tests_7.f90: New test.
* gfortran.dg/c_f_pointer_shape_tests_8.f90: New test.
* gfortran.dg/c_f_pointer_shape_tests_9.f90: New test.
Shreya Munnangi [Tue, 12 Aug 2025 03:42:50 +0000 (21:42 -0600)]
Improve initial code generation for addsi/adddi
This is a patch primarily from Shreya, though I think she cribbed some
code from Philipp that we had internally within Ventana and I made some
minor adjustments as well.
So the basic idea here is similar to her work on logical ops --
specifically when we can generate more efficient code at expansion time,
then do so. In some cases the net is better code; in other cases we
lessen reliance on mvconst_internal and finally it provides
infrastructure that I think will help address an issue Paul Antoine
reported a little while back.
The most obvious case is using paired addis from initial code generation
for some constants. It will also use a shNadd insn when the cost to
synthesize the original value is higher than the right-shifted value.
Finally it will negate the constant and use "sub" if the negated
constant is cheaper than the original constant.
There's more work to do in here, particularly WRT 32 bit objects for
rv64. Shreya is looking at that right now. There may also be cases
where another shNadd or addi would be profitable. We haven't really
explored those cases in any detail, while there may be cases to handle,
it's unclear how often they occur in practice.
I don't want to remove the define_insn_and_split for the paired addi
cases yet. I think that likely happens as a side effect of fixing Paul
Antoine's issue.
Bootstrapped and regression tested on a BPI & Pioneer box. Will
obviously wait for the pre-commit tester before moving forward.
Jeff
PR target/120603
gcc/
* config/riscv/riscv-protos.h (synthesize_add): Add prototype.
* config/riscv/riscv.cc (synthesize_add): New function.
* config/riscv/riscv.md (addsi3): Allow any constant as operands[2]
in the expander. Force the constant into a register as needed for
TARGET_64BIT. Use synthesize_add for !TARGET_64BIT.
(*adddi3): Renamed from adddi3.
(adddi3): New expander. Use synthesize_add.
gcc/testsuite
* gcc.target/riscv/add-synthesis-1.c: New test.
Co-authored-by: Jeff Law <jlaw@ventanamicro.com> Co-authored-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Robert Dubner [Tue, 12 Aug 2025 00:56:38 +0000 (20:56 -0400)]
cobol: Bring EBCDIC NumericDisplay variables into IBM compliance.
The internal representation of Numeric Display (ND) zoned decimal variables
when operating in EBCDIC mode has been brought into compliance with IBM
conventions. This requires changes to data input, data output, internal
conversion of zoned decimal to binary, and variable assignment.
gcc/testsuite:
* gcc.target/aarch64/cmpbr-3.c: New.
* gcc.target/aarch64/ifcvt_multiple_sets_rewire.c: Simplify
test for csel by ignoring the actual registers used.
Restrict the immediate range to the intersection of LT/GE and GT/LE
so that cfglayout can invert the condition to redirect any branch.
gcc:
PR target/121388
* config/aarch64/aarch64.cc (aarch64_cb_rhs): Restrict the
range of LT/GE and GT/LE to their intersections.
* config/aarch64/aarch64.md (*aarch64_cb<INT_CMP><GPI>): Unexport.
Use cmpbr_imm_predicate instead of aarch64_cb_rhs.
* config/aarch64/constraints.md (Uc1): Accept 0..62.
(Uc2): Remove.
* config/aarch64/iterators.md (cmpbr_imm_predicate): New.
(cmpbr_imm_constraint): Update to match aarch64_cb_rhs.
* config/aarch64/predicates.md (aarch64_cb_reg_i63_operand): New.
(aarch64_cb_reg_i62_operand): New.
The simplest solution is to remove the clobber from aarch64_tbz.
This removes the possibility of expansion via TST+B.cond, which
will merely fall back to TBNZ+B on shorter branches.
gcc:
PR target/121385
* config/aarch64/aarch64.md (*aarch64_tbz<LTGE><ALLI>1): Remove
cc clobber and expansion via TST+Bcond.
Both patterns used !reload_completed as a condition, which is
questionable at best. The branch pattern failed to include a
clobber of CC_REGNUM. Both problems were unlikely to trigger
in practice, due to how the optimization pipeline is organized,
but let's fix them anyway.
gcc:
* config/aarch64/aarch64.cc (aarch64_gen_compare_split_imm24): New.
* config/aarch64/aarch64-protos.h: Update.
* config/aarch64/aarch64.md (*aarch64_bcond_wide_imm<GPI>): Use it.
Add match_scratch and cc clobbers. Use match_operator instead of
iterator expansion.
(*compare_cstore<GPI>_insn): Likewise.
Two of the three uses of aarch64_imm24 included the important follow-up
tests vs aarch64_move_imm and aarch64_plus_operand. Lack of the exclusion
within aarch64_if_then_else_costs produced incorrect costing.
Since aarch64_split_imm24 has already matched a non-negative CONST_INT,
drill down from aarch64_plus_operand to aarch64_uimm12_shift.
gcc:
* config/aarch64/predicates.md (aarch64_split_imm24): Rename from
aarch64_imm24; exclude aarch64_move_imm and aarch64_uimm12_shift.
* config/aarch64/aarch64.md (*aarch64_bcond_wide_imm<GPI>):
Update for aarch64_split_imm24.
(*compare_cstore<GPI>_insn): Likewise.
* config/aarch64/aarch64.cc (aarch64_if_then_else_costs): Likewise.
The save/restore_stack_nonlocal patterns passed a DImode rtx to
gen_tbranch_neqi3 for a QImode compare. But since we're seeding
r16 with 1, GCSEnabled will clear the only set bit in r16, so we
can use CBNZ instead of TBNZ.
gcc:
* config/aarch64/aarch64.md (tbranch_<EQL><SHORT>3): Remove.
(save_stack_nonlocal): Use aarch64_gen_compare_zero_and_branch.
(restore_stack_nonlocal): Likewise.
gcc/testsuite:
* gcc.target/aarch64/gcs-nonlocal-3.c: Match cbnz.
gcc:
* config/aarch64/aarch64.cc (aarch64_if_the_else_costs): Reorg to
include the cost of inner within TBZ sign-bit test, only match
CBZ/CBNZ with valid modes, and both for the aarch64_imm24 test.
Nicolas Werner [Sun, 3 Aug 2025 14:38:08 +0000 (16:38 +0200)]
c++: Quoting in -fmodules-mapper
Users might be using a space in their build directory path. To allow
specifying such a root for the module mapper started by GCC, we need the
command to allow quotes. Previously quoting a path passed to the module
mapper was not possible, so replace the custom argv parsing with
the argv parsing logic from libiberty, that supports fairly standard
shell quoting using single and double quotes.
The primary purpose of this patch is to allow passing paths with spaces
to the --root parameter of the module mapper.
No test is included as spaces in build directories are tricky cross
platform. The patch was tested manually on my system.
Paul Thomas [Mon, 11 Aug 2025 20:34:07 +0000 (21:34 +0100)]
Fortran: gfortran rejects procedure binding on PDT [PR121398]
2025-08-11 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/121398
* resolve.cc (check_pdt_args): New function.
(check_generic_tbp_ambiguity): Use it to ensure that args to
typebound procedures that do not have the same declared type as
the containing derived type have 'pass1/2' set to null. This
avoids false ambiguity errors.
(resolve_typebound_procedure): Do not generate a wrong type
error for typebound procedures marked as pass if they are of a
different declared type to the containing pdt_type.
gcc/testsuite/
PR fortran/121398
* gfortran.dg/pdt_generic_1.f90: New test.
Iain Sandoe [Mon, 11 Aug 2025 19:41:15 +0000 (20:41 +0100)]
D: Adjust the code-gen for a string constant.
In this function, we are generating a string constant but do so with
a mismatch between the actual string length and the length specified
in the type. This causes Darwin, at least, to place the string in an
unexpected section (since the parameters do not match, it is rejected
as a cstring).
Use build_string_literal() to construct a consistent null-terminated
string.
gcc/d/ChangeLog:
* d-codegen.cc (build_filename_from_loc): Use
build_string_literal() to build a null-terminated string for
the filename.
Andrew Pinski [Wed, 30 Jul 2025 23:44:56 +0000 (16:44 -0700)]
forwprop: Recongize a store of integral zero for optimize_aggr_zeroprop.
While looking into the gimple level after optimization of the highway code
from google, I noticed in .optimized we still have:
```
MEM <vector(8) short int> [(short int *)&a] = { 0, 0, 0, 0, 0, 0, 0, 0 };
D.4398 = a;
a ={v} {CLOBBER(eos)};
D.4389 = D.4398;
D.4390 = D.4389;
D.4361 = D.4390;
D.4195 = D.4361;
return D.4195;
```
Note this is with SRA disabled since I noticed there is better code generation with
SRA disabled but that is a different story and I will get to that later on.
Which could be just optimized to a single store of `{}` .
The reason why the optimize_agr_copyprop does not handle the above is there was clobbers
inbetween the store in the last forwprop pass and currently don't copy after the first use.
While optimize_aggr_zeroprop does handle copying over clobbers just fine.
So this allows the recognization of the store to a to be like a memset to optimize_aggr_zeroprop
and then the result just falls through.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-forwprop.cc (optimize_aggr_zeroprop): Recognize stores
of integer_zerop as memset of 0.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/copy-prop-aggr-zero-1.c: New test.
* gcc.dg/torture/copy-prop-aggr-zero-2.c: New test.
* gcc.dg/tree-ssa/copy-prop-aggregate-zero-1.c: New test.
* gcc.dg/tree-ssa/copy-prop-aggregate-zero-2.c: New test.
* gcc.dg/tree-ssa/copy-prop-aggregate-zero-3.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Jeff Law [Mon, 11 Aug 2025 14:13:51 +0000 (08:13 -0600)]
Don't run tests requiring "B" on designs without "B"
So I resurrected our milkv pioneer over the weekend. While we had the
tell-tale signs of PCIE switch issues, it actually appears that the NMVE drive
was failing. I had an NVME that was going to be installed in a different
system, so I threw it into the Pioneer as a last ditch effort to get it
functional again. Voila! It's a happy camper (so far).
Naturally I don't like manual testing, so cobbled together a new target for my
tester. I forgot to update one field when doing that and as a result it picked
up testsuite prior test results from the job that runs on the BPI.
So comparing test results from a BPI to the Pioneer wouldn't normally be
interesting. We'd expect to see a whole bunch of tests disappear as the
Pioneer doesn't have all kinds of extensions that the BPI does (and that does
indeed happen).
As it turns out we have a handful of tests which need bitmanip to run, but
which don't restrict themselves to only run on appropriate hardware. So we
might as well fix that.
Given the Pioneer/BPI take 6/24 hours to cycle through respectively I just spot
checked the testsuite changes. Pushing to the trunk.
gcc/
* doc/sourcebuild.texi: Add riscv_b_ok and riscv_v_ok target selectors.
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_riscv_b_ok): New.
* gcc.target/riscv/pr116085.c: Use new target selector.
* gcc.target/riscv/pr117690.c: Use new target selector.
* gcc.target/riscv/pr120333.c: Use new target selector.
* gcc.target/riscv/zba-shNadd-10.c: Use new target selector.
Richard Biener [Mon, 11 Aug 2025 07:42:09 +0000 (09:42 +0200)]
tree-optimization/121488 - improve BIT_FIELD_REF lookup in VN
When a BIT_FIELD_REF lookup combined with a defining load RHS results
in a wrongly typed result, try looking up or inserting a VIEW_CONVERT_EXPR
to the desired type.
PR tree-optimization/121488
* tree-ssa-sccvn.cc (visit_nary_op): If the BIT_FIELD_REF
result is of wrong type, try a VIEW_CONVERT_EXPR around it.
Pan Li [Fri, 1 Aug 2025 02:46:58 +0000 (10:46 +0800)]
RISC-V: Add testcase for scalar unsigned SAT_MUL form 2
Add run and asm check test cases for scalar unsigned
SAT_MUL form 2.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat/sat_u_mul-3-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-3-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-3-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-3-u8.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-3-u16.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-3-u32.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-3-u64.c: New test.
* gcc.target/riscv/sat/sat_u_mul-run-3-u8.c: New test.
Based on the help of match.pd pattern match, the widening-mul will
try to convert it to below:
_3 = .SAT_MUL (a_4(D), b_5(D));
gcc/ChangeLog:
* tree-ssa-math-opts.cc (match_saturation_mul): Add new func
to emit IFN_SAT_MUL if matched.
(math_opts_dom_walker::after_dom_children): Try to match
the phi node for SAT_MUL.
Pan Li [Wed, 6 Aug 2025 14:13:26 +0000 (22:13 +0800)]
RISC-V: Refactor the vec_duplicate cost on gpr/fpr2vr-cost param
The previous cost value for vec_duplicate almost bases on the operators
like add/minus. The rtx_cost function try to match them case by case
and find if it has vec_duplicate, then update the cost values.
It is Ok when we initially add it but looks confused/redundant as more
and more operators are involved. As Robin's suggestion, we only care
about the sub-rtx has vec_duplicate or not, instead of take care of
it by operators.
Thus, this PR would like to refactor that and get rid of the operators
when compute the vec_duplicate cost.
The below test suites are passed for this patch series.
* The rv64gcv fully regression test.
gcc/ChangeLog:
* config/riscv/riscv.cc (get_vector_binary_rtx_cost): Remove.
(riscv_rtx_costs): Refactor to serach vec_duplicate on the
sub rtx.
simplify-rtx: Distribute some non-narrowing subregs [PR121306]
In g:965564eafb721f8000013a3112f1bba8d8fae32b I'd added code
to try distributing non-widening subregs through logic ops,
in cases where that would eliminate a term of the logic op.
For "reasons", this indirectly caused combine to generate:
for some tests that were intended to match x86's *one_cmplqi_ext<mode>_1
(see g:a58d770fa1d17ead3c38417b299cce3f19f392db). However, other more
direct ways of generating the pattern continued to have the unsimplified
(subreg:SI (not:QI (subreg:QI (...:SI ...)))) structure, since that
structure wasn't the focus of the original patch.
This patch tries to tackle that simplification head-on. It's another
case of distributing subregs, but this time for non-narrowing rather
than non-widening subregs. We already do the same distribution for
word_mode:
which g:0340177d54d08b6375391ba164a878e6a596275e extended to NOT.
For word_mode, there are (reasonably) no restrictions on the inner
mode other than that it is an integer. Doing word_mode logic ops
should be at least as efficient as subword logic ops (if the target
provides subword ops at all). And word_mode logic ops should be
cheaper than multi-word logic ops.
But here we need the distribution for SImode rather than word_mode
(DImode). The patch therefore extends the word_mode distributions
to non-narrowing subregs in which the two modes occupy the same
number of words. This should hopefully be relatively conservative.
It prevents the new rule from going away from word_mode, and attempting
to convert (say) a QImode subreg of a word_mode AND into a QImode AND.
It should be suitable for both CISCy and RISCy targets, including
those that define WORD_REGISTER_OPERATIONS.
The patch also fixes some overlong lines in related code.
gcc/
PR rtl-optimization/121306
* simplify-rtx.cc (simplify_context::simplify_subreg): Distribute
non-narrowing integer-to-integer subregs through logic ops,
in a similar way to the existing word_mode handling.
Jakub Jelinek [Mon, 11 Aug 2025 07:02:38 +0000 (09:02 +0200)]
c++: Fix up handling of name independent structured binding packs [PR117783]
I've realized I haven't added testsuite coverage for name independent
structured binding packs. And the
auto [i, ..._, j] = T {};
auto [k, ..._, l] = T {};
case shows a problem with that. The elements of the structured binding
pack have #i appended to their names, so for the _ case e.g. _#0, _#1
etc. (to print something useful in diagnostics, perhaps debug info later
on). The above is valid though as long as one doesn't use _ (which is
ambiguous), but we were emitting errors on redeclaration of _#0, _#1
etc.
The following patch uses DECL_NAME (decl) = NULL_TREE; for the
name independent decl case so that the false positive redeclaration
errors aren't emitted.
2025-08-11 Jakub Jelinek <jakub@redhat.com>
PR c++/117783
* decl.cc (set_sb_pack_name): For name independent decls
just clear DECL_NAME instead of appending #i to it.
* g++.dg/cpp26/name-independent-decl11.C: New test.
Jakub Jelinek [Mon, 11 Aug 2025 06:54:57 +0000 (08:54 +0200)]
c++: Implement mangling for structured binding packs [PR117783]
On Wed, Aug 06, 2025 at 11:53:55AM -0700, Jason Merrill wrote:
> The Clang mangling of the underlying variable seems fine, just mentioning
> the bound names; we can't get mangling collisions between pack and non-pack
> versions of the same name.
>
> But It looks like they use .N discriminators for the individual elements,
> which is wrong because . is reserved for implementation details. But I'd
> think it should be fine to use [<discriminator>] instead.
If you want the whole structured bindings to be mangled normally as if the
pack isn't a pack and the individual vars of the structured binding pack
mangled as multiple occurrences of the named entities, the following
patch does that.
2025-08-11 Jakub Jelinek <jakub@redhat.com>
PR c++/117783
* decl.cc (cp_finish_decomp): Don't sorry on tuple static
structured bindings with a pack, instead temporarily reset
DECL_NAME of the individual vars in the pack to the name
of the pack for cp_finish_decl time and force mangling.
* g++.dg/cpp26/decomp19.C: Don't expect sorry on tuple static
structured bindings with a pack.
* g++.dg/cpp26/decomp26.C: New test.
My C++26 P2686R4 PR117784 caused ICE on the following testcase.
While the earlier conditions guarantee decl2 is not error_mark_node,
decl can be (that is used when something erroneous has been seen earlier
and the whole structured bindings will be ignored after parsing).
So, the following patch avoids the copying of constexpr/constinit flags
if decl is error_mark_node.
2025-08-11 Jakub Jelinek <jakub@redhat.com>
PR c++/121442
* parser.cc (cp_parser_decomposition_declaration): Don't copy
DECL_DECLARED_CONST{EXPR,INIT}_P bits from decl to decl2 if
decl is error_mark_node.
Matthew Fortune [Sun, 10 Aug 2025 17:20:30 +0000 (11:20 -0600)]
Add -mgrow-frame-downwards
Grow the local frame down instead of up for mips16 code size.
By growing the frame downwards we get spill slots created at the lowest
address rather than highest address in a local frame. The benefit being
that when the frame is large the spill slots can still be accessed using
a 16bit instruction whereas it is less important for large local
variables to be accessed using short instructions as they are (probably)
accessed less frequently.
This is default on for MIPS16.
gcc/
* config/mips/mips.h (FRAME_GROWS_DOWNWARD) Allow the frame to
grow downwards for mips16 when -mgrow-frame-downwards is set.
* config/mips/mips.opt: Add -mgrow-frame-downwards option.
Andrew Pinski [Thu, 7 Aug 2025 19:18:38 +0000 (12:18 -0700)]
varasm: Redo mergeable section support [PR121438]
We increased the switch conversion array decl alignment
for better mergeability but it turns out that we increase
the alignment on targets which don't support mergeable sections
(e.g. NVPTX). Also after the fix for PR 121394, it becomes
obvious that we can place any sized into the mergeable section
instead of increasing the alignment.
This implements that and now also fixes PR 121438 as we don't
need to increase the alignment for the mergeable decls that
were being created by the C++ front-end.
* output.h (MAX_ALIGN_MERGABLE): Rename to ...
(MAX_MERGEABLE_BITSIZE): This.
* tree-switch-conversion.cc (switch_conversion::build_one_array): Don't
increase the alignment.
* varasm.cc (mergeable_string_section): Use MAX_MERGEABLE_BITSIZE
instead of MAX_ALIGN_MERGABLE. Also replace `/ 8` with `/ BITS_PER_UNIT`.
(mergeable_constant_section): Select the mergeable section based on
the bitsize rather than the alignment. Make sure the align is less
than the entity size.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Thu, 7 Aug 2025 18:44:21 +0000 (11:44 -0700)]
varasm: Ensure each variable in mergeable section is the entity size [PR121394]
Now there are mergeable sections which have an entity size, we can place
decls (constants) that are smaller in size in these sections.
An example is an `long double` which has a size of 12 bytes on i686 and is
placed in the 16 bytes shareable section.
For an example with the following C++ code:
```
std::initializer_list<long double> a = {0.3l};
```
We place the constant array in the .rodata.cst16 section but we don't add a padding to 16 bytes.
```
.section .rodata.cst16,"aM",@progbits,16
.align 16
.type _ZGR1a_, @object
.size _ZGR1a_, 12
_ZGR1a_:
.long -1717986918
.long -1717986919
.long 16381
.text
```
GAS has a workaround added to do the padding but other assemblers don't.
The gas workaround was added with https://sourceware.org/legacy-ml/binutils/2002-11/msg00615.html .
Now for the constant pool, GCC does emit a `.align` to padd out the size correctly.
This was done in r0-46282-gf41115930523b3.
The same padding should be done when emitting a variable contents when in a mergeable section.
This patch implements the padding and we now get an addition `.zero 4` which pads out the section.
Bootstrapped and tested on x86_64-linux-gnu.
PR middle-end/121394
gcc/ChangeLog:
* varasm.cc (assemble_variable_contents): Pad out
mergeable sections if needed.
(output_constant_pool_1): Change the padding to be explicit
zeroing for mergeable sections.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Ijaz, Abdul B [Sun, 10 Aug 2025 14:33:30 +0000 (08:33 -0600)]
config: Handle dash in library name for AC_LIB_LINKAGEFLAGS_BODY
For a library with dash in the name like yaml-cpp the AC_LIB_LINKAGEFLAGS_BODY
function generates a with_libname_type argument variable name with a dash but
this results in configure error. Since dashes are not allowed in the variable
name.
This change handles such cases and in case input library for the
AC_LIB_HAVE_LINKFLAGS has dash then it replaces it with the underscore "_".
Example of an error for yaml-cpp library before the change using gcc config
scripts in gdb:
gdb/gdb/configure: line 22868: with_libyaml-cpp_type=auto: command not found
After having underscore for this variable name:
checking whether to use yaml-cpp... yes
checking for libyaml-cpp... yes
checking how to link with libyaml-cpp... -lyaml-cpp
config/ChangeLog:
* lib-link.m4: Handle dash in the library name for
AC_LIB_LINKFLAGS_BODY.
testsuite: handle-multiline-outputs must allow both cc1 and cc1.exe
Prior to 14-2027-g985d6480fe5, the input text had the file extensions
pruned. In 14-2027-g985d6480fe5, due to the move of the call, the
pruning is never done. This change restores the pruning of the file
extension to allow multiline test to pass on both Windows and other
platforms like Linux.
Iain Sandoe [Thu, 7 Aug 2025 16:51:14 +0000 (17:51 +0100)]
Darwin: Anchor block internal symbols must not be linker-visible.
When we are using section anchors, there's a requirement that the
sequence of the content is an unbroken block. If we allow linker-
visible symbols in that block, ld(64) would be able to break it
into sub-sections on those symbol boundaries.
Do not allow symbols that should be visible to be anchored.
Do not make anchor block internal symbols linker-visible.
gcc/ChangeLog:
* config/darwin.cc (darwin_encode_section_info): Do not
make anchored symbols linker-visible.
(darwin_use_anchors_for_symbol_p): Disallow anchoring on
symbols that must be linker-visible (or external), even
if the definitions are in this TU.
Iain Sandoe [Wed, 6 Aug 2025 05:07:43 +0000 (06:07 +0100)]
Darwin: Section anchors must be linker-visible.
In principle, these begin (or at least delineate) a region that
could be split by the static linker. If the symbols are hidden
to newer linkers they produce diagnostics about the temporary
symbol generated.
gcc/ChangeLog:
* config/darwin.h (ASM_GENERATE_INTERNAL_LABEL): New
entry for LANCHOR.
../../gcc/gcc/diagnostics/dumping.cc:98:1: error: redefinition of ‘void diagnostics::dumping::emit_field(FILE*, int, const char*, T) [with T = unsigned int; FILE = FILE]’
98 | emit_field<unsigned> (FILE *outfile, int indent,
| ^~~~~~~~~~~~~~~~~~~~
../../gcc/gcc/diagnostics/dumping.cc:80:1: note: ‘void diagnostics::dumping::emit_field(FILE*, int, const char*, T) [with T = unsigned int; FILE = FILE]’ previously declared here
80 | emit_field<size_t> (FILE *outfile, int indent,
| ^~~~~~~~~~~~~~~~~~
Sorry about this.
Should be fixed by the following patch, which avoids templates here
in favor of being explicit about types, avoids the use of "%zi" with
fprintf in various places, and fixes some other minor issues in the
dumping logic that I noticed whilst testing the patch.
gcc/ChangeLog:
* diagnostics/context.cc (context::dump): Bulletproof against
m_reference_printer being null.
* diagnostics/dumping.cc (emit_field<const char *>): Replace
with...
(emit_string_field): ...this.
(emit_field<char *>): Eliminate.
(emit_field<bool>): Replace with...
(emit_bool_field): ...this.
(emit_field<size_t>): Replace with...
(emit_size_t_field): ...this, and use HOST_SIZE_T_PRINT_DEC rather
than %zi in fprintf call.
(emit_field<int>): Replace with...
(emit_int_field): ...this.
(emit_field<unsigned>): Replace with...
(emit_unsigned_field): ...this.
* diagnostics/dumping.h (emit_field): Replace this template decl
with...
(emit_string_field): ...this,
(emit_bool_field): ...this,
(emit_size_t_field): ...this,
(emit_int_field): ...this,
(emit_unsigned_field): ... and this.
(DIAGNOSTICS_DUMPING_EMIT_FIELD): Rename to...
(DIAGNOSTICS_DUMPING_EMIT_BOOL_FIELD): ...this and update for
above change.
* diagnostics/file-cache.cc (file_cache_slot::dump): Replace
emit_field calls with calls that explicitly state the type. Fix
type of dump of m_missing_trailing_newline to use bool.
(file_cache_slot::dump): Use HOST_SIZE_T_PRINT_DEC rather than
%zi in fprintf call.
* diagnostics/html-sink.cc (html_generation_options::dump): Update
for macro renaming.
* diagnostics/sarif-sink.cc
(sarif_serialization_format_json::dump): Likewise.
(sarif_generation_options::dump): Likewise, and for function
renaming.
* diagnostics/text-sink.cc (text_sink::dump): Update for macro
renaming.
* libgdiagnostics.cc (diagnostic_manager_debug_dump_file): Use
HOST_SIZE_T_PRINT_DEC rather than %zi in fprintf call.
* pretty-print.cc: Include "diagnostics/dumping.h".
(pp_formatted_chunks::dump): Use it.
(get_url_format_as_string): New.
(pretty_printer::dump): Use diagnostics::dumping. Bulletproof
against m_buffer being null.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Paul Thomas [Sat, 9 Aug 2025 10:40:09 +0000 (11:40 +0100)]
Fortran: F2018 GENERIC statement is missing [PR121182]
2025-08-09 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/121182
* decl.cc (match_generic_stmt): New function based on original
gfc_match_generic but feeding namespace rather than typebound
generics.
(match_typebound_generic): Renamed original gfc_match_generic.
(gfc_match_generic): New function that selects between type
bound generic and other generic statements and calls one of the
above two functions as appropriate.
* parse.cc (decode_specification_statement): Allow generic
statements.
(parse_spec): Accept a generic statement in a specification
block.
gcc/testsuite/
PR fortran/121182
* gfortran.dg/generic_stmt_1.f90: New test.
* gfortran.dg/generic_stmt_2.f90: New test.
* gfortran.dg/generic_stmt_3.f90: New test.
* gfortran.dg/generic_stmt_4.f90: New test.
Andrew Pinski [Mon, 28 Jul 2025 03:37:55 +0000 (20:37 -0700)]
forwprop: Don't do copy-prop-aggregates from statements that could throw [PR120599]
In the testcase provided, currently we lose the landing pad for the exception that could
throw from the aggregate load as we remove one copy and the second statement where load
happens was not marked as throwable before so the landing pad for that internal throw is
now gone.
The fix is to ignore statements that could throw (internally or externally).
PR tree-optimization/120599
gcc/ChangeLog:
* tree-ssa-forwprop.cc (optimize_agr_copyprop): Don't try to copy
from statements that throw.
gcc/testsuite/ChangeLog:
* g++.dg/torture/noncall-eh-1.C: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Mon, 9 Jun 2025 06:13:23 +0000 (23:13 -0700)]
forwprop: Change proping memset into memcpy into a forwprop rather than a backwalk
One thing I noticed while working on copy prop for aggregates is that we start with
a memcpy like statement and then walk backwards. This means we could have a few walks
backwards to see there was no statement for zeroing. Instead this changes the walk
backwards into a true forwprop. In the future we can expand to forwprop the zeroing
into say an function argument or something more than memcpy like statement.
This should speed up slightly the compile time performance since there will be less
memsets like statements than memcpy and there is only one walk forwards for memset like
staments instead of multiple walk backwards to find the memset.
Note this does add one extra improvement, the memcpy now does not need to have an address
as its dest argument; this could have been done before too but it was even more noticable
now because of the variable became only set so it was removed and the check was removed
as well.
There is also a fix on how ao_ref for the memset/memcpy is done, before it was just using
ao_ref_init which is wrong since it should instead of used ao_ref_init_from_ptr_and_size.
This part fixes PR 121422.
Changes since v1:
* v2: Add back limit on the walk which was missed in v1.
Move the call to get_addr_base_and_unit_offset outside
of the vuse loop.
* v3: Remove extra check before the call to optimize_aggr_zeroprop_1.
Fix setting up of ao_ref for memset (PR121422).
* tree-ssa-forwprop.cc (optimize_memcpy_to_memset): Remove.
(optimize_aggr_zeroprop_1): New function.
(optimize_aggr_zeroprop): New function.
(simplify_builtin_call): Don't call optimize_memcpy_to_memset
for memcpy but call optimize_aggr_zeroprop for memset.
(pass_forwprop::execute): Don't call optimize_memcpy_to_memset
for aggregate copies but rather call optimize_aggr_zeroprop
for aggregate stores.
gcc/testsuite/ChangeLog:
* gcc.dg/pr118946-1.c: New test.
* gcc.dg/torture/pr121422-1.c: New test.
* gcc.dg/torture/pr121422-2.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Sun, 8 Jun 2025 17:51:02 +0000 (10:51 -0700)]
forwprop: Change optimize_agr_copyprop into forward walk instead of backwards
While thinking about how to implement the rest of the copy prop and makes sure not
to introduce some compile time problems, optimize_agr_copyprop should be changed
into a forwproping rather than looking backwards.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-forwprop.cc (optimize_agr_copyprop): Change into a
forward looking (looking at vdef's uses) instead of a back
looking (vuse's def).
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
David Malcolm [Fri, 8 Aug 2025 20:55:46 +0000 (16:55 -0400)]
diagnostics: enable nested diagnostics by default [PR116253]
In GCC 15 I added an experimental nesting view in text sinks
for hierarchical diagnostics, such as C++ template problems.
This patch enables it for text sinks by default. The old
behavior can be restored via -fno-diagnostics-show-nesting,
which the patch also adds to -fdiagnostics-plain-output.
The patch does not yet enable it for text sinks in sarif-replay.
gcc/ChangeLog:
PR diagnostics/116253
* common.opt (fdiagnostics-show-nesting): New option.
(fdiagnostics-show-nesting-locations): New option.
(fdiagnostics-show-nesting-levels): New option.
* common.opt.urls: Regenerate.
* diagnostics/context.cc (context::set_show_nesting): New.
(context::set_show_nesting_locations): New.
(context::set_show_nesting_levels): New.
* diagnostics/context.h (context::set_show_nesting): New decl.
(context::set_show_nesting_locations): New decl.
(context::set_show_nesting_levels): New decl.
* diagnostics/html-sink.cc: Tweak comment.
* diagnostics/output-spec.cc (text_scheme_handler::make_sink):
Rename "experimental-nesting" to "show-nesting" and enable by
default. Rename "experimental-nesting-show-locations" to
"show-nesting-locations". Rename
"experimental-nesting-show-levels" to "show-nesting-levels".
* diagnostics/sink.h (sink::dyn_cast_text_sink): New.
* diagnostics/text-sink.h (text_sink::dyn_cast_text_sink): New.
* doc/invoke.texi: Add -fdiagnostics-show-nesting,
-fdiagnostics-show-nesting-locations, and
-fdiagnostics-show-nesting-levels. Update for changes to
output-spec.cc above.
* lto-wrapper.cc (merge_and_complain): Ignore
OPT_fdiagnostics_show_nesting,
OPT_fdiagnostics_show_nesting_locations, and
OPT_fdiagnostics_show_nesting_levels.
(append_compiler_options): Likewise.
(append_diag_options): Likewise.
* opts-common.cc (decode_cmdline_options_to_array): Add
"-fno-diagnostics-show-nesting" to -fdiagnostics-plain-output.
* opts.cc (common_handle_option): Handle the new options.
(gen_command_line_string): Ignore the new options.
* toplev.cc (general_init): Call set_show_nesting,
set_show_nesting_locations, and set_show_nesting_levels on
global_dc.
gcc/testsuite/ChangeLog:
PR diagnostics/116253
* g++.dg/concepts/nested-diagnostics-1-truncated.C: Update for
renamed keys to -fdiagnostics-set-output=text
* g++.dg/concepts/nested-diagnostics-1.C: Likewise.
* g++.dg/concepts/nested-diagnostics-2.C: Likewise.
* gcc.dg/plugin/diagnostic-test-nesting-no-show-nesting.c: New
test.
* gcc.dg/plugin/diagnostic-test-nesting-show-nesting.c: New test.
* gcc.dg/plugin/diagnostic-test-nesting-text-indented-show-levels.c:
Update for renamed keys to -fdiagnostics-set-output=text.
* gcc.dg/plugin/diagnostic-test-nesting-text-indented-unicode.c:
Likewise.
* gcc.dg/plugin/diagnostic-test-nesting-text-indented.c: Likewise.
* gcc.dg/plugin/plugin.exp: Add the new tests.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Fri, 8 Aug 2025 20:55:45 +0000 (16:55 -0400)]
diagnostics: revamp of dumping of "diagnostics" internal state
The diagnostics subsystem has a handy dump feature, usable
during debugging via
(gdb) call global_dc->dump ()
which prints copious amounts of information about the state
of the diagnostics subsystem to stderr.
This patch consolidates the implementation and extends it, adding
various per-sink data (generation options specific to each of text,
SARIF, and HTML).
No functional difference intended outside of the debugger.
gcc/ChangeLog:
* Makefile.in (OBJS-libcommon): Add diagnostics/dumping.o.
* diagnostics/buffering.cc: Include "diagnostics/dumping.h".
(buffer::dump): Reimplement using diagnostics::dumping.
* diagnostics/context.cc: Include "diagnostics/dumping.h".
(context::dump): Reimplement using diagnostics::dumping.
Use sink::dump_kind when listing the sinks.
(sink::dump): Reimplement using diagnostics::dumping.
(counters::dump): Likewise.
* diagnostics/dumping.cc: New file.
* diagnostics/dumping.h: New file.
* diagnostics/file-cache.cc: Include "diagnostics/dumping.h".
(file_cache::dump): Reimplement using diagnostics::dumping.
(file_cache_slot::dump): Likewise.
* diagnostics/html-sink.cc: Include "diagnostics/dumping.h".
(html_generation_options::dump): New.
(html_sink_buffer::dump): Reimplement using diagnostics::dumping.
(html_builder::dump): New.
(html_sink::dump): Reimplement using diagnostics::dumping.
Add dump of the html_builder.
(html_file_sink::dump): Replace with...
(html_file_sink::dump_kind): ...this.
(html_buffered_sink::dump_kind): New.
* diagnostics/html-sink.h (html_generation_options::dump): New
decl.
* diagnostics/sarif-sink.cc: Include "diagnostics/dumping.h".
(sarif_serialization_format_json::dump): New.
(sarif_builder::dump): New.
(sarif_sink_buffer::dump): Reimplement using diagnostics::dumping.
(sarif_sink::dump): Likewise. Add dump of the sarif_builder.
(sarif_stream_sink::dump_kind): New.
(sarif_file_sink::dump): Replace with...
(sarif_file_sink::dump_kind): ...this.
(get_dump_string_for_sarif_version): New.
(sarif_generation_options::dump): New.
(class buffered_sink): Rename to...
(class sarif_buffered_sink): ...this.
(sarif_buffered_sink::dump_kind): New.
* diagnostics/sarif-sink.h (sarif_serialization_format::dump):
New.
(sarif_serialization_format_json::dump): New decl.
(sarif_generation_options::dump): New decl.
* diagnostics/sink.h (sink::dump_kind): New.
* diagnostics/text-sink.cc: Include "diagnostics/dumping.h".
(text_sink_buffer::dump): Reimplement using diagnostics::dumping.
(text_sink::dump): Likewise. Emit fields m_show_nesting,
m_show_locations_in_nesting, and m_show_nesting_levels.
* diagnostics/text-sink.h (text_sink::dump_kind): New.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Fri, 8 Aug 2025 20:55:45 +0000 (16:55 -0400)]
diagnostics: minor cleanups
No functional change intended.
gcc/ChangeLog:
* diagnostic.h (diagnostics::get_cwe_url): Move decl to
diagnostics/metadata.h.
(diagnostics::maybe_line_and_column): Move into
diagnostics::text_sink.
* diagnostics/context.cc: Update for maybe_line_and_column
becoming a static member of text_sink.
* diagnostics/metadata.h (diagnostics::get_cwe_url): Move decl
here from diagnostic.h.
* diagnostics/text-sink.cc (maybe_line_and_column): Convert to...
(text_sink::maybe_line_and_column): ...this.
* diagnostics/text-sink.h (text_sink::maybe_line_and_column): Move
here from diagnostic.h.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Fri, 8 Aug 2025 20:55:43 +0000 (16:55 -0400)]
diagnostics: introduce struct column_options
Consolidate 3 fields in diagnostics::context and
diagnostics::column_policy into a new struct
diagnostics::column_options.
No functional change intended; reduces the number of public
fields in diagnostics::context.
gcc/c-family/ChangeLog:
* c-indentation.cc (should_warn_for_misleading_indentation):
Update for moving diagnostics::context::m_tabstop into
diagnostics::column_options.
* c-opts.cc (c_common_post_options): Likewise.
gcc/ChangeLog:
* diagnostics/column-options.h: New file, adding struct
diagnostics::column_options, taken from fields in
diagnostics::context and diagnostics::column_policy.
* diagnostics/context.cc (context::initialize): Update for moving
fields of diagnostics::context into diagnostics::column_options.
(column_policy::column_policy): Likewise.
(column_policy::converted_column): Move implementation to...
(column_options::convert_column): ...this new function.
(context::report_diagnostic): Update for moving fields of
diagnostics::context into diagnostics::column_options.
(assert_location_text): Likewise.
* diagnostics/context.h: Include "diagnostics/column-options.h".
(class column_policy): Replace fields m_column_unit,
m_column_origin, and m_tabstop with m_column_options.
(context::get_column_options): New accessors.
(context::m_column_unit): Move to struct column_options and
replace with m_column_options.
(context::m_column_origin): Likewise.
(context::m_tabstop): Likewise.
* diagnostics/sarif-sink.cc (sarif_builder::sarif_builder): Update
for moving fields of diagnostics::context into
diagnostics::column_options.
* diagnostics/source-printing.cc: Likewise.
* opts.cc (common_handle_option): Likewise.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Robert Dubner [Fri, 8 Aug 2025 17:04:53 +0000 (13:04 -0400)]
cobol: Divide-and-conquer conversion from binary to packed-decimal.
The legacy routine for converting a binary integer to a packed-decimal
representaion peeled two digits at a time from the bottom of an _int128 value.
These changes replace that routine with a divide-and-conquer algorithm that
runs about ten times faster.
libgcobol/ChangeLog:
* libgcobol.cc (int128_to_field): Switch to the new routine.
* stringbin.cc (packed_from_combined): Implement the new routine.
(__gg__binary_to_packed): Likewise.
* stringbin.h (__gg__binary_to_packed): Likewise.
As discussed in https://gcc.gnu.org/pipermail/gcc-patches/2025-June/685733.html
the operand of the call should be a mem rather than an unspec.
This patch moves the unspec to an additional argument of the parallel
and adjusts cmse_nonsecure_call_inline_register_clear accordingly.
The scan-rtl-dump in cmse-18.c needs a fix since we no longer emit the
'unspec' part.
In addition, I noticed that since arm_v8_1m_mve_ok is always true in
the context of the test (we know we support CMSE as per cmse.exp, and
arm_v8_1m_mve_ok finds the adequate options), we actually only use the
more permissive regex. To improve that, the patch duplicates the
test, such that cmse-18.c forces -march=armv8-m.main+fp (so FPCXP is
disabled), and cmse-19.c forces -march=armv8.1-m.main+mve (so FPCXP is
enabled). Each test uses the appropriate scan-rtl-dump, and also
checks we are using UNSPEC_NONSECURE_MEM (we need to remove -slim for
that). The tests enable an FPU via -march so that the test passes
whether the testing harness forces -mfloat-abi or not.
Pengfei Li [Thu, 7 Aug 2025 14:52:45 +0000 (14:52 +0000)]
AArch64: Fix invalid immediate offsets in SVE gather/scatter [PR121449]
This patch fixes incorrect constraints in RTL patterns for AArch64 SVE
gather/scatter with type widening/narrowing and vector-plus-immediate
addressing. The bug leads to below "immediate offset out of range"
errors during assembly, eventually causing compilation failures.
/tmp/ccsVqBp1.s: Assembler messages:
/tmp/ccsVqBp1.s:54: Error: immediate offset out of range 0 to 31 at operand 3 -- `ld1b z1.d,p0/z,[z1.d,#64]'
Current RTL patterns for such instructions incorrectly use vgw or vgd
constraints for the immediate operand, base on the vector element type
in Z registers (zN.s or zN.d). However, for gather/scatter with type
conversions, the immediate range for vector-plus-immediate addressing is
determined by the element type in memory, which differs from that in
vector registers. Using the wrong constraint can produce out-of-range
offset values that cannot be encoded in the instruction.
This patch corrects the constraints used in these patterns. A test case
that reproduces the issue is also included.
Bootstrapped and regression-tested on aarch64-linux-gnu.
gcc/ChangeLog:
PR target/121449
* config/aarch64/aarch64-sve.md
(mask_gather_load<mode><v_int_container>): Use vg<Vesize>
constraints for alternatives with immediate offset.
(mask_scatter_store<mode><v_int_container>): Likewise.
gcc/testsuite/ChangeLog:
PR target/121449
* g++.target/aarch64/sve/pr121449.C: New test.
Richard Biener [Thu, 10 Jul 2025 09:37:14 +0000 (11:37 +0200)]
Remove setting of STMT_VINFO_VECTYPE on non-dataref stmts
The following removes early setting of STMT_VINFO_VECTYPE and as
side-effect early failing if we fail to compute a vector type. The
latter is now ensured by SLP build. The former is still temporarily
copied from the SLP tree during stmt analysis, and data reference
stmts will still have STMT_VINFO_VECTYPE given existing uses in
dependence and alignment analysis and peeling.
* tree-vect-loop.cc (vect_determine_vectype_for_stmt_1): Remove.
(vect_determine_vectype_for_stmt): Likewise.
(vect_set_stmts_vectype): Likewise.
(vect_analyze_loop_2): Do not call vect_set_stmts_vectype.
* tree-vect-stmts.cc (vect_mark_stmts_to_be_vectorized): Detect
irregular stmts early here.
Alex Coplan [Tue, 15 Jul 2025 10:49:27 +0000 (11:49 +0100)]
aarch64: Relax fpm_t assert to allow const_ints [PR120986]
This relaxes an overzealous assert that required the fpm_t argument to
be in DImode when expanding FP8 intrinsics. Of course this fails to
account for modeless const_ints.
Alex Coplan [Tue, 15 Jul 2025 09:37:58 +0000 (10:37 +0100)]
aarch64: Fix predication of FP8 FDOT insns [PR120986]
The predication of the SVE2 FP8 dot product insns was relying on the
architectural dependency:
FEAT_FP8DOT2 => FEAT_FP8DOT4
which was relaxed in GCC as of r15-7480-g299a8e2dc667e795991bc439d2cad5ea5bd379e2, thus leading to
unrecognisable insn ICEs when compiling a two-way FDOT with just
+fp8dot2. This patch introduces a new mode iterator which selectively
enables the appropriate mode(s) depending on which of the FP8DOT{2,4}
features are available, and uses it to fix the predication of the
patterns.
gcc/ChangeLog:
PR target/120986
* config/aarch64/aarch64-sve2.md (@aarch64_sve_dot<mode>):
Switch mode iterator from SVE_FULL_HSF to new iterator;
remove insn predicate as this is now taken care of by conditions
in the mode iterator.
(@aarch64_sve_dot_lane<mode>): Likewise.
* config/aarch64/iterators.md (SVE_FULL_HSF_FP8_FDOT): New.
gcc/testsuite/ChangeLog:
PR target/120986
* gcc.target/aarch64/pr120986-1.c: New test.
Richard Biener [Fri, 8 Aug 2025 06:57:58 +0000 (08:57 +0200)]
tree-optimization/121454 - ICE building libgo
The following avoids building BIT_FIELD_REFs of reference trees
that are unexpected by nonoverlapping_refs_since_match_p and
while being there also those declared invalid by IL verification.
Jakub Jelinek [Fri, 8 Aug 2025 07:20:51 +0000 (09:20 +0200)]
tailc: Handle other forms of finally_tmp.N conditional cleanups after musttail [PR121389]
My earlier r16-1886 PR120608 change incorrectly assumed that the
finally_tmp.N vars introduced by eh pass will be only initialized
to values 0 and 1 and there will be only EQ_EXPR/NE_EXPR comparisons
of those.
The following testcases show that is a bad assumption, the eh pass
sets finally_tmp.N vars to 0 up to some highest index depending on
hoiw many different exits there are from the finally region.
And it emits then switch (finally_tmp.N) statement for all the
different cases. So, if it uses more than 0/1 indexes, the lowering
of the switch can turn it into a series of GIMPLE_CONDs,
if (finally_tmp.N_M > 15)
goto ...
else
goto ...
if (finally_tmp.N_M > 7)
goto ...
else
goto ...
etc. (and that also means no longer single uses). And if unlucky,
we can see a non-lowered GIMPLE_SWITCH as well.
So, the following patch removes the assumption that it has to be 0/1
and EQ_EXPR/NE_EXPR, allows all the normal integral comparisons
and handles GIMPLE_SWITCH too.
2025-08-08 Jakub Jelinek <jakub@redhat.com>
PR middle-end/121389
* tree-tailcall.cc (find_tail_calls): For finally_tmp.N
handle not just GIMPLE_CONDs with EQ_EXPR/NE_EXPR and only
values 0 and 1, but arbitrary non-negative values, arbitrary
comparisons in conditions and also GIMPLE_SWITCH next to
GIMPLE_CONDs.
* c-c++-common/asan/pr121389-1.c: New test.
* c-c++-common/asan/pr121389-2.c: New test.
* c-c++-common/asan/pr121389-3.c: New test.
* c-c++-common/asan/pr121389-4.c: New test.
Richard Biener [Thu, 7 Aug 2025 12:57:09 +0000 (14:57 +0200)]
Modernize vectorizable_lane_reducing
The following avoids STMT_VINFO_VECTYPE usage in
vect_is_emulated_mixed_dot_prod and makes sure to register the SLP
node when costing in vectorizable_lane_reducing.
* tree-vect-loop.cc (vect_is_emulated_mixed_dot_prod): Get
the SLP node rather than the stmt_info.
(vectorizable_lane_reducing): Adjust, pass SLP node to costing.
(vect_transform_reduction): Adjust.
Richard Biener [Thu, 7 Aug 2025 12:45:52 +0000 (14:45 +0200)]
Pass SLP node to promotion/demotion costing
This one was forgotten. Also constants/externals are costed explicitly
with SLP.
* tree-vect-stmts.cc (vect_model_promotion_demotion_cost): Pass
in SLP node and drop unused dr argument. Use SLP node for
costing, drop costing of constant/external operands.
(vectorizable_conversion): Adjust.
Robert Dubner [Thu, 7 Aug 2025 19:52:02 +0000 (15:52 -0400)]
cobol: Improve binary-to-string conversion.
COBOL often requires the conversion of binary integers to string of characters.
These changes replace a naive routine that peels decimal digits from a binary
value one digit at a time, with a divide-and-conquer algorithm that is twice as
fast even for a couple of digits, and is about eight times faster past ten
digits.
Included here are some minor fixes to the lexer and parser.
* Makefile.am: Include new stringbin.cc file.
* Makefile.in: Regenerated.
* libgcobol.cc (__gg__power_of_ten): Improve error message.
(__gg__binary_to_string): Deleted.
(__gg__binary_to_string_internal): Deleted.
(int128_to_field): Use new conversion routine.
(__gg__move): Use new conversion routine.
* stringbin.cc: New file. Implements new conversion routine.
* stringbin.h: New file. Likewise.
Patrick Palka [Thu, 7 Aug 2025 19:43:22 +0000 (15:43 -0400)]
c++: extract_call_expr and C++20 rewritten ops
After r16-2519-gba5a6787374dea, we'll never express a C++20 rewritten
comparison operator as a built-in operator acting on an operator<=>
call, e.g. operator<=>(x, y) @ 0. This is because operator<=> always
returns a class type (std::foo_ordering), so the outer operator@ will
necessarily resolve to a non-built-in operator@ for that class type
(even in the non-dependent templated case, after that commit).
So the corresponding handling in extract_call_expr is basically dead
code, except for the TRUTH_NOT_EXPR case where we can plausibly still
have !(operator==(x, y)), but it doesn't make sense to recognize just
that one special case of operator rewriting. So let's just remove it
altogether; apparently it's no longer needed.
Also, the handling is imprecise: it recognizes expressions such as
0 < f() which never corresponded to a call in the first place. All
the more reason to remove it.
gcc/cp/ChangeLog:
* call.cc (extract_call_expr): Remove handling of C++20
rewritten comparison operators.
Jakub Jelinek [Thu, 7 Aug 2025 14:38:51 +0000 (16:38 +0200)]
c++: Implement C++26 P1061R10 - Structured Bindings can introduce a Pack [PR117783]
The following patch implements the C++26
P1061R10 - Structured Bindings can introduce a Pack
paper.
One thing unresolved in the patch is mangling, I've raised
https://github.com/itanium-cxx-abi/cxx-abi/issues/200
for that but no comments there yet. One question is if it is ok
not to mention the fact that there is a structured binding pack in
the mangling of the structured bindings but more important is in case
of std::tuple* we might need to mangle individual structured binding
pack elements separately (each might need an exported name for the
var itself and perhaps its guard variable as well). The patch just
uses the normal mangling for the whole structured bindings and emits
sorry if we need to mangle the structured binding pack elements.
The patch just marks the structured binding pack specially (considered
e.g. using some bit on it, but in the end I'm identifying it using
a made up type which causes DECL_PACK_P to be true; it is kind of
self-referential solution, because the type on the pack mentions the
DECL_DECOMPOSITION_P VAR_DECL on which the type is attached as its pack,
so it needs to be handled carefully during instantiation to avoid infinite
recursion, but it is the type that should be used if something else actually
needs to use the same type as the structured binding pack, e.g. a capture
proxy), and stores the pack elements when actually processed through
cp_finish_decomp with non-dependent initializer into a TREE_VEC used as
DECL_VALUE_EXPR of the pack; though because several spots use the
DECL_VALUE_EXPR and assume it is ARRAY_REF from which they can find out the
base variable and the index, it stores the base variable and index in the
first 2 TREE_VEC elts and has the structured binding elements only after
that.
https://eel.is/c++draft/temp.dep.expr#3.6 says the packs are type dependent
regardless of whether the initializer of the structured binding is type
dependent or not, so I hope having a dependent type on the structured
binding VAR_DECL is ok.
The paper also has an exception for sizeof... which is then not value
dependent when the structured bindings are initialized with non-dependent
initializer: https://eel.is/c++draft/temp.dep.constexpr#4
The patch special cases that in 3 spots (I've been wondering if e.g. during
parsing I couldn't just fold the sizeof... to the INTEGER_CST right away,
but guess I'd need to repeat that also during partial instantiation).
And one thing still unresolved is debug info, I've just added DECL_IGNORED_P
on the structured binding pack VAR_DECL because there were ICEs with -g
for now, hope it can be fixed incrementally but am not sure what exactly
we should emit in the debug info for that.
Speaking of which, I see
DW_TAG_GNU_template_parameter_pack
DW_TAG_GNU_formal_parameter_pack
etc. DIEs emitted regardless of DWARF version, shouldn't we try to upstream
those into DWARF 6 or check what other compilers emit for the packs?
And bet we'd need DW_TAG_GNU_structured_binding_pack as well.
2025-08-07 Jakub Jelinek <jakub@redhat.com>
PR c++/117783
gcc/c-family/
* c-cppbuiltin.cc (c_cpp_builtins): Change __cpp_structured_bindings
predefined value for C++26 from 202403L to 202411L.
gcc/cp/
* parser.cc: Implement C++26 P1061R10 - Structured Bindings can
introduce a Pack.
(cp_parser_range_for): Also handle TREE_VEC as DECL_VALUE_EXPR
instead of ARRAY_REF.
(cp_parser_decomposition_declaration): Use sb-identifier-list instead
of identifier-list in comments. Parse structured bindings with
structured binding pack. Don't emit pedwarn about structured
binding attributes in structured bindings inside of a condition.
(cp_convert_omp_range_for): Also handle TREE_VEC as DECL_VALUE_EXPR
instead of ARRAY_REF.
* decl.cc (get_tuple_element_type): Change i argument type from
unsigned to unsigned HOST_WIDE_INT.
(get_tuple_decomp_init): Likewise.
(set_sb_pack_name): New function.
(cp_finish_decomp): Handle structured binding packs.
* pt.cc (tsubst_pack_expansion): Handle structured binding packs
and capture proxies for them. Formatting fixes.
(tsubst_decl): For structured binding packs don't tsubst TREE_TYPE
first, instead recreate the type after r is created.
(tsubst_omp_for_iterator): Also handle TREE_VEC as DECL_VALUE_EXPR
instead of ARRAY_REF.
(tsubst_expr): Handle sizeof... on non-dependent structure binding
packs.
(value_dependent_expression_p): Return false for sizeof... on
non-dependent structure binding packs.
(instantiation_dependent_r): Don't recurse on sizeof... on
non-dependent structure binding packs.
* constexpr.cc (potential_constant_expression_1): Also handle
TREE_VEC on DECL_VALUE_EXPR of structure binding packs.
gcc/testsuite/
* g++.dg/cpp26/decomp13.C: New test.
* g++.dg/cpp26/decomp14.C: New test.
* g++.dg/cpp26/decomp15.C: New test.
* g++.dg/cpp26/decomp16.C: New test.
* g++.dg/cpp26/decomp17.C: New test.
* g++.dg/cpp26/decomp18.C: New test.
* g++.dg/cpp26/decomp19.C: New test.
* g++.dg/cpp26/decomp20.C: New test.
* g++.dg/cpp26/decomp21.C: New test.
* g++.dg/cpp26/feat-cxx26.C (__cpp_structured_bindings): Expect
202411 rather than 202403.
aarch64: Mark SME functions as .variant_pcs [PR121414]
Unlike base PCS functions, __arm_streaming and __arm_streaming_compatible
functions allow/require PSTATE.SM to be 1 on entry, so they need to
be treated as STO_AARCH64_VARIANT_PCS.
Similarly, functions that share ZA or ZT0 with their callers require
ZA to be active on entry, whereas the base PCS requires ZA to be
dormant or off. These functions too need to be marked as having
a variant PCS.
gcc/
PR target/121414
* config/aarch64/aarch64.cc (aarch64_is_variant_pcs): New function,
split out from...
(aarch64_asm_output_variant_pcs): ...here. Handle various types
of SME function type.
gcc/testsuite/
PR target/121414
* gcc.target/aarch64/sme/pr121414_1.c: New test.
Remove MODE_COMPOSITE_P test from simplify_gen_subreg [PR120718]
simplify_gen_subreg rejected subregs of literal constants if
MODE_COMPOSITE_P. This was added by the fix for PR96648 in
g:c0f772894b6b3cd8ed5c5dd09d0c7917f51cf70f. Jakub said:
As for the simplify_gen_subreg change, I think it would be desirable
to just avoid creating SUBREGs of constants on all targets and for all
constants, if simplify_immed_subreg simplified, fine, otherwise punt,
but as we are late in GCC11 development, the patch instead guards this
behavior on MODE_COMPOSITE_P (outermode) - i.e. only conversions to
powerpc{,64,64le} double double long double - and only for the cases where
simplify_immed_subreg was called.
I'm not sure about relaxing the codes further, since subregs might
be wanted for CONST, SYMBOL_REF and LABEL_REF. But removing the
MODE_COMPOSITE_P is needed to fix PR120718, where we get an ICE
from generating a subreg of a V2SI const_vector.
Richard Biener [Wed, 6 Aug 2025 10:31:13 +0000 (12:31 +0200)]
tree-optimization/121405 - missed VN with aggregate copy
The following handles value-numbering of a BIT_FIELD_REF of
a register that's defined by a load by looking up a subset
load similar to how we handle bit-and masked loads. This
allows the testcase to be simplified by two FRE passes,
the first one will create the BIT_FIELD_REF.
PR tree-optimization/121405
* tree-ssa-sccvn.cc (visit_nary_op): Handle BIT_FIELD_REF
with reference def by looking up a combination of both.
* gcc.dg/tree-ssa/ssa-fre-107.c: New testcase.
* gcc.target/i386/pr90579.c: Adjust.
Pengfei Li [Thu, 7 Aug 2025 11:08:35 +0000 (11:08 +0000)]
vect: Extend peeling and versioning for alignment to VLA modes
This patch extends the support for peeling and versioning for alignment
from VLS modes to VLA modes. The key change is allowing the DR target
alignment to be set to a non-constant poly_int. Since the value must be
a power-of-two, for variable VFs, the power-of-two check is deferred to
runtime through loop versioning. The vectorizable check for speculative
loads is also refactored in this patch to handle both constant and
variable target alignment values.
Additional changes for VLA modes include:
1) Peeling
In VLA modes, we use peeling with masking - using a partial vector in
the first iteration of the vectorized loop to ensure aligned DRs in
subsequent iterations. It was already enabled for VLS modes to avoid
scalar peeling. This patch reuses most of the existing logic and just
fixes a small issue of incorrect IV offset in VLA code path. This also
removes a power-of-two rounding when computing the number of iterations
to peel, as power-of-two VF has been guaranteed by a new runtime check.
2) Versioning
The type of the mask for runtime alignment check is updated to poly_int
to support variable VFs. After this change, both standalone versioning
and peeling with versioning are available in VLA modes. This patch also
introduces another runtime check for speculative read amount, to ensure
that all speculative loads remain within current valid memory page. We
plan to remove these runtime checks in the future by introducing capped
VF - using partial vectors to limit the actual VF value at runtime.
3) Speculative read flag
DRs whose scalar accesses are known to be in-bounds will be considered
unaligned unsupported with a variable target alignment. But in fact,
speculative reads can be naturally avoided for in-bounds DRs as long as
partial vectors are used. Therefore, this patch clears the speculative
flags and sets the "must use partial vectors" flag for these cases.
This patch is bootstrapped and regression-tested on x86_64-linux-gnu,
arm-linux-gnueabihf and aarch64-linux-gnu with bootstrap-O3.
gcc/ChangeLog:
* tree-vect-data-refs.cc (vect_compute_data_ref_alignment):
Allow DR target alignment to be a poly_int.
(vect_enhance_data_refs_alignment): Support peeling and
versioning for VLA modes.
* tree-vect-loop-manip.cc (get_misalign_in_elems): Remove
power-of-two rounding in peeling.
(vect_create_cond_for_align_checks): Update alignment check
logic for poly_int mask.
(vect_create_cond_for_vla_spec_read): New runtime checks.
(vect_loop_versioning): Support new runtime checks.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Add a new
loop_vinfo field.
(vectorizable_induction): Fix wrong IV offset issue.
* tree-vect-stmts.cc (get_load_store_type): Refactor
vectorizable checks for speculative loads.
* tree-vectorizer.h (LOOP_VINFO_MAX_SPEC_READ_AMOUNT): New
macro for new runtime checks.
(LOOP_REQUIRES_VERSIONING_FOR_SPEC_READ): Likewise
(LOOP_REQUIRES_VERSIONING): Update macro for new runtime checks.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/peel_ind_11.c: New test.
* gcc.target/aarch64/sve/peel_ind_11_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_12.c: New test.
* gcc.target/aarch64/sve/peel_ind_12_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_13.c: New test.
* gcc.target/aarch64/sve/peel_ind_13_run.c: New test.
Download newer versions of GMP, MPFR and MPC (the latest); besides the usual
bug fixes and smaller features, MPFR adds new functions for C23, some of
which are already used in GCC in the middle (fold-const-call.cc) and in
Fortran 2023 for the 'pi' trignonometric functions, if MPFR is new enough.
Jakub Jelinek [Thu, 7 Aug 2025 06:53:23 +0000 (08:53 +0200)]
c++: Add testcase for CWG2577 [PR120778]
And here is the last part of the paper. Contrary what the paper claims
(clearly they haven't tried -pedantic nor -pedantic-errors), I think we
already diagnose everything we should.
2025-08-07 Jakub Jelinek <jakub@redhat.com>
PR preprocessor/120778
* g++.dg/DRs/dr2577-1.C: New test.
* g++.dg/DRs/dr2577-2.C: New test.
* g++.dg/DRs/dr2577-2.h: New file.
* g++.dg/DRs/dr2577-3.C: New test.
* g++.dg/DRs/dr2577-3.h: New file.
Jakub Jelinek [Thu, 7 Aug 2025 06:52:38 +0000 (08:52 +0200)]
c++: Add testcase for CWG2575 [PR120778]
From the paper it isn't clear what caused the decision changes, not to drop
the "the token defined is generated as a result of this replacement process or"
part and make [cpp.cond]/10 violations IFNDR rather than ill-formed (the
latter maybe so that the extension to handle e.g. !A(A) below etc. can be
accepted).
Anyway, because that case hasn't been dropped and we pedwarn on it already,
and diagnose everything else the way it should, the following patch just
adds testcase for it.
2025-08-07 Jakub Jelinek <jakub@redhat.com>
PR preprocessor/120778
* g++.dg/DRs/dr2575.C: New test.