We can reassociate the operations when the XOR only flips bits resulting from
the right or left shift, but not both. So after reassociation in gimple we
get:
> _1 = a_2(D) r>> 1;
> _3 = _1 ^ 1;
Which results in:
> rori a0,a0,1
> xori a0,a0,1
We don't bother with the transformation when the XOR is flipping a bit known to
be zero (i.e., a high bit of the result of the right shift or a low bit of the
result of the left shift). For those cases we already figure out that the XOR
is just an IOR and the right things already "just happen".
This triggered some code generation changes on the SH (not surprising because
this BZ was derived from an older SH BZ). It doesn't seem to significantly
improve the SH code, though it does turn a cmp/pz + rotate-through-carry into a
rotate + xor with immediate. That may be a latency win on the SH, I really
don't know.
Shreya did the bulk of the work here. My contribution was the sister pattern
which has the XOR on the other operand and testcase development.
Bootstrapped and regression tested on x86 & riscv. Also tested across the
various embedded targets without any regressions.
PR target/121778
gcc/
* match.pd: Add pattern to recognize rotate with one or more
bits flipped via xor.
* config/sh/sh.md (*rotcl): New variant which handles the output
we get after the match.pd change above.
gcc/testsuite/
* gcc.target/riscv/pr121778.c: New test.
Co-Authored-By: Jeff Law <jeffrey.law@oss.qualcomm.com>
Jeff Law [Fri, 9 Jan 2026 04:14:18 +0000 (21:14 -0700)]
[RISC-V] Clamp long reservations to 7c
So I've been noticing the cycle time for a native build/test on the
Pioneer and BPI rising over the last many months. I've suspected a pain
point is likely genautomata due to long reservations in the DFAs.
Trying to describe a 30+ cycle bubble in the pipeline just isn't useful
and causes the DFA to blow up.
This is the time to build insn-automata.cc using an optimized genautomata
on my skylake server cross compiling to riscv64. The baseline is what
we have today. Then I clamped the reservations (but not the latency) to
7c. 7c is arbitrary, but known not to blow up the DFA. I fixed the BPI
first, then the Andes 23 and so on.
That's a significant improvement, though I probably wouldn't go forward
with just that improvement. It's less than a minute and skylake systems
aren't exactly new anymore...
Let's try that with an unoptimized genautomata. I often build that way
when debugging.
Baseline: 343s
Final: 79s
So that's saving ~4m on my skylake server for a common build. Given I
use ccache, that 4m is often a significant amount of the build time. So
this feels like a better motivating example.
But I'm really after bringing down bootstrap cycle times on the BPI and
Pioneer. So let's see what the BPI does. For an optimized genautomata
we get (not testing all the intermediate steps):
Baseline: 310s
Final: 110s
Not bad. And if we look at unoptimized genautomata:
Baseline: 2196s
Final: 553s
Now we can see why bootstrap times have crept up meaningfully. That's
~27 minutes out of a 9hr bootstrap time on the BPI (pure bootstrap, no
testing). The effect is more pronounced on the Pioneer where the
improvement is 30+ minutes on a 4hr bootstrap time (each core is slower,
but there's 8x as many cores).
Tested on riscv{32,64}-elf and bootstrapped on the Pioneer (regression
testing in progress). I'll wait for pre-commit CI to do its thing.
The bug is a stale Virtual SSA VDEF on calls to functions that have
been marked const or pure.
The pure_const pass analyzes function rocksdb::y::y() and determines it has no
side effects and marks it as const.
At this point, existing call sites to y::y() in other functions still have:
# .MEM_12 = VDEF <.MEM_11> rocksdb::y::y (&l, _9);
The VDEF indicates the call modifies memory, but now that y::y() is const,
this VDEF is stale.
In later passes, after feedback_fnsplit, SSA verification fails. Added a fixup.
Jakub Jelinek [Thu, 8 Jan 2026 22:05:58 +0000 (23:05 +0100)]
stmt: Fix up parse_input_constraint [PR111817]
The following invalid testcase ICEs, because we:
1) for some strange reason ignore invalid punctuations in
parse_output_constraint, which has just
default:
  if (!ISALPHA (*p))
    break;
compared to parse_input_constraint
default:
  if (! ISALPHA (constraint[j]))
    {
      error ("invalid punctuation %qc in constraint", constraint[j]);
      return false;
    }
Haven't touched this because I fear it could break real-world code
2) the checking whether = or + is first in the output constraint is
a warning only:
if (p != constraint)
  warning (0, "output constraint %qc for operand %d "
           "is not at the beginning",
           *p, operand_num);
3) parse_input_constraint parses also the corresponding output constraint
if the input constraint has a number as the only variant, but
even the comment removed in the following patch explains that it
doesn't work correctly and skips the first character; now, usually
that is not a big deal because if the first character of the output
constraint is = or + as it should, then the checking doesn't do anything;
but as 2) is just a warning, we accept it and then we fail to check it
4) far later on we parse the whole output constraint when input constraint
refers to it and assert it succeeds, which it doesn't due to 1), 2) and 3)
The following patch fixes the 3) spot, when switching to the output
constraint, instead of setting j = 0; and break; (== continue;) so that it
first does j += CONSTRAINT_LEN (constraint[0], constraint+0) and thus
usually starts at second, sometimes third character of the output constraint
it uses goto before the loop which sets j = 0; and doesn't do the j += ...
2026-01-08 Jakub Jelinek <jakub@redhat.com>
PR middle-end/111817
* stmt.cc (parse_input_constraint): For matching construct, goto
before the loop without changing j instead of break. Remove comment
about that problem.
Robin Dapp [Wed, 10 Dec 2025 18:02:11 +0000 (19:02 +0100)]
RISC-V: -mrvv-max-lmul=conv-dynamic [PR122846].
As discussed in the patchwork sync this patch adds a dynamic LMUL mode
that sets the LMUL to the ratio of largest/smallest type size in a loop,
with the maximum being LMUL8.
This is supposed to imitate what other architectures implicitly do by
vec_unpack_hi/lo. I have done cursory testing and obviously more
coverage would be preferred.
PR target/122846
gcc/ChangeLog:
* config/riscv/riscv-opts.h (enum rvv_max_lmul_enum): Add
RVV_CONV_DYNAMIC.
(TARGET_MAX_LMUL): Ditto.
* config/riscv/riscv-string.cc (use_vector_stringop_p): Use
LMUL1 for RVV_CONV_DYNAMIC.
(expand_rawmemchr): Ditto.
(expand_strcmp): Ditto.
(check_vectorise_memory_operation): Ditto.
* config/riscv/riscv-vector-costs.cc (get_smallest_mode):
New function.
(compute_lmul_from_conversion_ratio): Calculate LMUL from
largest/smallest type.
(costs::has_unexpected_spills_p): Split.
(costs::compute_live_ranges_and_lmul): Compute smallest type and
call new function.
(costs::cleanup_live_range_data): New function.
(costs::compute_conversion_dynamic_lmul): New function.
(costs::record_potential_unexpected_spills): Use new function.
(costs::better_main_loop_than_p): Allow appropriate LMUL.
* config/riscv/riscv-vector-costs.h: Declare.
* config/riscv/riscv.opt: New option
-mrvv-max-lmul=conv-dynamic.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/dyn-lmul-conv-1.c: New test.
* gcc.target/riscv/rvv/autovec/dyn-lmul-conv-2.c: New test.
* gcc.target/riscv/rvv/autovec/pr122846.c: New test.
Thomas Koenig [Thu, 8 Jan 2026 20:35:27 +0000 (21:35 +0100)]
Fix compile failure on systems not supporting gthreads.
I just realized that it is possible to run a check by #undef-ing
__GTHREADS_CXX0X in async.h. Doing so promptly found another syntax
error, which this version of the patch fixes.
PR libfortran/123446
PR libfortran/119136
libgfortran/ChangeLog:
* io/async.h: DEBUG_ASYNC needs gthreads support.
(LOCK_UNIT): Only lock when there is pthreads support and it is active.
Otherwise, just set unit->self to 1.
(UNLOCK_UNIT): Only unlock when there is pthreads support and it is active.
Otherwise, just set unit->self to 0.
(TRYLOCK_UNIT): Only try locking when there is pthreads support and it is
active. Otherwise, return unit->self.
(OWN_THREAD_ID): New macro.
* io/io.h: gfc_unit's self is an int when there is no gthreads support.
* io/unit.c (check_for_recursive): Check for equality of unit which
locked to OWN_THREAD_ID.
Olivier Hainque [Thu, 8 Jan 2026 17:17:30 +0000 (14:17 -0300)]
Introduce x86_64-linux-gnuabi32
Enable a 32-bit "native" toolchain on x86_64-*-linux-gnu and
x86_64-*-mingw32, i.e., one that targets -m32 by default, despite
supporting and potentially running in 64-bit mode, by appending abi32
to the triplet, and/or by setting the default ABI to 32 or m32.
Adjust libada and gnattools build machinery to support this
configuration in both bootstrap and non-bootstrap modes.
Co-Authored-By: Alexandre Oliva <oliva@adacore.com>
for gcc/ChangeLog
* config.gcc [x86_64-*-*]: Match *abi32 target, default to m32
abi. Accept 32 or m32 for --with-abi.
for gcc/ada/ChangeLog
* gcc-interface/Make-lang.in (m32_target): Set.
(not_m32_target, native_target, native_gnattools1): Set.
(ADA_TOOLS_FLAGS_TO_PASS, gnattools): Handle x86_64 natives
defaulting to -m32 as cross for gnattools.
* gcc-interface/Makefile.in (target_cpu): Set to i686 for
x86_64 configurations defaulting to -m32.
for gnattools/ChangeLog
* configure.ac (default_gnattools_target): Use gnattools-cross
when not bootstrapping x86_64 configurations defaulting to
-m32.
* configure: Rebuild.
This patch introduces a new configure-time option --with-multi-buildlist
to allow fine-grained control over which multilib variants are built.
The option accepts a path to a file containing a list of multilib
directories to be included in the build. Each line in the file should
contain a single multilib directory name, matching those generated by
the compiler's --print-multi-lib output.
This mechanism is target-independent and enables users to reduce build
time and binary size by excluding unnecessary multilib variants. It is
especially useful for embedded targets with constrained environments or
vendor-specific requirements.
The option is propagated to both host and target configuration stages,
and used in config-ml.in and gcc/Makefile.in to filter the multilib
list.
Documentation for this feature is added to gcc/doc/install.texi.
ChangeLog:
* config-ml.in: Use with_multi_buildlist to build multidirs.
Skip configuration for subdir returned by
--print-multi-directory.
* configure: Regenerate.
* configure.ac: Source target-specific configuration fragment
for GCC. Pass through with_multi_buildlist to host and target.
gcc/ChangeLog
* Makefile.in: Add with_multi_buildlist for multilib
configuration control. Pass an additional argument to
genmultilib indicating whether --with-multi-buildlist is set
(true or false). Use with_multi_buildlist to filter
multilib directories in fixinc_list.
* configure: Regenerate.
* configure.ac: Restrict the installed fixedincludes multilibs.
* configure.tgt: New file.
* doc/install.texi: Add --with-multi-buildlist configure option
for multilib filtering.
* genmultilib: Document the new eleventh argument indicating
whether --with-multi-buildlist configure option is set (true or
false). Update argument parsing to include this flag before
enable_multilib. Modify reuse rule validation:
- Keep the original error for reuse of nonexistent multilibs
when --with-multi-buildlist is not used.
- Suppress the error only when the new configure option is
active, allowing reuse rules to reference multilibs that are
intentionally excluded from the build.
Signed-off-by: Robert Suchanek <robert.suchanek@imgtec.com>
Signed-off-by: Chao-ying Fu <cfu@mips.com>
Signed-off-by: Aleksandar Rakic <aleksandar.rakic@htecgroup.com>
Tomas Glozar [Thu, 8 Jan 2026 15:42:01 +0000 (08:42 -0700)]
[PATCH 2/2] ia64: Expand MAX_VECT_LEN to 16
MAX_VECT_LEN is set to 8 on ia64, which is lower than on all other
targets, where it is 16 at minimum.
Some of the machine modes of ia64 are internally 16-byte wide,
causing stringop-overflow to be unhappy when checking a loop in
ia64_vectorize_vec_perm_const(). This causes bootstrap to fail.
Make stringop-overflow happy by raising the length to 16.
gcc/ChangeLog:
* config/ia64/ia64.cc (MAX_VECT_LEN): Set to 16 from 8.
The problem seems to be with a packing permutation:
op0[4] op0[5] op0[6] op0[7]
and with the identity_offset parameter to vect_add_slp_permutation.
Both the repeating_p and !repeating_p paths correctly realise that this
permutation reduces to an identity. But the !repeating_p path ends up with
first_node and second_node both set to the second VEC_PERM_EXPR operand
(since that path works elementwise, and since no elements are taken from
the first input). Therefore, the call:
works regardless of whether vect_add_slp_permutation picks first_def or
second_def. In that sense, the parameters to vect_add_slp_permutation are
already “canonical”.
The repeating_p path instead passes vector 2N as first_def and vector 2N+1
as second_def, with mask[0] indicating the position of the identity within
the concatenation of first_def and second_def. However,
vect_add_slp_permutation doesn't expect this and instead ignores the
identity_offset parameter.
PR tree-optimization/122793
* tree-vect-slp.cc (vect_add_slp_permutation): Document the existing
identity_offset parameter. Handle identities that take from the
second input rather than the first.
* gcc.dg/vect/vect-pr122793.c: New testcase.
Co-authored-by: Richard Biener <rguenther@suse.de>
Pietro Monteiro [Thu, 8 Jan 2026 12:31:40 +0000 (07:31 -0500)]
Containerfile for base forge actions
Build autoconf and automake and add autoregen.py from
https://sourceware.org/git/builder.git
Add forge action to build container images.
ChangeLog:
* .forgejo/workflows/build-containers.yaml: New file.
contrib/ChangeLog:
* ci-containers/README: New file.
* ci-containers/autoregen/Containerfile: New file.
* ci-containers/autoregen/autoregen.py: New file.
* ci-containers/build-image.sh: New file.
Signed-off-by: Pietro Monteiro <pietro@sociotechnical.xyz>
Richard Biener [Thu, 8 Jan 2026 09:10:25 +0000 (10:10 +0100)]
tree-optimization/123298 - fix backedge detection for VN alias walk
When trying to skip a virtual PHI during an alias walk we have to
direct a possible VN translation hook to not use valueization when
walking a backedge. But this backedge detection was overly
optimistic, not honoring irreducible regions. The following hookizes
the backedge detection so VN can properly flag edges that are back
with respect to its particular CFG traversal.
PR tree-optimization/123298
* tree-ssa-alias.h (get_continuation_for_phi): Take a gphi *,
add is_backedge hook argument.
(walk_non_aliased_vuses): Add is_backedge hook argument.
* tree-ssa-alias.cc (maybe_skip_until): Adjust.
(get_continuation_for_phi): Use new hook to classify an
edge into the PHI as backedge.
(walk_non_aliased_vuses): Adjust.
* gimple-lower-bitint.cc (bitint_dom_walker::before_dom_children):
Likewise.
* ipa-prop.cc (determine_known_aggregate_parts): Likewise.
* tree-ssa-scopedtables.cc (avail_exprs_stack::lookup_avail_expr):
Likewise.
* tree-ssa-pre.cc (translate_vuse_through_block): Likewise.
* tree-ssa-sccvn.cc (vn_bb_to_rpo): Make BB to RPO order
mapping accessible from new hook.
(do_rpo_vn_1): Likewise.
(vn_is_backedge): New hook to classify edge.
(vn_reference_lookup_pieces): Adjust.
(vn_reference_lookup): Likewise.
Richard Biener [Thu, 8 Jan 2026 08:32:19 +0000 (09:32 +0100)]
More verbose dumping on missed vector optabs
The following changes 'no optab' to mention which tree code and
vector type we were looking for and adds 'shift' to the instances
of this message emitted from vectorizable_shift.
* tree-vect-stmts.cc (vectorizable_shift): Improve missing
optab or optab support messages.
(vectorizable_operation): Likewise.
s390: Remove volatile check from constraints A[QRST]
Logical operations like *x &= -10 may be folded to a single
storage-and-immediate instruction NI which accesses only the least
significant byte of *x.
Similarly but still distinct operations like *x &= *y may be implemented
via storage-and-storage instruction NC which loads and stores one byte
after another of operands.
Since volatile objects must be accessed by a single load/store of the
entire object, those optimizations must be rejected in case of volatile
memory operands. An exception to this are 16-byte load/stores which are
implemented by two operations (in case of non-atomic operands).
Previously, multi-letter constraints A[QRST] were intended to reject
volatile memory operands. However, during LRA, if a memory constraint
is not satisfiable, as a last resort, LRA tries reloading the address.
This, of course, doesn't fix the issue and during checking we finally
bail out in case of a winning alternative.
Fixed by enforcing non-volatile memory operands via conditions of
instruction patterns which is done in s390_logical_operator_ok_p() for
all AND/IOR/XOR instructions by this patch. By removing the volatile
check in constraints A[QRST] this fixes tests
gcc.dg/torture/float128-basic.c, float64x-basic.c,
fp-int-convert-float128-ieee.c, fp-int-convert-float64x.c,
fp-int-convert-long-double.c which started failing after r16-5947.
gcc/ChangeLog:
* config/s390/s390.cc (s390_logical_operator_ok_p): Test for
volatile memory.
(s390_mem_constraint): Remove volatile condition.
* config/s390/s390.md (*andc_split_<mode>): Test for volatile
memory.
gcc/testsuite/ChangeLog:
* gcc.target/s390/narrow-logical-op-1.c: New test.
Jakub Jelinek [Thu, 8 Jan 2026 09:37:20 +0000 (10:37 +0100)]
testsuite: Fix up pr123319.c [PR123319]
The testcase committed as gcc.dg/pr123319.c was x86_64 specific due to
immintrin.h include and use of _mm_avg_pu8 & __m64. Furthermore, it
failed even on ia32 due to using SSE ISA stuff without -msse.
The following patch fixes that by moving that test to gcc.target/i386/,
adding -msse, adding comment with PR number, adding -msse to dg-options
and adding a new generic test written by Andrew Pinski as gcc.dg/pr123319.c.
Tested on x86_64-linux with
make check-gcc RUNTESTFLAGS='--target_board=unix\{-m32,-m32/-mno-sse,-m64\} dg.exp=pr123319.c i386.exp=pr123319.c'
both with current cc1 and cc1 from 2 days ago where everything ICEd.
2026-01-08 Jakub Jelinek <jakub@redhat.com>
Andrew Pinski <andrew.pinski@oss.qualcomm.com>
PR tree-optimization/123319
* gcc.dg/pr123319.c: Replace test with target independent one. Move
previous test to ...
* gcc.target/i386/pr123319.c: ... here. Add comment with PR number,
add -msse to dg-options, move immintrin.h include right after stdint.h
include.
Tomasz Kamiński [Wed, 7 Jan 2026 16:24:55 +0000 (17:24 +0100)]
libstdc++: Use tree-dump-gimple in variant constant init tests.
Use of scan-assembler-dem for matching against debug symbols turned out not to
be portable, as their representation in assembly output differs between
platforms: arm uses a 60-column limit, emitting multiple rows, and some
platforms may encode them using base64.
We use tree-dump-gimple instead, which outputs a constructor name portably,
allowing us to simply match for the invocation of the constructor for a given
type, as each variable has a different type.
To use scan-tree-dump(-not) we load the scantree.exp file and its dependency
scandump.exp from gcc/testsuite/lib.
libstdc++-v3/ChangeLog:
* testsuite/20_util/variant/constinit.cc: Use scan-tree-dump
for matching of constructor.
* testsuite/20_util/variant/constinit_compat.cc: Likewise.
* testsuite/lib/libstdc++.exp: Load scantree.exp and scandump.exp.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Richard Biener [Wed, 7 Jan 2026 12:18:42 +0000 (13:18 +0100)]
middle-end/123107 - avoid invalid vector folding
We fold (v >> CST) == { 0, 0.. } into v < { 0, 0.. } but fail to
validate that's valid for the target. The following adds such check,
making sure to apply after IPA (due to offloading) and only when
the original form wasn't valid for the target (like before vector
lowering) or when the new form is. In particular in this case
we have an equality compare resulting in a non-vector which we
can handle, but a similar LT/GT is never handled.
PR middle-end/123107
* fold-const.cc (fold_binary_loc): Guard (v >> CST) == { 0, 0.. }
to v < { 0, 0.. } folding.
Andrew Pinski [Sat, 3 Jan 2026 19:32:02 +0000 (11:32 -0800)]
vect/ifcvt: Don't factor out VEC_PERM_EXPR with constant masks [PR123382]
VEC_PERM_EXPR is another special case expression where constants can mean
something different from non-constant.
So if we have:
```
if (_5 != 0) goto <bb 4>; else goto <bb 5>;
<bb 4>
t_15 = VEC_PERM_EXPR <t_12, t_12, { 3, 3, 2, 3 }>;
goto <bb 6>; [100.00%]
<bb 5>
t_14 = VEC_PERM_EXPR <t_12, t_12, { 0, 0, 2, 3 }>;
<bb 6>
# t_7 = PHI <t_15(4), t_14(5)>
```
We can't factor out the VEC_PERM_EXPR here since the type
of the vector constant can be different from the type of
the other operands. This is unlike when the operand is not a
constant, where the mask has to be an integral type which is
similar to the other operands.
Changes since v1:
* v2: Expand comment on why we should reject this.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/123382
gcc/ChangeLog:
* tree-if-conv.cc: Reject factoring out a VEC_PERM_EXPR
when the differing operand is the mask and it is constant.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr123382-1.c: New test.
* gcc.dg/torture/pr123382-2.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Paul Thomas [Wed, 7 Jan 2026 16:14:12 +0000 (16:14 +0000)]
Fortran: [PDT]Fix ICE in tree check and memory leaks[PR90218, PR123071]
2026-01-07 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/123071
* resolve.cc (resolve_typebound_function): If a generic
typebound procedure is marked as overridable and all the
specific procedures are non-overridable, it is safe to resolve
the compcall.
PR fortran/90218
* trans-array.cc (gfc_trans_array_constructor_value): PDT
structure constructor elements must be finalized.
(trans_array_constructor): Set 'finalize_required' for PDT
constructors.
* trans-decl.cc (gfc_get_symbol_decl): PDT initialization is
required in contained namespaces as long as the parent is not
a module.
(gfc_init_default_pdt): Delete the stmtblock_t argument. Assign
a variable 'value' expression using gfc_trans_assignment.
Simplify the logic around the call to gfc_init_default_dt. In
both cases return a tree expression or null tree.
(gfc_trans_deferred_vars): Only call gfc_allocate_pdt_comp if
gfc_init_default_pdt returns null tree.
* trans-expr.cc (gfc_trans_alloc_subarray_assign): Add a static
stmtblock_t pointer 'final_block'. Free 'dest' data pointer and
add to final_block.
(gfc_conv_structure): Set 'final_block' to the se's finalblock.
(gfc_trans_assignment_1): Do not deallocate PDT array ctrs.
* trans-stmt.cc (gfc_trans_allocate): Also deallocate PDT expr3
allocatable components.
(gfc_trans_deallocate): Add PDT deallocation to se.pre instead
of block.
* trans-stmt.cc (gfc_trans_allocate): Free the allocatable
components of a PDT expr3.
(gfc_trans_deallocate): Add 'tmp' to se.pre rather than block.
gcc/testsuite/
PR fortran/90218
* gfortran.dg/pdt_79.f03: Used uninitialized warning and change
tree scan for 'mapped_tensor.j' to 'Pdttensor_t_4.2.j'.
* gfortran.dg/pdt_80.f03: New test.
Tomas Glozar [Wed, 7 Jan 2026 16:02:15 +0000 (09:02 -0700)]
[PATCH 1/2] ia64: Fix zero_call_used_regs for PRs [PR121535]
ia64 uses default_zero_call_used_regs(), which uses emit_move_insn()
to zero out registers. ia64 predicate registers use BImode, which is not
supported by emit_move_insn().
Implement ia64_zero_call_used_regs() to zero PRs by manually emitting
a CCImode move. default_zero_call_used_regs() is then called to handle
the remaining registers.
PR target/121535
gcc/ChangeLog:
* config/ia64/ia64.cc (TARGET_ZERO_CALL_USED_REGS): Override
function with target-specific one.
(struct gcc_target): Move to end of file.
(ia64_zero_call_used_regs): Add target-specific function.
Xinhui Yang [Wed, 7 Jan 2026 15:59:01 +0000 (08:59 -0700)]
[PATCH] ia64: properly include libunwind support during configuration
By using the test `with_system_libunwind', libgcc can either use the
in-house implementation or reference external libunwind symbols.
However, this breaks the static libgcc.a library, as in t-linux it
references unwind-compat.c, which turns some _Unwind_* symbols into
references of the corresponding symbols in libunwind, but libunwind does
not exist in some conditions (e.g. bootstrapping a toolchain). The
linker complains about `missing version node for symbol', since it can
not find the symbol it is referring to.
The unwind-compat.c module should only exist if system libunwind is
being used. Also, GCC itself should add -lunwind only if this condition
is met.
Implement better control for whether to embed the unwind implementation
into libgcc to fix this issue.
gcc/
* config.gcc: limit -lunwind usage by testing if the system
libunwind is being used.
libgcc/
* config.host (ia64): include unwind-compat only if the system
libunwind is being used.
* config/ia64/t-linux-libunwind: include libgcc symver definition
for libgcc symbols, since it bears the same role as t-linux
(except libunwind); Include fde-glibc.c since the unwind
implementation requires _Unwind_FindTableEntry in this file.
* config/ia64/unwind-ia64.c: protect _Unwind_FindTableEntry inside
inhibit_libc ifndefs to allow it to build with newlib or
without proper headers.
Xi Ruoyao [Wed, 31 Dec 2025 01:52:35 +0000 (09:52 +0800)]
LoongArch: guard SImode simple shift and arithmetic expansions with can_create_pseudo_p [PR 123320]
As we have hardware instructions for those operations, developers will
reasonably assume they can emit them even after reload. But on LA64 we
are expanding them using pseudos to reduce unneeded sign extensions,
breaking such an expectation and causing ICE like PR 123320.
Only create the pseudo when can_create_pseudo_p () to fix such cases.
PR target/123320
gcc
* config/loongarch/loongarch.md (<optab><mode>3): Only expand
using pseudos when can_create_pseudo_p ().
(addsi3): Likewise.
[committed] [PR target/123403] Fix base register and offsets for v850 libgcc
PR target/123403
libgcc/
* config/v850/lib1funcs.S (__return_r25_r29): Fix ! __EP__ clause to
use SP, not EP.
(__return_r2_r31): Fix offsets to match store offsets.
When basic_stringbuf::setbuf has been called we need to copy the
contents of the buffer into _M_string first, before returning that.
libstdc++-v3/ChangeLog:
PR libstdc++/123100
* include/std/sstream (basic_stringbuf::str()&&): Handle the
case where _M_string is not being used for the buffer.
* testsuite/27_io/basic_stringbuf/str/char/123100.cc: New test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Jonathan Wakely [Tue, 6 Jan 2026 14:00:09 +0000 (14:00 +0000)]
libstdc++: Override detection of flockfile support in newlib [PR123406]
As explained in the PR, flockfile and funlockfile are always declared by
newlib and there's no easy way to detect whether they're actually
defined. Ensure that ac_stdio_locking=no gets set for non-cygwin newlib
targets.
libstdc++-v3/ChangeLog:
PR libstdc++/123406
* acinclude.m4 (GLIBCXX_CHECK_STDIO_LOCKING): Override detection
of flockfile for non-cygwin newlib targets.
* configure: Regenerate.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Jonathan Wakely [Mon, 5 Jan 2026 17:29:40 +0000 (17:29 +0000)]
libstdc++: Fix memory leak in std::barrier destructor [PR123378]
When I replaced the std::unique_ptr member in r16-997-gef632273a90657 I
should have added an explicit delete[] operation to replace the effects
of the unique_ptr destructor.
Tobias Burnus [Wed, 7 Jan 2026 14:51:55 +0000 (15:51 +0100)]
OpenMP: Add early C/C++ parser support for 'groupprivate' directive
After parsing the directive, 'sorry, unimplemented' is printed.
Note that restriction checks still have to be implemented, but this
depends on parser support for the 'local' clause of 'omp declare target',
which still has to be implemented.
Andrew MacLeod [Tue, 6 Jan 2026 15:14:47 +0000 (10:14 -0500)]
Early builtin_unreachable removal must examine dependencies.
Even if all uses of a name are dominated by the unreachable branch,
recomputation of a value in the definition of a name might be reachable.
PR tree-optimization/123300
gcc/
* gimple-range-gori.cc (gori_map::exports_and_deps): New.
* gimple-range-gori.h (exports_and_deps): New prototype.
(FOR_EACH_GORI_EXPORT_AND_DEP_NAME): New macro.
* tree-vrp.cc (remove_unreachable::remove_unreachable): Initialize
m_tmp bitmap.
(remove_unreachable::~remove_unreachable): Dispose of m_tmp bitmap.
(remove_unreachable::fully_replaceable): Move from static function
and check reachability of exports and dependencies.
The initial idea of this optimization was to reduce it to "X != 0",
checking for either X being an unsigned or a truncating conversion.
Then we discussed reducing it to "(X & -X) != 0" instead. This form
would avoid the potential trapping problems (like -ftrapv) that might
happen in case X is not an unsigned type.
Then, as suggested by Roger Sayle in bugzilla, we could reduce to just
"-X != 0". Keeping the negated value in the pattern preserves any trapping
or UBs to be handled by other match.pd patterns that are more able to do
the conversion to "X != 0" when applicable. This would also spare us from
a TYPE_UNSIGNED check.
Jakub Jelinek [Wed, 7 Jan 2026 14:17:21 +0000 (15:17 +0100)]
combine: Fix up serious regression in try_combine [PR121773]
Back in April last year I've changed try_combine's condition when trying to
split two independent sets by moving one of them to i2. Previously this was
testing !modified_between_p (SET_DEST (setN), i2, i3) and I've changed it
to SET_DEST (setN) != pc_rtx && !reg_used_between_p (SET_DEST (set1), i2, i3)
on the assumption written in the r15-9131-g19ba913517b5e2a00 commit
message:
"The following patch replaces the modified_between_p
tests with reg_used_between_p, my understanding is that
modified_between_p is a subset of reg_used_between_p, so one
doesn't need both."
That assumption is wrong though, neither of these is a subset of the
other and I don't see any APIs which test both. We need to avoid moving
a set from i3 to i2 both in case where the REG (or SUBREG_REG of SUBREG or
MEM or whatever else) is set/modified between i2 and i3 exclusive, as shown
by the testcase in PR121773 (which I'm not including because my ARM neon
knowledge is limited). We have i2 insn 18 and i3 insn 7 after the current
try_combine modifications:
(insn 18 5 19 2 (set (reg:SI 104 [ _6 ])
(const_int 305419896 [0x12345678])) "include/arm_neon.h":7467:22 542 {*arm_movsi_vfp}
(expr_list:REG_EQUAL (const_int 305419896 [0x12345678])
(nil)))
(insn 19 18 21 2 (set (reg:SI 105 [ _6+4 ])
(const_int 538968064 [0x20200000])) "include/arm_neon.h":7467:22 542 {*arm_movsi_vfp}
(nil))
(insn 21 19 7 2 (set (reg:DI 101 [ _5 ])
(const_int 0 [0])) "include/arm_neon.h":607:14 -1
(nil))
(insn 7 21 8 2 (parallel [
(set (pc)
(pc))
(set (subreg:SI (reg:DI 101 [ _5 ]) 0)
(const_int 610839792 [0x2468acf0]))
]) "include/arm_neon.h":607:14 17 {addsi3_compare_op1}
(expr_list:REG_DEAD (reg:SI 104 [ _6 ])
(nil)))
The second set can't be moved to the i2 location, because (reg:DI 101)
is modified in insn 21 and so if setting half of it to 610839792 is
moved from insn 7 where it modifies what was previously 0 into a location
where it overwrites something and is later overwritten in insn 21, we get
different behavior.
And the second case is mentioned in the PR119291 commit log:
(insn 22 21 23 4 (set (reg:SI 104 [ _7 ])
(const_int 0 [0])) "pr119291.c":25:15 96 {*movsi_internal}
(nil))
(insn 23 22 24 4 (set (reg/v:SI 117 [ e ])
(reg/v:SI 116 [ e ])) 96 {*movsi_internal}
(expr_list:REG_DEAD (reg/v:SI 116 [ e ])
(nil)))
(note 24 23 25 4 NOTE_INSN_DELETED)
(insn 25 24 26 4 (parallel [
(set (pc)
(pc))
(set (reg/v:SI 116 [ e ])
(const_int 0 [0]))
]) "pr119291.c":28:13 977 {*negsi_2}
(expr_list:REG_DEAD (reg:SI 104 [ _7 ])
(nil)))
i2 is insn 22 and i3 is insn 25 after the in-progress modifications; the
second set can't be moved to the i2 location, because (reg/v:SI 116) is used
in insn 23, so with it being set to 0 around insn 22, insn 23 would see
a different value.
So, I'm afraid we need both the modified_between_p and reg_used_between_p
checks. We don't need the SET_DEST (setN) != pc_rtx checks; those were
added because modified_between_p (pc_rtx, i2, i3) returns true if start
is not the same as end, but reg_used_between_p doesn't behave like that.
2026-01-07 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/119291
PR rtl-optimization/121773
* combine.cc (try_combine): Check that SET_DEST (setN) is neither
modified_between_p nor reg_used_between_p instead of just not
reg_used_between_p or pc_rtx.
Jakub Jelinek [Wed, 7 Jan 2026 14:00:50 +0000 (15:00 +0100)]
libstdc++: Use gnu_inline attribute on constexpr exception methods [PR123183]
As mentioned in
https://gcc.gnu.org/pipermail/gcc-patches/2026-January/704712.html
in the gnu::constexpr_only thread, gnu::gnu_inline attribute actually
seems to work for most of what we need for C++26 constexpr exceptions
(i.e. when we want out of line bodies for C++ < 26 and need to use
constexpr for C++26, yet don't want for reasons mentioned in those
two PRs the bodies of those constexpr methods to be emitted inline).
Unfortunately clang++ doesn't handle it 100% properly and requires
the redundant inline keyword to make it work (even when the methods
are constexpr and thus implicitly inline); g++ doesn't require that,
so the patch also adds the redundant inline keywords, not just
the [[__gnu__::__gnu_inline__]] attribute.
This way if something wants to inline those functions it can, but
if their address is taken, we just rely on libstdc++.{so,a} to provide
those (which it does as before because those TUs are compiled with
older -std= modes).
The earlier r16-6477-gd5743234731 commit made sure gnu::gnu_inline
constexpr virtual methods can be key methods, so vtables and rtti can
be emitted only in the TU defining non-gnu_inline versions of those.
Alfie Richards [Tue, 7 Oct 2025 14:16:16 +0000 (14:16 +0000)]
aarch64: Add support for fmv priority syntax.
Adds support for the AArch64 fmv priority syntax.
This allows users to override the default function ordering.
For example:
```c
int bar [[gnu::target_version("default")]] (int){
return 1;
}
int bar [[gnu::target_version("dotprod;priority=2")]] (int) {
return 2;
}
int bar [[gnu::target_version("sve;priority=1")]] (int) {
return 3;
}
```
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_parse_fmv_features): Add parsing
for priority arguments.
(aarch64_process_target_version_attr): Update call to
aarch64_parse_fmv_features.
(get_feature_mask_for_version): Update call to
aarch64_parse_fmv_features.
(aarch64_compare_version_priority): Add logic to order by priority if present.
(aarch64_functions_b_resolvable_from_a): Update call to
aarch64_parse_fmv_features.
(aarch64_mangle_decl_assembler_name): Update call to
aarch64_parse_fmv_features.
(dispatch_function_versions): Add logic to sort by priority.
(aarch64_same_function_versions): Add diagnostic if invalid use of
priority syntax.
(aarch64_merge_decl_attributes): Add logic to make sure priority
arguments are preserved.
(aarch64_check_target_clone_version): Update call to
aarch64_parse_fmv_features.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/fmv_priority3.c: New test.
* gcc.target/aarch64/fmv_priority_error1.c: New test.
* gcc.target/aarch64/fmv_priority_error2.c: New test.
Alfie Richards [Tue, 7 Oct 2025 13:01:09 +0000 (13:01 +0000)]
targethooks: Change SAME_FUNCTION_VERSIONS hook to support checking mergeability
This changes the hook to support checking version mergeability for cases
where the version strings do imply the same version, but are conflicting
in some other way so cannot be merged.
This is a change required for adding priority version support in aarch64.
gcc/ChangeLog:
* target.def (TARGET_OPTION_SAME_FUNCTION_VERSIONS): Update
documentation.
* tree.cc (disjoint_version_decls): Change for new NULL parameter
to same_function_versions.
(diagnose_versioned_decls): Update to pass diagnostic location to
same_function_versions.
* doc/tm.texi: Regenerate.
* config/aarch64/aarch64.cc (aarch64_same_function_versions):
Update hook impl for new arguments.
* config/riscv/riscv.cc (riscv_same_function_versions): Update
hook impl for new arguments.
* config/loongarch/loongarch.cc
(loongarch_same_function_versions): Likewise.
* hooks.cc (hook_stringslice_stringslice_unreachable): Changed
to...
(hook_stringslice_consttree_stringslice_consttree_unreachable):
...this and add extra arguments.
* hooks.h (hook_stringslice_stringslice_unreachable): Changed
to...
(hook_stringslice_consttree_stringslice_consttree_unreachable):
and add extra arguments.
Martin Jambor [Wed, 7 Jan 2026 10:53:15 +0000 (11:53 +0100)]
ipa-cp: Multiple sweeps over the call-graph in the decision stage
Currently, IPA-CP makes only one sweep in the decision stage over the
call-graph, meaning that some cloning, even if relatively cheap, may
not be performed because the pass runs out of the overall growth
budget before it gets to evaluating it. By making more (three by
default, but configurable with a parameter) sweeps over the call-graph
with progressively stricter cost limits, the more beneficial
candidates will have a better chance to be cloned before others.
gcc/ChangeLog:
2025-07-08 Martin Jambor <mjambor@suse.cz>
* params.opt (param_ipa_cp_sweeps): New.
* doc/invoke.texi (ipa-cp-sweeps): New.
* ipa-cp.cc (max_number_sweeps): New.
(get_max_overall_size): New parameter cur_sweep, use it and the total
number of sweeps from the NODE to calculate the result too.
(ipcp_propagate_stage): Get the maximum number of sweeps specified in
the corresponding parameter of any possibly affected node.
(good_cloning_opportunity_p): Add parameter cur_sweep, adjust the
threshold according to it.
(decide_about_value): New parameter cur_sweep, pass it to
get_max_overall_size and to good_cloning_opportunity_p.
(decide_whether_version_node): New parameter cur_sweep, pass it to
decide_about_value and get_max_overall_size. Make sure the node is
not dead.
(ipcp_decision_stage): Make multiple sweeps over the call-graph.
Martin Jambor [Wed, 7 Jan 2026 10:53:14 +0000 (11:53 +0100)]
ipa-cp: Move decision to clone for all contexts to decision stage
Currently, IPA-CP makes decisions to clone a function for all (known)
contexts in the evaluation phase, in a separate sweep over the call
graph from the decisions about cloning for values available only in
certain contexts. This patch moves it to the decision stage, which
requires slightly more computation at the decision stage but the
benefit/cost heuristics is also likely to be slightly better because
it can be calculated using the call graph edges that remain after any
cloning for special contexts. Perhaps more importantly, it also
allows us to do multiple decision sweeps over the call graph with
different "parameters."
gcc/ChangeLog:
2025-07-02 Martin Jambor <mjambor@suse.cz>
* ipa-prop.h (ipa_node_params): Remove member do_clone_for_all_contexts.
(ipa_node_params::ipa_node_params): Do not initialize
do_clone_for_all_contexts.
* ipa-cp.cc (gather_context_independent_values): Remove parameter
calculate_aggs, calculate them always.
(estimate_local_effects): Move the decision whether to clone for
all context...
(decide_whether_version_node): ...here. Fix dumps.
(decide_about_value): Adjust alignment in dumps.
Rainer Orth [Wed, 7 Jan 2026 08:52:39 +0000 (09:52 +0100)]
fixincludes: Remove unnecessary Solaris fixes
Many fixincludes fixes are no longer applied on Solaris 11.4, usually
because they have been incorporated into the system headers. In some
cases this happened as early as Solaris 10.
A few were still applied, although unnecessarily, usually because the
equivalent change was made to the system headers in a slightly different way.
This patch removes all such fixes or disables the unnecessary ones that
aren't Solaris-specific on Solaris only. While the solaris_math_12 fix
isn't necessary in current Solaris 11.4 SRUs, it was kept since it still
applies to Solaris 11.4 FCS.
Bootstrapped without regressions on i386-pc-solaris2.11 and
sparc-sun-solaris2.11. I've also checked that the fixes applied to the
11.4 FCS headers are identical to those before this patch, with the
exception of those that are no longer actually needed.
Richard Biener [Tue, 6 Jan 2026 13:10:38 +0000 (14:10 +0100)]
tree-optimization/123316 - avoid ICE due to lack of PHI patterns
With bools we can end up with mixed vector types in PHI nodes due
to PHIs not having pattern stmts. Avoid this when analyzing
a nested cycle, similar to what we already do when analyzing BB
vectorization PHIs.
Rainer Orth [Wed, 7 Jan 2026 05:53:23 +0000 (06:53 +0100)]
Allow disabling -gctf non-C warning [PR123259]
In mixed-language builds it may be difficult to restrict -gctf to only
C-language sources. However, the
cc1plus: note: CTF debug info requested, but not supported for ‘GNU C++17’ frontend
warning for non-C languages, which is perfectly benign, may confuse
parts of the build, so it may be useful to disable it.
This patch applies the existing -Wno-complain-wrong-lang option to
suppress it.
Bootstrapped without regressions on i386-pc-solaris2.11
sparc-sun-solaris2.11, also with C/C++-only bootstraps that apply
-gctf/-gsctf via STAGE[23]_CFLAGS and STAGE[23]_TFLAGS.
warn_access: Limit waccess2 to dangling pointer checks [PR 123374]
The second pass of warn_access (waccess2) was added to implement
dangling pointer checks, but it implicitly ran the early checks too,
which issued false warnings on code that had not been fully optimized.
Limit this second run to only dangling pointer checks for call
statements. This does not break any of the existing warning tests, so
the additional run didn't seem to add any actual value anyway.
gcc/ChangeLog:
PR tree-optimization/123374
* gimple-ssa-warn-access.cc (pass_waccess::set_pass_param): Add
a second parameter.
(pass_waccess::check_call): Skip access checks for waccess2.
(pass_waccess::execute): Drop initialization of
M_CHECK_DANGLING_P.
* passes.def: Adjust.
gcc/testsuite/ChangeLog:
PR tree-optimization/123374
* g++.dg/warn/pr123374.C: New test.
Sebastian Huber [Mon, 29 Dec 2025 23:41:38 +0000 (00:41 +0100)]
gcov: Fix counter update method selection
The counter update method selection had some issues.
For PROFILE_UPDATE_ATOMIC, if atomic updates are not supported, then
fall back to single mode; however, use partial atomic updates if
available. Issue warnings.
For PROFILE_UPDATE_PREFER_ATOMIC, if atomic updates are not supported,
then fall back to single mode; however, use partial atomic updates if
available. Do not issue warnings.
gcc/ChangeLog:
* tree-profile.cc (tree_profiling): Do not use atomic operations
if they are not available. Try to use at least partial atomic
updates as a fallback.
Jeff Law [Tue, 6 Jan 2026 23:16:56 +0000 (16:16 -0700)]
[PR target/123269] Adjust predcomm testcases to avoid vectorization
Thankfully this "bug" is just a case where after Robin's change we're
vectorizing cases we weren't before which in turn doesn't give predcom the
opportunity to optimize the code.
Like on existing predcom test we can restore the test's intent by using
-fno-tree-vectorize.
Tested x86_64 and the various crosses to ensure nothing regressed. Pushing to
the trunk.
This pattern is only emitted during function epilogue expansion (obviously
after register allocation), so putting reload_completed in the condition
is redundant.
This patch also changes the declaration of the return register (A0 address
register) required for normal function returns to properly defining the
EPILOGUE_USES macro, as is already done on other targets, rather than
placing '(use (reg:SI A0_REG))' RTX.
gcc/ChangeLog:
* config/xtensa/xtensa.h (EPILOGUE_USES): New macro definition.
* config/xtensa/xtensa.md (return):
Remove '(use (reg:SI A0_REG))' from the template description, and
reload_completed from the condition.
(sibcall_epilogue): Remove emitting '(use (reg:SI A0_REG))'.
Tamar Christina [Tue, 6 Jan 2026 15:00:44 +0000 (15:00 +0000)]
vect: Add check for BUILT_IN_NORMAL to ifcvt [PR122103]
It was reported that some AVX10 test like
gcc.target/i386/avx10_2-vcvtbf162ibs-2.c ICEd with my
changes. It turns out it's due to associated_internal_fn
only supporting BUILT_IN_NORMAL calls.
This adds a check for this before calling
associated_internal_fn.
Manually tested the files since they have effective-target tests
for hardware I don't have.
gcc/ChangeLog:
PR tree-optimization/122103
* tree-if-conv.cc (ifcvt_can_predicate): Add check for
normal builtins.
Richard Ball [Tue, 6 Jan 2026 14:26:20 +0000 (14:26 +0000)]
aarch64: Add support for __pldir intrinsic
This patch adds support for the __pldir intrinsic.
This is a new prefetch intrinsic which declares an
intent to read from an address.
This intrinsic is part of FEAT_PCDPHINT.
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc
(enum aarch64_builtins): New builtin flag.
(aarch64_init_pcdphint_builtins): New builtin function.
(aarch64_expand_pldir_builtin): Expander for new intrinsic.
(aarch64_general_expand_builtin): Call new expander.
* config/aarch64/aarch64.md
(aarch64_pldir): New pattern for intrinsic.
* config/aarch64/arm_acle.h
(__attribute__): New call to builtin.
(__pldir): Likewise.
Richard Ball [Tue, 6 Jan 2026 14:26:20 +0000 (14:26 +0000)]
aarch64: Add support for FEAT_PCDPHINT atomic_store intrinsics.
This patch adds support for the atomic_store_with_stshh intrinsic
in aarch64. This intrinsic is part of FEAT_PCDPHINT.
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc
(enum aarch64_builtins): Add new flags.
(aarch64_init_pcdphint_builtins): Create new Builtin functions.
(aarch64_general_init_builtins): Call init for PCDPHINT.
(aarch64_expand_stshh_builtin): Expander for new intrinsic.
(aarch64_general_expand_builtin): Call new expander.
* config/aarch64/aarch64-c.cc
(aarch64_update_cpp_builtins): New feature.
* config/aarch64/aarch64.h (TARGET_PCDPHINT): Likewise.
* config/aarch64/arm_acle.h
(__atomic_store_with_stshh): Generic function to call the builtins.
* config/aarch64/atomics.md
(@aarch64_atomic_store_stshh<mode>): New pattern for intrinsic.
* config/aarch64/iterators.md: New UNSPEC.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/atomic_store_with_stshh.c: New test.
Eric Botcazou [Tue, 6 Jan 2026 14:18:34 +0000 (15:18 +0100)]
Fix gcc.c-torture/execute/pr110817-[13].c on the SPARC
As discussed in the audit trail, the TARGET_VECTORIZE_GET_MASK_MODE hook of
the SPARC back-end always returns Pmode (SImode would probably have been OK)
and this causes build_truth_vector_type_for_mode to generate questionable
types like:
<vector_type 0x7ffff6f6da80
type <boolean_type 0x7ffff6f6d9d8 public QI
size <integer_cst 0x7ffff6e04f18 constant 8>
unit-size <integer_cst 0x7ffff6e04f30 constant 1>
align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x7ffff6f6d9d8 precision:1 min <integer_cst 0x7ffff6f69678 -1> max
<integer_cst 0x7ffff6f7deb8 0>>
DI
size <integer_cst 0x7ffff6e04e28 type <integer_type 0x7ffff6e150a8
bitsizetype> constant 64>
unit-size <integer_cst 0x7ffff6e04e40 type <integer_type 0x7ffff6e15000
sizetype> constant 8>
align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x7ffff6f6da80 nunits:1>
which then go through this trick in store_constructor:
/* Ensure no excess bits are set.
GCN needs this for nunits < 64.
x86 needs this for nunits < 8. */
auto nunits = TYPE_VECTOR_SUBPARTS (type).to_constant ();
if (maybe_ne (GET_MODE_PRECISION (mode), nunits))
tmp = expand_binop (mode, and_optab, tmp,
GEN_INT ((HOST_WIDE_INT_1U << nunits) - 1),
target, true, OPTAB_WIDEN);
if (tmp != target)
emit_move_insn (target, tmp);
break;
}
to yield code that cannot possibly work on a big-endian platform.
Coaxing build_truth_vector_type_for_mode to generate more sensible types
fixes the problem but runs afoul of the TARGET_VECTORIZE_GET_MASK_MODE
hook for some AVX512 modes, so is probably not worth the risk. Moreover,
I didn't manage to come up with a big-endian implementation of the above
trick that would make some sense for the questionable vector types, so the
fix simply disables it.
gcc/
PR target/121192
* expr.cc (store_constructor) <VECTOR_TYPE>: Disable the special
trick for uniform boolean vectors with integer modes and single-bit
mask entries on big-endian platforms.
Artemiy Volkov [Fri, 2 Jan 2026 11:18:19 +0000 (11:18 +0000)]
testsuite: rework some vect/complex testcases
This is the second stab at
https://gcc.gnu.org/pipermail/gcc-patches/2026-January/704823.html, which
concerns cleaning up some testcases in gcc.dg/vect/complex. The original
commit message reads:
---------------------------------------------------------------------------
Some of the testcases in the gcc.dg/vect/complex directory try to match
"stmt.*$internal_fn" in the slp1/vect logs, which leads to many false
positives; this patch changes this to "add new stmt: \[^\n\r]*$internal_fn",
making sure that the log fragments matched in this way are limited to
single lines and correspond to actual newly created GIMPLE statements.
This main change results in some fallout, necessitating the following
additional tweaks:
- For fast-math testcases, replace the "1"s in scan-tree-dump-times
directives by appropriate counts.
- XFAIL bb-slp and vect testcases featuring integral types,
since the cadd{90,270} optabs are not implemented for integral modes.
- Disable some FP16 tests for arm targets due to absence of cadd{90,270}
for V8HF.
- Replace "target { X } && ! target { Y }" selectors with the correct
"target { X && { ! Y } }" form.
- In bb-slp-complex-add-pattern-long.c, adjust the testcase header to
match other tests so that different scan-tree-dump-times directives
can be switched off selectively.
- In bb-slp-complex-add-pattern-long.c, remove an extraneous scan for
"Found COMPLEX_ADD_ROT90".
- In bb-slp-complex-add-pattern-int.c, use vect_complex_add_int instead of
vect_complex_add_byte.
---------------------------------------------------------------------------
Following Tamar's feedback, tweaks 2 and 3 above have been fixed by these
changes since v1:
- Change what dg-add-options does for arm_v8_3a{,_fp16}_complex_neon so
that the correct flags are returned regardless of configure-time values
of -mfpu.
- For integer tests, require MVE rather than AdvSIMD from the arm
backend's side, as only that ISA has cadd{90,270} for integral modes.
- Un-XFAIL testcases that gcc is currently able to vectorize, separately
for the arm and aarch64 backends.
Re-regtested on aarch64 (with and without SVE2) and arm.
Eric Botcazou [Tue, 6 Jan 2026 10:26:01 +0000 (11:26 +0100)]
Ada: Clear possible confusion in doc/install.texi
The sentence:
"If you need to build an intermediate version of GCC in order to
bootstrap current GCC, consider GCC 9.5: it can build the current Ada
and D compilers, and was also the version that declared C++17 support
stable."
is possibly confusing because it globs Ada and D together, whereas Ada
imposes no further requirement over C++ (GCC 5.4+) unlike D (GCC 9.4+).
gcc/
* doc/install.texi (Prerequisites): Remove reference to Ada in
conjunction with GCC 9.5 and adjust its GCC version requirement.
The order of evaluation of function arguments is unspecified in C++.
The function object_sizes_set_temp called object_sizes_set with two
calls to make_ssa_name() as arguments. Since make_ssa_name() has the
side effect of incrementing the global SSA version counter, different
architectures of the same compiler evaluated these calls in different
orders.
This resulted in non-deterministic SSA version numbering between
x86_64 and aarch64 hosts during cross-compilation, leading to
divergent object files.
Sequencing the calls into separate statements ensures deterministic
evaluation order.
2026-01-06 Jakub Jelinek <jakub@redhat.com>
Marco Falke <falke.marco@gmail.com>
PR tree-optimization/123351
* tree-object-size.cc (object_sizes_set_temp): Separate calls to
make_ssa_name to ensure deterministic execution order.
Thomas Koenig [Sun, 4 Jan 2026 19:09:39 +0000 (20:09 +0100)]
Generate a runtime error on recursive I/O, thread-safe
This patch is a version of Jerry's patch with one additional feature.
When locking a unit, the thread ID of the locking thread is also stored
in the gfc_unit structure. When the unit is found to be locked, it can
either have been locked by the same thread (bad, recursive I/O) or
by another thread (harmless).
Regression-tested fully (make -j8 check in the gcc build directory) on
Linux, which links in pthreads by default. Steve checked on FreeBSD,
which does not do so.
Jerry DeLisle <jvdelisle@gcc.gnu.org>
Thomas Koenig <tkoenig@gcc.gnu.org>
PR libfortran/119136
gcc/fortran/ChangeLog:
* libgfortran.h: Add enum for new LIBERROR_RECURSIVE_IO.
libgfortran/ChangeLog:
* io/async.h (UNLOCK_UNIT): New macro.
(TRYLOCK_UNIT): New macro.
(LOCK_UNIT): New macro.
* io/io.h: Delete prototype for unused stash_internal_unit.
(check_for_recursive): Add prototype for this new function.
* io/transfer.c (data_transfer_init): Add call to new
check_for_recursive.
* io/unit.c (delete_unit): Fix comment.
(check_for_recursive): Add new function.
(init_units): Use new macros.
(close_unit_1): Likewise.
(unlock_unit): Likewise.
* io/unix.c (flush_all_units_1): Likewise.
(flush_all_units): Likewise.
* runtime/error.c (translate_error): Add translation for the
"Recursive I/O not allowed" runtime error message.
supers1ngular [Tue, 6 Jan 2026 01:09:02 +0000 (17:09 -0800)]
openmp: Improve Fortran Diagnostics for Linear Clause
This patch improves diagnostics for the linear clause,
providing a more accurate and intuitive recommendation
for remediation if the deprecated syntax is used.
Additionally updates the relevant test to reflect the
changed verbiage of the warning.
gcc/fortran/ChangeLog:
* openmp.cc (gfc_match_omp_clauses): New diagnostic logic.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/pr84418-1.f90: Fix verbiage of
dg-warning to reflect updated warning.
Tamar Christina [Mon, 5 Jan 2026 20:56:03 +0000 (20:56 +0000)]
vect: teach vectorizable_call to predicate calls when they can trap [PR122103]
The following example
void f (float *__restrict c, int *__restrict d, int n)
{
for (int i = 0; i < n; i++)
{
c[i] = __builtin_sqrtf (c[i]);
}
}
compiled with -O3 -march=armv9-a -fno-math-errno -ftrapping-math needs to be
predicated on the conditional. It's invalid to execute the branch and use a
select to extract it later unless using -fno-trapping-math.
However, as discussed in PR96373, while we probably shouldn't vectorize
the cases where we can trap but don't support the conditional operation,
there doesn't seem to be a clear consensus on how GCC should handle trapping
math. As such, similar to PR96373, I don't stop vectorization if trapping
math is enabled and the conditional operation isn't supported.
PR tree-optimization/122103
* gcc.target/aarch64/sve/pr122103_4.c: New test.
* gcc.target/aarch64/sve/pr122103_5.c: New test.
* gcc.target/aarch64/sve/pr122103_6.c: New test.
Tamar Christina [Mon, 5 Jan 2026 20:55:34 +0000 (20:55 +0000)]
vect: teach if-convert to predicate __builtin calls [PR122103]
The following testcase
void f (float *__restrict c, int *__restrict d, int n)
{
for (int i = 0; i < n; i++)
{
if (d[i] > 1000)
c[i] = __builtin_sqrtf (c[i]);
}
}
compiled with -O3 -march=armv9-a -fno-math-errno -ftrapping-math needs to be
predicated on the conditional. It's invalid to execute the branch and use a
select to extract it later unless using -fno-trapping-math.
This change in if-conversion changes what we used to generate:
PR tree-optimization/122103
* gcc.target/aarch64/sve/pr122103_1.c: New test.
* gcc.target/aarch64/sve/pr122103_2.c: New test.
* gcc.target/aarch64/sve/pr122103_3.c: New test.
Tamar Christina [Mon, 5 Jan 2026 20:55:05 +0000 (20:55 +0000)]
vect: update tests for -ftrapping-math support [PR122103]
Before going any further, this updates the existing testcases that really
require -fno-trapping-math to now use that.
It also adds three new tests for SVE. They will however fail until the last
patch but that's fine.
Notable is testcase gcc.target/aarch64/sve/unpacked_cond_frinta_2.c which
without -ftrapping-math (which it's explicitly checking for) generates worse
code because the vectorizer forces an unneeded unpack. This is however the
same issue with how the vectorizer picks VF as we've seen a number of times.
Tamar Christina [Mon, 5 Jan 2026 20:54:35 +0000 (20:54 +0000)]
middle-end: extend fma -> fms transformation to conditional optab [PR122103]
Currently in the simplications between if-conversion and vect we rely on
match.pd to rewrite FMA into FMS if the accumulator is on a negated value.
However if if-conversion instead produces a COND_FMA then this doesn't work and
so the vectorizer can't generate a vector FMS or it's other variant.
This extends the rules to include the COND_FMA variants. Because this happens
before the vectorization the vectorizer will take care of generating the LEN
variants and as such we don't need match.pd to know about those.
The added rules are the same as the ones directly above them just changing
FMA to COND_FMA.
gcc/ChangeLog:
PR tree-optimization/122103
* match.pd: Add COND_FMA to COND_FMS rewrite rules.
Tamar Christina [Mon, 5 Jan 2026 20:53:46 +0000 (20:53 +0000)]
middle-end: Add new conditional IFNs for existing math IFNs [PR122103]
For a few math IFNs we never declared the conditional variants. This is needed
to handle trapping math correctly. SVE already implements all of these using
the expected optabs.
This just adds the COND and COND_LEN optabs for SQRT, CEIL, FLOOR, ROUND and
RINT.
Note that we don't seem to have any documentation for the math IFNs as they look
like they're all on the optabs/original builtins. As such I only documented the
optabs as that's consistent.
which is incorrect, fsqrt can raise FE exceptions and so should be masked on p7
as the inactive lanes can trigger incorrect FE errors as the code in the PR
demonstrates.
In GCC 13 this was partially addressed for instructions that got lowered to
IFNs through r13-5979-gb9c78605039f839f3c79ad8fca4f60ea9a5654ed, but it never
addressed __builtin_math_fns. Assuming the direction of travel in PR96373 is
still valid, this extends the support.
While ERRNO trapping is controlled through flags, it looks like for trapping
math the calls and IFNs are not marked specifically. Instead in
gimple_could_trap_p_1 through operation_could_trap_p we default to all floating
point operation could trap if flag_trapping_math.
This extends gimple_could_trap_p_1 to do the same for __builtin_math_fns but
exclude instructions that the standard says can't raise FEs.
Jeff Law [Mon, 5 Jan 2026 16:34:28 +0000 (09:34 -0700)]
[RISC-V] Restore inline expansion of block moves on RISC-V in some cases
Edwin's patch to add a --param for a size threshold on block moves
inadvertently disabled using inline block moves for cases where the count is
unknown. This caused testsuite regressions (I don't remember which test, it
was ~6 weeks ago if not longer). I'd hoped Edwin would see the new failures,
but I suspect he's buried by transition stuff with Rivos/Meta.
This patch restores prior behavior when the count is unknown and no --param was
specified.
Bootstrapped and regression tested on both the BPI and Pioneer systems and
regression tested on riscv{32,64}-elf as well.
Pushing to the trunk after pre-commit CI does its thing.
gcc/
* config/riscv/riscv-string.cc (expand_block_move): Restore using
inlined memcpy/memmove for unknown counts if the param hasn't been
specified.
(expand_vec_setmem): Similarly for memset.
Pan Li [Mon, 5 Jan 2026 16:28:04 +0000 (09:28 -0700)]
[PATCH v1 2/2] RISC-V: Add run test case for vwadd/vwsub wx mis combine [PR123317]
From: Pan Li <pan2.li@intel.com>
Add test cases for the mis combine of the vwadd/vwsub vx combine.
PR target/123317
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr123317-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/pr123317-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/pr123317-run-3.c: New test.
* gcc.target/riscv/rvv/autovec/pr123317-run-4.c: New test.
* gcc.target/riscv/rvv/autovec/pr123317-run.h: New test.
The vwaddu/vwsubu wx combine patterns take any_extend by
mistake; the operation is unsigned, so we must use zero_extend here.
This patch fixes that; the bug resulted in sign_extend code patterns
combining to vwaddu/vwsubu.wx.
PR target/123317
gcc/ChangeLog:
* config/riscv/autovec-opt.md: Take zero_extend for
both the vwaddu and vwsubu wx pattern.
Alice Carlotti [Tue, 30 Dec 2025 10:12:45 +0000 (10:12 +0000)]
aarch64 doc: Fix incorrect function name
The documentation for aarch64's -mtrack-speculation referred to the
builtin function __builtin_speculation_safe_copy, but the actual
function name is __builtin_speculation_safe_value.
Pan Li [Sun, 28 Dec 2025 08:33:27 +0000 (16:33 +0800)]
Vect: Adjust depth_limit of vec_slp_has_scalar_use from 2 to 3
The test case of RISC-V vx-6-u8.c is failed for the vaaddu.vx asm check
when --param=gpr2vr-cost=2 recently. After some investigation, it is
failed to vectorize afte some middle-end changes. The depth_limit is 2
of the func vec_slp_has_scalar_use, and then return -1 by design. Then the
slp_insntance got 12 in size and we may see log similar as below:
*_2 1 times vec_to_scalar costs 3 in epilogue
*_2 1 times vec_to_scalar costs 3 in epilogue
*_2 1 times vec_to_scalar costs 3 in epilogue
*_2 1 times vec_to_scalar costs 3 in epilogue
Vector cost: 18
Scalar cost: 9
And then cannot vectorize due to cost consideration.
This PATCH would like to adjust the depth_limit to 3 suggested by
Richard.
gcc/ChangeLog:
* tree-vect-slp.cc (vec_slp_has_scalar_use): Adjust the
depth_limit from 2 to 3.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/sat_add-cost-1.c: New test.
Tamar Christina [Mon, 5 Jan 2026 14:27:14 +0000 (14:27 +0000)]
AArch64: tweak inner-loop penalty when doing outer-loop vect [PR121290]
r16-3394-g28ab83367e8710a78fffa2513e6e008ebdfbee3e added a cost model adjustment
to detect invariant load and replicate cases when doing outer-loop vectorization
where the inner loop uses a value defined in the outer-loop.
In other words, it's trying to detect the cases where the inner loop would need
to do an ld1r and all inputs are then working on replicated values. The
argument is that in this case the vector loop is just the scalar loop since each
lane just works on the duplicated values.
But it had two short comings.
1. It's an all or nothing thing. The load and replicate may only be a small
percentage of the amount of data being processed. As such this patch now
requires the load and replicate to be at least 50% of the leafs of an SLP
tree. Ideally we'd just only increase body by VF * invariant leafs, but we
can't since the middle-end cost model applies a rather large penalty to the
scalar code (* 50) and as such the base cost ends up being too high and we
just never vectorize. The 50% is an attempt to strike a balance in this
awkward situation. Experiments show it works reasonably well and we get the
right codegen in all the test cases.
2. It does not keep in mind that a load + replicate where that vector value is
used in a by-index operation will result in decomposing the load back to
a scalar. e.g.
ld1r {v0.4s}, x0
mul v1.4s, v2.4s, v0.4s
is transformed into
ldr s0, x0
mul v1.4s, v2.4s, v0.s[0]
and as such this case may actually be profitable because we're only doing a
scalar load of a single element, similar to the scalar loop.
This patch tries to detect (loosely) such cases and doesn't apply the penalty
for these. It's a bit hard to tell whether we end up with a by index
operation so early as the vectorizer itself is not aware of them and as such
the patch does not do an exhaustive check, but only does the most obvious
one.
gcc/ChangeLog:
PR target/121290
* config/aarch64/aarch64.cc (aarch64_possible_by_lane_insn_p): New.
(aarch64_vector_costs): Add m_num_dup_stmts and m_num_total_stmts.
(aarch64_vector_costs::add_stmt_cost): Use them.
(adjust_body_cost): Likewise.
gcc/testsuite/ChangeLog:
PR target/121290
* gcc.target/aarch64/pr121290.c: Move to...
* gcc.target/aarch64/pr121290_1.c: ...here.
* g++.target/aarch64/pr121290_1.C: New test.
* gcc.target/aarch64/pr121290_2.c: New test.
void f(const int *restrict in,
int *restrict out,
int n, int threshold)
{
for (int i = 0; i < n; ++i) {
int v = in[i];
if (v > threshold) {
int t = v * 3;
t += 7;
t ^= 0x55;
t *= 0x55;
t -= 0x5;
t &= 0xFE;
t ^= 0x55;
out[i] = t;
} else {
out[i] = v;
}
}
}
compiled at -O2
results in aggressive if-conversion, which increases the number of dynamic
instructions and the latency of the loop, since it now has to wait for t to be
calculated in all cases.
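The if-converted loop is roughly equivalent to the following branchless source (a sketch of what the compiler produces, not literal compiler output): both arms are evaluated and the final selection maps to a csel:

```c
/* Branchless sketch of the if-converted loop above: t is computed on
   every iteration and the conditional collapses to a select (a csel
   at the ISA level), so each iteration pays the full latency of the
   t computation.  */
void f_ifcvt (const int *restrict in, int *restrict out,
              int n, int threshold)
{
  for (int i = 0; i < n; ++i) {
    int v = in[i];
    int t = v * 3;
    t += 7;
    t ^= 0x55;
    t *= 0x55;
    t -= 0x5;
    t &= 0xFE;
    t ^= 0x55;
    out[i] = (v > threshold) ? t : v;  /* maps to csel */
  }
}
```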
This has led to big performance losses in packages like zstd [1], which in
turn affects packaging and LTO speed.
The default cost model for if-conversion is overly permissive and allows
if-conversion on the assumption that branches are very expensive.
This patch implements an if-conversion cost model for AArch64. AArch64 has a
number of conditional instructions that need to be accounted for, however this
initial version keeps things simple and is only really concerned about csel.
The issue specifically with csel is that it may have to wait for two arguments
to be evaluated before it can execute. This means it has a direct correlation
with increases in dynamic instructions.
To fix this I add a new tuning parameter that gives a rough estimate of the
misprediction cost of a branch. We then accept if-conversion while the
converted sequence is cheaper than this parameter multiplied by the cost of
the branches.
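In other words, the acceptance test is roughly the following (a sketch with made-up names; the real check lives in the new AArch64 tuning code, not in this form):

```c
#include <stdbool.h>

/* Hypothetical sketch of the acceptance rule: allow if-conversion
   while the cost of the converted (branchless) sequence stays below
   the branch cost scaled by the estimated misprediction cost.  All
   identifiers here are illustrative, not actual GCC names.  */
static bool
accept_if_conversion (int converted_cost, int branch_cost,
                      int branch_mispredict_cost)
{
  return converted_cost < branch_cost * branch_mispredict_cost;
}
```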
There is a basic detection of CINC and CSET because these are usually fine. We
also accept all if-conversion when not inside a loop. Because CE is not an
RTL-SSA pass we can't do more extensive checks, like checking whether the csel
is a loop-carried dependency. As such this is a best-effort approach that
intends to catch the most egregious cases like the above.
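For comparison, a hypothetical example of the kind of pattern the basic CSET/CINC detection is meant to wave through, since the result depends only on the flags rather than on two computed arms:

```c
/* Hypothetical example: the conditional increment maps to cmp + cinc
   (or cset + add), which does not wait on two computed arms, so
   if-converting it is usually a win rather than a loss.  */
int count_positive (const int *a, int n)
{
  int c = 0;
  for (int i = 0; i < n; i++)
    c += (a[i] > 0);
  return c;
}
```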
This recovers the ~25% performance loss in zstd decoding and gives better
results than GCC 14 which was before the regression happened.
Additionally I've benchmarked on a number of cores all the attached examples
and checked various cases. On average the patch gives an improvement between
20-40%.
PR target/123017
* gcc.target/aarch64/pr123017_1.c: New test.
* gcc.target/aarch64/pr123017_2.c: New test.
* gcc.target/aarch64/pr123017_3.c: New test.
* gcc.target/aarch64/pr123017_4.c: New test.
* gcc.target/aarch64/pr123017_5.c: New test.
* gcc.target/aarch64/pr123017_6.c: New test.
* gcc.target/aarch64/pr123017_7.c: New test.
Paul Thomas [Mon, 5 Jan 2026 07:05:36 +0000 (07:05 +0000)]
Fortran: ICE in type-bound function with PDT result [PR 123071]
2026-01-05 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/123071
* resolve.cc (resolve_typebound_function): Make sure that the
class declared type is resolved.
(resolve_allocate_deallocate): Any kind of expr3 array ref will
need resolution not just constant size refs.
* trans-decl.cc (gfc_trans_deferred_vars): Exclude vtabs from
initialization.
(emit_not_set_warning): New function using code extracted from
gfc_generate_function_code.
(gfc_generate_function_code): PDT module procedure results
that have not been referenced must have the fake_result_decl
added to the symbol and emit_not_set_warning called. Likewise,
replace explicit code with a call to emit_not_set_warning.
gcc/testsuite
PR fortran/123071
* gfortran.dg/pdt_79.f03: New test.