Richard Biener [Wed, 30 Jul 2025 13:14:44 +0000 (15:14 +0200)]
Remove most of the epilogue vinfo fixup
The following removes the fixup we apply to pattern stmt operands
before code generating vector epilogues. This isn't necessary anymore
since the SLP graph now exclusively records the data flow. Similarly,
fixing up SSA references inside the DR_REF of gather/scatter isn't
necessary since we now record the analysis result and avoid re-doing
it during transform.
What we still need to keep is the adjustment of the actual pointers
to gimple stmts from stmt_vec_info and the back-reference from the DRs.
* tree-vect-loop.cc (update_epilogue_loop_vinfo): Remove
fixing up pattern stmt operands and gather/scatter DR_REFs.
(find_in_mapping): Remove.
Richard Biener [Mon, 28 Jul 2025 11:48:39 +0000 (13:48 +0200)]
Record get_load_store_info results from analysis
The following is a patch to make us record the get_load_store_info
results from load/store analysis and re-use them during transform.
In particular this moves where SLP_TREE_MEMORY_ACCESS_TYPE is stored.
A major hassle was (and still is, to some extent) gather/scatter
handling with its accompanying gather_scatter_info. As
get_load_store_info no longer fully re-analyzes them and parts of
the information are recorded in the SLP tree during SLP build, the
following eliminates the use of this data in
vectorizable_load/store, instead recording the other relevant
part (namely the IFN or decl chosen) in the load-store info.
Strided load handling keeps the re-analysis but populates the
data back to the SLP tree and the load-store info. That's something
for further improvement. This also shows that classifying
an SLP tree as load/store early and allocating the load-store data
might be a way to move all of the gather/scatter auxiliary data
back into one place.
Rather than mass-replacing references to variables I've kept the
locals but made them read-only, only adjusting a few elsval setters
and adding a FIXME to strided SLP handling of alignment (allowing
local override there).
The FIXME shows that while a lot of analysis is done in
get_load_store_type, that's far from all of it. There's also
a possibility that splitting up the transform phase into
separate load/store def types, based on the VMAT chosen, will make
the code more maintainable.
* tree-vectorizer.h (vect_load_store_data): New.
(_slp_tree::memory_access_type): Remove.
(SLP_TREE_MEMORY_ACCESS_TYPE): Turn into inline function.
* tree-vect-slp.cc (_slp_tree::_slp_tree): Do not
initialize SLP_TREE_MEMORY_ACCESS_TYPE.
* tree-vect-stmts.cc (check_load_store_for_partial_vectors):
Remove gather_scatter_info pointer argument, instead get
info from the SLP node.
(vect_build_one_gather_load_call): Get SLP node and builtin
decl as argument and remove uses of gather_scatter_info.
(vect_build_one_scatter_store_call): Likewise.
(vect_get_gather_scatter_ops): Remove uses of gather_scatter_info.
(vect_get_strided_load_store_ops): Get SLP node and remove
uses of gather_scatter_info.
(get_load_store_type): Take pointer to vect_load_store_data
instead of individual pointers.
(vectorizable_store): Adjust. Re-use get_load_store_type
result from analysis time.
(vectorizable_load): Likewise.
since the TLS call will clobber the flags register, nor place a TLS call in a
basic block if any live caller-saved registers aren't dead at the end of
the basic block:
;; live in 6 [bp] 7 [sp] 16 [argp] 17 [flags] 19 [frame] 104
;; live gen 0 [ax] 102 106 108 116 117 118 120
;; live kill 5 [di]
Instead, we should place such a call before all register-setting basic
blocks which dominate the current basic block.
Keep track of the replaced GNU and GNU2 TLS instructions and use this info
to place the __tls_get_addr call and mark the FLAGS register as dead.
gcc/
PR target/121572
* config/i386/i386-features.cc (replace_tls_call): Add a bitmap
argument and put the updated TLS instruction in the bitmap.
(ix86_get_dominator_for_reg): New.
(ix86_check_flags_reg): Likewise.
(ix86_emit_tls_call): Likewise.
(ix86_place_single_tls_call): Add 2 bitmap arguments for updated
GNU and GNU2 TLS instructions. Call ix86_emit_tls_call to emit
TLS instruction. Correct debug dump for before instruction.
Ben Wu [Tue, 19 Aug 2025 17:49:41 +0000 (13:49 -0400)]
c++: Fix ICE on mangling invalid compound requirement [PR120618]
This testcase caused an ICE when mangling the invalid type-constraint in
write_requirement since write_type_constraint expects a TEMPLATE_TYPE_PARM.
Setting the trailing return type to NULL_TREE when a
return-type-requirement is found in place of a type-constraint prevents the
failed assertion in write_requirement. It also allows the invalid
constraint to be satisfied in some contexts to prevent redundant errors,
e.g. in concepts-requires5.C.
Bootstrapped and tested on x86_64-linux-gnu.
PR c++/120618
gcc/cp/ChangeLog:
* parser.cc (cp_parser_compound_requirement): Set type to
NULL_TREE for invalid type-constraint.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-requires5.C: Don't require
redundant diagnostic in static assertion.
* g++.dg/concepts/pr120618.C: New test.
Andrew Pinski [Mon, 18 Aug 2025 20:33:59 +0000 (13:33 -0700)]
middle-end: Fix malloc like functions when calling with void "return" [PR120024]
When expanding malloc like functions, we copy the return register into a temporary
and then mark that temporary register with a noalias regnote and the alignment.
This works fine unless you are calling the function with a return type of void.
In that case the valreg will be null and a crash will happen.
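A minimal sketch of the failing shape (an assumption based on the description
above, not the committed testcase): the malloc-like builtin is called through
a function type returning void, so there is no return value to copy:

  /* Hypothetical reduction: the call is expanded with ECF_MALLOC but
     yields no value, leaving valreg null.  */
  typedef void vfn (unsigned long);
  void
  f (void)
  {
    ((vfn *) __builtin_malloc) (8);
  }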
A few cleanups are included in this patch because it was easier to do the fix
with the cleanups added.
The start_sequence/end_sequence for ECF_MALLOC is no longer needed; I can't tell
if it was ever needed.
The emit_move_insn function returns the last emitted instruction anyway,
so there is no reason to call get_last_insn; we can just use the return
value of emit_move_insn. This has been true since this code was originally
added, so I don't understand why it was done that way beforehand.
Bootstrapped and tested on x86_64-linux-gnu.
PR middle-end/120024
gcc/ChangeLog:
* calls.cc (expand_call): Remove start_sequence/end_sequence
for ECF_MALLOC.
Check valreg before dereferencing it when it comes to malloc like
functions. Use the return value of emit_move_insn instead of
calling get_last_insn.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/malloc-1.c: New test.
* gcc.dg/torture/malloc-2.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Patrick Palka [Tue, 19 Aug 2025 15:07:14 +0000 (11:07 -0400)]
c++: constrained corresponding using from partial spec [PR121351]
When comparing constraints during correspondence checking for a using
from a partial specialization, we need to substitute the partial
specialization arguments into the constraints rather than the primary
template arguments. Otherwise we incorrectly reject e.g. the below
testcase as ambiguous since we substitute T=int* instead of T=int
into #1's constraints and don't notice the correspondence.
This patch corrects the recent r16-2771-gb9f1cc4e119da9 fix by using
outer_template_args instead of TI_ARGS of the DECL_CONTEXT, which
should always give the correct outer arguments for substitution.
PR c++/121351
gcc/cp/ChangeLog:
* class.cc (add_method): Use outer_template_args when
substituting outer template arguments into constraints.
Richard Biener [Tue, 19 Aug 2025 11:38:58 +0000 (13:38 +0200)]
Remove reduction chain detection from parloops
Historically SLP reduction chains were the only multi-stmt reductions
supported. But since we have check_reduction_path, more complicated
cases are handled. As parloops doesn't do any specific chain
processing, it can rely solely on that functionality instead.
* tree-parloops.cc (parloops_is_slp_reduction): Remove.
(parloops_is_simple_reduction): Do not call it.
Richard Biener [Tue, 19 Aug 2025 10:52:39 +0000 (12:52 +0200)]
A few missing SLP node passings to vector costing
The following fixes another few missed cases to pass an SLP node
instead of a stmt_info.
* tree-vect-loop.cc (vectorizable_reduction): Pass the
appropriate SLP node for costing of single-def-use-cycle
operations.
(vectorizable_live_operation): Pass the SLP node to the
costing hook.
* tree-vect-stmts.cc (vectorizable_bswap): Likewise.
(vectorizable_store): Likewise.
The testcase in the PR shows that when we have a reduction chain
with a wrapped conversion we fail to properly fall back to a
regular reduction, resulting in wrong-code. The following fixes
this by failing discovery. The testcase has other issues, so
I'm not including it here.
PR tree-optimization/121592
* tree-vect-slp.cc (vect_analyze_slp): When SLP reduction chain
discovery fails, fail overall when the tail of the chain
isn't also the entry for the non-SLP reduction.
Richard Biener [Mon, 18 Aug 2025 11:38:37 +0000 (13:38 +0200)]
tree-optimization/121527 - wrong SRA with aggregate copy
SRA handles outermost VIEW_CONVERT_EXPRs, but it wrongly ignores
those when building an access, which leads to the wrong size being
used when the VIEW_CONVERT_EXPR does not have the same size as
its operand. That is valid GENERIC and is used by Ada upcasting.
PR tree-optimization/121527
* tree-sra.cc (build_access_from_expr_1): Do not strip an
outer VIEW_CONVERT_EXPR as it's relevant for the size of
the access.
(get_access_for_expr): Likewise.
Tamar Christina [Tue, 19 Aug 2025 09:18:04 +0000 (10:18 +0100)]
AArch64: Use vectype from SLP node instead of stmt_info [PR121536]
commit g:1786be14e94bf1a7806b9dc09186f021737f0227 stops storing in
STMT_VINFO_VECTYPE the vectype of the current stmt being vectorized and instead
requires the use of SLP_TREE_VECTYPE for everything but data-refs.
This means that STMT_VINFO_VECTYPE (stmt_info) will always be NULL, and so
aarch64_bool_compound_p will never properly cost predicate AND operations
anymore, resulting in less vectorization.
This patch changes it to use SLP_TREE_VECTYPE and pass the slp_node to
aarch64_bool_compound_p.
gcc/ChangeLog:
PR target/121536
* config/aarch64/aarch64.cc (aarch64_bool_compound_p): Use
SLP_TREE_VECTYPE instead of STMT_VINFO_VECTYPE.
(aarch64_adjust_stmt_cost, aarch64_vector_costs::count_ops): Pass SLP
node to aarch64_bool_compound_p.
gcc/testsuite/ChangeLog:
PR target/121536
* g++.target/aarch64/sve/pr121536.cc: New test.
Tamar Christina [Tue, 19 Aug 2025 09:17:17 +0000 (10:17 +0100)]
middle-end: Fix costing hooks of various vectorizable_* [PR121536]
commit g:1786be14e94bf1a7806b9dc09186f021737f0227 stops storing in
STMT_VINFO_VECTYPE the vectype of the current stmt being vectorized and instead
requires the use of SLP_TREE_VECTYPE for everything but data-refs.
However, contrary to what the commit says, not all usages of STMT_VINFO_VECTYPE
have been purged from vectorizable_*, as the costing hooks which don't pass the
SLP tree as an argument will extract the vectype using STMT_VINFO_VECTYPE.
This results in no vector type being passed to the backends and in a few
costing test failures on AArch64.
This commit replaces the last few cases I could find, all except the one in
vectorizable_reduction with single_defuse_cycle, where the stmt being costed is
not the representative of the PHI in the SLP tree but rather the out-of-tree
reduction statement. So I've left that alone, but it does mean vectype is NULL.
Most likely this needs to use the overload where we pass an explicit vectype,
but I wasn't sure, so left it for now.
gcc/ChangeLog:
PR target/121536
* tree-vect-loop.cc (vectorizable_phi, vectorizable_recurr,
vectorizable_nonlinear_induction, vectorizable_induction): Pass slp_node
instead of stmt_info to record_stmt_cost.
As a result, we could no longer distinguish between FPR and GPR scalar stmts.
A later commit also removed STMT_VINFO_VECTYPE from stmt_info.
This leaves getting the type of the original stmt in the stmt_info as the
only remaining option. This patch does that when we're performing scalar costing.
Ideally I'd refactor this a bit because a lot of the hooks just need to know if
it's FP or not, but this seems pointless with the ongoing costing churn. So for
now this restores our costing.
gcc/ChangeLog:
PR target/121536
* config/aarch64/aarch64.cc (aarch64_vector_costs::add_stmt_cost): Set
vectype from type of lhs of gimple stmt.
Nathaniel Shead [Sun, 17 Aug 2025 03:00:15 +0000 (13:00 +1000)]
c++/modules: Fix exporting using-decls of unattached purview functions [PR120195]
We have logic to adjust a function decl if it gets re-declared as a
using-decl with different purviewness, but we also need to do the same
if it gets redeclared with different exportedness.
PR c++/120195
gcc/cp/ChangeLog:
* name-lookup.cc (do_nonmember_using_decl): Also handle change
in exportedness of a function.
gcc/testsuite/ChangeLog:
* g++.dg/modules/using-32_a.C: New test.
* g++.dg/modules/using-32_b.C: New test.
Nathaniel Shead [Sun, 17 Aug 2025 03:06:52 +0000 (13:06 +1000)]
testsuite: Fix PR108080 testcase for some targets [PR121396]
I added a testcase for the (temporary) warning that we don't currently
support the 'gnu::optimize' or 'gnu::target' attributes in r15-10183;
however, some targets produce target nodes even when only an optimize
attribute is present. This adjusts the testcase to also allow the target
warning.
PR c++/108080
PR c++/121396
gcc/testsuite/ChangeLog:
* g++.dg/modules/pr108080.H: Also allow target warnings.
Andrew Pinski [Mon, 18 Aug 2025 19:00:45 +0000 (12:00 -0700)]
docs: Fix __builtin_object_size example [PR121581]
This example used to work (with C) in GCC 14 before the
warning for different pointer types without a cast was changed
to an error.
The fix is to make the q variable `int*` rather than the current `char*`.
This also fixes the example for C++.
Currently, the data type of sanitizer flags is unsigned int, with
SANITIZE_SHADOW_CALL_STACK (1UL << 31) being the highest individual
enumerator of enum sanitize_code. Use the 'sanitize_code_type' data type
to allow more distinct instrumentation modes to be added when needed.
Tomasz Kamiński [Thu, 14 Aug 2025 14:54:16 +0000 (16:54 +0200)]
libstdc++: Add nodiscard attribute for ranges algorithm [PR121476]
This patch adds the [[nodiscard]] attribute to the operator() of ranges
algorithm function objects if their std counterpart has it.
Furthermore, we [[nodiscard]] the operator() of the following ranges
algorithms that lack a std counterpart:
* find_last, find_last_if, find_last_if_not (to match other find
algorithms)
* contains, contains_subrange (to match find/any_of and search)
Finally, [[nodiscard]] is added to std::min and std::max overloads
that accept std::initializer_list. This appears to be an oversight,
as std::minmax is already marked, and other min overloads are as well.
The same applies to corresponding operator() overloads of ranges::min and
ranges::max.
This patch fixes an internal disagreement in gcse about how to
handle partial clobbers. Like many passes, gcse doesn't track
the modes of live values, so if a call clobbers only part of
a register, the pass has to make conservative assumptions.
As the comment in the patch says, this means:
(1) ignoring partial clobbers when computing liveness and reaching
definitions
(2) treating partial clobbers as full clobbers when computing
availability
DF is mostly concerned with (1), so ignores partial clobbers.
compute_hash_table_work did (2) when calculating kill sets,
but compute_transp didn't do (2) when computing transparency.
This led to a nonsensical situation of a register being in both
the transparency and kill sets.
gcc/
PR rtl-optimization/97497
* function-abi.h (predefined_function_abi::only_partial_reg_clobbers)
(function_abi::only_partial_reg_clobbers): New member functions.
* gcse-common.cc: Include regs.h and function-abi.h.
(compute_transp): Check for partially call-clobbered registers
and treat them as not being transparent in blocks with calls.
Xi Ruoyao [Sat, 1 Mar 2025 03:46:57 +0000 (11:46 +0800)]
LoongArch: Implement 16-byte atomic add, sub, and, or, xor, and nand with sc.q
gcc/ChangeLog:
* config/loongarch/sync.md (UNSPEC_TI_FETCH_ADD): New unspec.
(UNSPEC_TI_FETCH_SUB): Likewise.
(UNSPEC_TI_FETCH_AND): Likewise.
(UNSPEC_TI_FETCH_XOR): Likewise.
(UNSPEC_TI_FETCH_OR): Likewise.
(UNSPEC_TI_FETCH_NAND_MASK_INVERTED): Likewise.
(ALL_SC): New define_mode_iterator.
(_scq): New define_mode_attr.
(atomic_fetch_nand<mode>): Accept ALL_SC instead of only GPR.
(UNSPEC_TI_FETCH_DIRECT): New define_int_iterator.
(UNSPEC_TI_FETCH): New define_int_iterator.
(amop_ti_fetch): New define_int_attr.
(size_ti_fetch): New define_int_attr.
(atomic_fetch_<amop_ti_fetch>ti_scq): New define_insn.
(atomic_fetch_<amop_ti_fetch>ti): New define_expand.
Xi Ruoyao [Sat, 1 Mar 2025 03:46:54 +0000 (11:46 +0800)]
LoongArch: Implement 16-byte atomic store with sc.q
When LSX is not available but sc.q is (for example on LA664 where the
SIMD unit is not enabled), we can use an LL-SC loop for 16-byte atomic
store.
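For reference, a sketch of the C-level operation this expansion serves
(illustrative only, not the committed testcase):

  /* A 16-byte atomic store; with sc.q but without LSX this can now be
     expanded to an LL-SC loop instead of a locked libatomic call.  */
  void
  store16 (__int128 *p, __int128 v)
  {
    __atomic_store_n (p, v, __ATOMIC_SEQ_CST);
  }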
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_print_operand_reloc):
Accept "%t" for printing the number of the 64-bit machine
register holding the upper half of a TImode.
* config/loongarch/sync.md (atomic_storeti_scq): New
define_insn.
(atomic_storeti): Expand to atomic_storeti_scq if !ISA_HAS_LSX.
Xi Ruoyao [Sun, 27 Apr 2025 07:02:39 +0000 (15:02 +0800)]
LoongArch: Add -m[no-]scq option
We'll use the sc.q instruction for some 16-byte atomic operations, but
it's only added in LoongArch 1.1 evolution so we need to gate it with
an option.
gcc/ChangeLog:
* config/loongarch/genopts/isa-evolution.in (scq): New evolution
feature.
* config/loongarch/loongarch-evolution.cc: Regenerate.
* config/loongarch/loongarch-evolution.h: Regenerate.
* config/loongarch/loongarch-str.h: Regenerate.
* config/loongarch/loongarch.opt: Regenerate.
* config/loongarch/loongarch.opt.urls: Regenerate.
* config/loongarch/loongarch-def.cc: Make -mscq the default for
-march=la664 and -march=la64v1.1.
* doc/invoke.texi (LoongArch Options): Document -m[no-]scq.
Xi Ruoyao [Sat, 1 Mar 2025 03:46:52 +0000 (11:46 +0800)]
LoongArch: Implement 16-byte atomic store with LSX
If the vector is naturally aligned, it cannot cross cache lines so the
LSX store is guaranteed to be atomic. Thus we can use LSX to do the
lock-free atomic store, instead of using a lock.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_storeti_lsx): New
define_insn.
(atomic_storeti): New define_expand.
Xi Ruoyao [Sat, 1 Mar 2025 03:46:51 +0000 (11:46 +0800)]
LoongArch: Implement 16-byte atomic load with LSX
If the vector is naturally aligned, it cannot cross cache lines so the
LSX load is guaranteed to be atomic. Thus we can use LSX to do the
lock-free atomic load, instead of using a lock.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_loadti_lsx): New define_insn.
(atomic_loadti): New define_expand.
Xi Ruoyao [Sat, 1 Mar 2025 03:46:50 +0000 (11:46 +0800)]
LoongArch: Implement atomic_fetch_nand<GPR:mode>
Without atomic_fetch_nandsi and atomic_fetch_nanddi, __atomic_fetch_nand
is expanded to a loop containing a CAS in the body, and CAS itself is an
LL-SC loop, so we have a nested loop. This is obviously not a good idea,
as we just need one LL-SC loop in fact.
As ~(atom & mask) is (~mask) | (~atom), we can just invert the mask
first and the body of the LL-SC loop would be just one orn instruction.
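The identity in scalar C, for reference (a sketch of the transformation,
not the emitted pattern):

  /* By De Morgan, ~(atom & mask) == ~atom | ~mask, so with the mask
     inverted once up front the loop body is a single orn.  */
  unsigned long
  nand_step (unsigned long atom, unsigned long inverted_mask)
  {
    return ~atom | inverted_mask;
  }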
gcc/ChangeLog:
* config/loongarch/sync.md
(atomic_fetch_nand_mask_inverted<GPR:mode>): New define_insn.
(atomic_fetch_nand<GPR:mode>): New define_expand.
Xi Ruoyao [Sat, 1 Mar 2025 03:46:48 +0000 (11:46 +0800)]
LoongArch: Implement subword atomic_fetch_{and, or, xor} with am*.w instructions
We can just shift the mask and fill the other bits with 0 (for ior/xor)
or 1 (for and), and use an am*.w instruction to perform the atomic
operation, instead of using an LL-SC loop.
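A scalar sketch of the trick (illustrative only; it assumes the little-endian
byte order of LoongArch and is not the actual expansion):

  #include <stdint.h>

  /* A 16-bit fetch-and emulated on the containing word: the bits outside
     the subword are filled with 1s so a single full-word atomic AND
     (amand.w) leaves them unchanged; for ior/xor the filler would be 0s.  */
  unsigned short
  fetch_and_u16 (unsigned short *p, unsigned short val)
  {
    uint32_t *wp = (uint32_t *) ((uintptr_t) p & ~(uintptr_t) 3);
    int shift = ((uintptr_t) p & 3) * 8;
    uint32_t mask = ((uint32_t) val << shift) | ~(0xffffu << shift);
    uint32_t old = __atomic_fetch_and (wp, mask, __ATOMIC_SEQ_CST);
    return (unsigned short) (old >> shift);
  }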
gcc/ChangeLog:
* config/loongarch/sync.md (UNSPEC_COMPARE_AND_SWAP_AND):
Remove.
(UNSPEC_COMPARE_AND_SWAP_XOR): Remove.
(UNSPEC_COMPARE_AND_SWAP_OR): Remove.
(atomic_test_and_set): Rename to ...
(atomic_fetch_<any_bitwise:amop><SHORT:mode>): ... this, and
adapt the expansion to use it for any bitwise operations and any
val, instead of just ior 1.
(atomic_test_and_set): New define_expand.
Xi Ruoyao [Sat, 1 Mar 2025 03:46:47 +0000 (11:46 +0800)]
LoongArch: Remove unneeded "andi offset, addr, 3" instruction in atomic_test_and_set
On LoongArch sll.w and srl.w instructions only take the [4:0] bits of
rk (shift amount) into account, and we've already defined
SHIFT_COUNT_TRUNCATED to 1 so the compiler knows this fact, thus we
don't need this instruction.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_test_and_set): Remove
unneeded andi instruction from the expansion.
Xi Ruoyao [Sat, 1 Mar 2025 03:46:46 +0000 (11:46 +0800)]
LoongArch: Remove unneeded "b 3f" instruction after LL-SC loops
This instruction is used to skip a redundant barrier if -mno-ld-seq-sa
or the memory model requires a barrier on failure. But with -mld-seq-sa
and other memory models the barrier may not exist at all, in which case
we should remove the "b 3f" instruction as well.
The implementation uses a new operand modifier "%T" to output a comment
marker if the operand is a memory order for which the barrier won't be
generated. "%T", and also "%t", are not really used before and the code
for them in loongarch_print_operand_reloc is just some MIPS legacy.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_print_operand_reloc):
Make "%T" output a comment marker if the operand is a memory
order for which the barrier won't be generated; remove "%t".
* config/loongarch/sync.md (atomic_cas_value_strong<mode>): Add
%T before "b 3f".
(atomic_cas_value_cmp_and_7_<mode>): Likewise.
Xi Ruoyao [Sat, 1 Mar 2025 03:46:45 +0000 (11:46 +0800)]
LoongArch: Don't emit overly-restrictive barrier for LL-SC loops
For LL-SC loops, if the atomic operation has succeeded, the SC
instruction always implies a full barrier, so the barrier we manually
insert only needs to account for the failure memorder, not
the success memorder (the barrier is skipped with "b 3f" on success
anyway).
Note that if we use the AMCAS instructions, we indeed need to consider
both the success memorder and the failure memorder when deciding if the
"_db" suffix is needed. Thus the semantics of atomic_cas_value_strong<mode>
and atomic_cas_value_strong<mode>_amcas start to differ. To
prevent the compiler from being too clever, use a different unspec code
for AMCAS instructions.
gcc/ChangeLog:
* config/loongarch/sync.md (UNSPEC_COMPARE_AND_SWAP_AMCAS): New
UNSPEC code.
(atomic_cas_value_strong<mode>): NFC, update the comment to note
we only need to consider failure memory order.
(atomic_cas_value_strong<mode>_amcas): Use
UNSPEC_COMPARE_AND_SWAP_AMCAS instead of
UNSPEC_COMPARE_AND_SWAP.
(atomic_compare_and_swap<mode:GPR>): Pass failure memorder to
gen_atomic_cas_value_strong<mode>.
(atomic_compare_and_swap<mode:SHORT>): Pass failure memorder to
gen_atomic_cas_value_cmp_and_7_si.
Xi Ruoyao [Sat, 1 Mar 2025 03:46:44 +0000 (11:46 +0800)]
LoongArch: Allow using bstrins for masking the address in atomic_test_and_set
We can use bstrins for masking the address here. As people are already
working on LA32R (which lacks bstrins instructions), for future-proofing
we check whether (const_int -4) is an and_operand and force it into an
register if not.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_test_and_set): Use bstrins
for masking the address if possible.
Xi Ruoyao [Sat, 1 Mar 2025 03:46:43 +0000 (11:46 +0800)]
LoongArch: Don't use "+" for atomic_{load, store} "m" constraint
Atomic load does not modify the memory. Atomic store does not read the
memory, thus we can use "=" instead.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_load<mode>): Remove "+" for
the memory operand.
(atomic_store<mode>): Use "=" instead of "+" for the memory
operand.
Austin Law [Sun, 17 Aug 2025 15:03:51 +0000 (09:03 -0600)]
[PR target/121213] Avoid unnecessary constant load in amoswap
PR 121213 shows an unnecessary "li target,0" in an atomic exchange loop
on RISC-V.
The source operand for an amoswap instruction should allow (const_int 0)
in addition to GPRs. So the operand's predicate is changed to
"reg_or_0_operand". The corresponding constraint is also changed to
allow a reg or the constant 0.
With the source operand no longer tied to the destination operand we do
not need the earlyclobber for the destination, so the destination
operand's constraint is adjusted accordingly.
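Roughly the shape from the PR (a hedged sketch, not the committed test):

  /* Swapping in constant 0 should use the zero register directly in
     amoswap.w, with no preceding "li tmp,0".  */
  int
  swap_in_zero (int *p)
  {
    return __atomic_exchange_n (p, 0, __ATOMIC_RELAXED);
  }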
This patch does not address the unnecessary sign extension reported in
the PR.
Tested with no regressions on riscv32-elf and riscv64-elf.
PR target/121213
gcc/
* config/riscv/sync.md (amo_atomic_exchange<mode>): Allow
(const_int 0) as input operand. Do not tie input to output.
No longer earlyclobber the output.
gcc/testsuite
* gcc.target/riscv/amo/pr121213.c: New test.
Matthew Fortune [Sun, 17 Aug 2025 14:47:02 +0000 (08:47 -0600)]
Testsuite: Use HAS_LDC instead of a specific ISA
The call-clobbered-1.c test has reasons both to be above a certain
ISA level and to be below a certain ISA level. The option-based ISA
min/max code only triggers if there is no ISA level request.
gcc/testsuite/
* gcc.target/mips/call-clobbered-1.c: Use HAS_LDC ghost
option instead of isa>=2.
Matthew Fortune [Sun, 17 Aug 2025 14:26:12 +0000 (08:26 -0600)]
Testsuite: Fix insn-*.c tests from trunk
Ensure the micromips test does not get confused about library support.
Ensure insn-casesi.c and insn-tablejump.c can be executed.
Move the micromips/mips16 selection into the files as per-function
attributes so that there is no requirement on having a full
micromips or mips16 runtime to execute the tests.
gcc/testsuite/
* gcc.target/mips/insn-casesi.c: Require mips16 support but
not the command line option.
* gcc.target/mips/insn-tablejump.c: Force o32 ABI as
we do not really support n32/n64 microMIPS. Require micromips
support but not the command line option.
Artemiy Volkov [Sun, 17 Aug 2025 14:08:29 +0000 (08:08 -0600)]
regrename: treat writes as reads for fused instruction pairs
Consider the following (RISC-V) instruction pair:
mul s6,a1,a2
add s6,a4,s6
Without this patch, while handling the second instruction, (a) the
existing chain for s6 will first be closed (upon the terminate_write
action for the input operand), then (b) a new one will be opened (upon
the mark_write action for the output operand). This will likely lead to
the two output operands being different physical registers, breaking the
single-output property required for some macro-op fusion pairs.
This patch, using the single_output_fused_pair_p () predicate introduced
earlier, changes the regrename behavior for such pairs to append the
input and the output operands to the existing chain (as if both actions
were mark_read), instead of breaking the current renaming chain and
starting a new one. This ensures that the output operands of both fused
instructions are kept in the same hard register, and that the
single-output property of the insn pair is preserved.
Jan Dubiec [Sun, 17 Aug 2025 14:03:33 +0000 (08:03 -0600)]
[PR target/109324] H8/300: Fix genrecog warnings about operands missing modes.
This patch fixes genrecog warnings about operands missing modes. This is
done by explicitly specifying modes of operations.
PR target/109324
gcc/ChangeLog:
* config/h8300/addsub.md: Explicitly specify mode for plus operation.
* config/h8300/jumpcall.md: Explicitly specify modes for eq and
match_operand operations.
* config/h8300/testcompare.md: Explicitly specify modes for eq, ltu
and compare operations.
The contrib/check-MAINTAINERS.py script sorts by surname, name, bugzilla
handle and email (in this order). Document this. Switch around Andrew
Pinski's entries in Contributing under DCO.
Pushing as obvious.
ChangeLog:
* MAINTAINERS: Switch around Andrew Pinski's entries in
Contributing under DCO.
contrib/ChangeLog:
* check-MAINTAINERS.py: Document the way the script sorts
entries.
Artemiy Volkov [Sun, 17 Aug 2025 02:40:28 +0000 (20:40 -0600)]
ira: tie output allocnos for fused instruction pairs
Some of the instruction pairs recognized as fusible by a preceding
invocation of the dep_fusion pass require that both components of a pair
have the same hard register output for the fusion to work in hardware.
(An example of this would be a multiply-add operation, or a zero-extract
operation composed of two shifts.)
For all such pairs, the following conditions will hold:
(a) Both insns are single_sets
(b) Both insns have a register destination
(c) The pair has been marked as fusible by setting the second insn's
SCHED_GROUP flag
(d) Additionally, post-RA, both instructions' destination regnos are
equal
(All of these conditions are encapsulated in the newly created
single_output_fused_pair_p () predicate.)
During IRA, if conditions (a)-(c) above hold, we need to tie the two
instructions' destination allocnos together so that they are allocated
to the same hard register. We do this in add_insn_allocno_copies () by
adding a constraint conflict to the output operands of the two
instructions.
gcc/ChangeLog:
* ira-conflicts.cc (add_insn_allocno_copies): Handle fused insn pairs.
* rtl.h (single_output_fused_pair_p): Declare new function.
* rtlanal.cc (single_output_fused_pair_p): Define it.
Dimitar Dimitrov [Sun, 17 Aug 2025 02:30:14 +0000 (20:30 -0600)]
[PATCH] RISC-V: Fix block matching in arch-canonicalize [PR121538]
Commit r16-3028-g0c517ddf9b136c introduced parsing of conditional blocks
in riscv-ext*.def. For simplicity, it used a simple regular expression
to match the C++ lambda function for each condition. But the regular
expression is too simple - it matches only the first scoped code block,
without any trailing closing braces.
The "c" dependency for the "zca" extension has two code blocks inside
its conditional. One for RV32 and one for RV64. The script matches
only the RV32 block, and leaves the RV64 one. Any strings left, in turn,
are considered a list of non-conditional extensions. Thus the quoted
strings "d" and "zcd" from that block are taken as "simple" (non-conditional)
dependencies:
  if (subset_list->xlen () == 64)
    {
      if (subset_list->lookup ("d"))
        return subset_list->lookup ("zcd");
As a result, arch-canonicalize erroneously adds "d" extension:
$ ./config/riscv/arch-canonicalize rv32ec
rv32efdc_zicsr_zca_zcd_zcf
Before r16-3028-g0c517ddf9b136c the command returned:
$ ./config/riscv/arch-canonicalize rv32ec
rv32ec
Fix by extending the conditional block match until the number of opening
and closing braces is equal. This change might seem crude, but it does
save us from introducing a full C++ parser into the simple
arch-canonicalize Python script. With this patch the script now
returns:
$ ./config/riscv/arch-canonicalize rv32ec
rv32ec
H.J. Lu [Fri, 15 Aug 2025 02:04:33 +0000 (19:04 -0700)]
x86: Add target("80387") function attribute
Add target("80387") attribute to enable and disable x87 instructions in a
function.
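A usage sketch (hedged; it assumes the usual "no-" prefix form also applies
to the new attribute string):

  /* Compile just this function without x87 instructions.  */
  __attribute__ ((target ("no-80387")))
  double
  no_x87_add (double a, double b)
  {
    return a + b;
  }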
gcc/
PR target/121541
* config/i386/i386-options.cc
(ix86_valid_target_attribute_inner_p): Add target("80387")
attribute. Set the mask bit in opts_set->x_target_flags if the
mask bit in opts->x_target_flags is updated.
* doc/extend.texi: Document target("80387") function attribute.
Nathaniel Shead [Fri, 8 Aug 2025 13:23:18 +0000 (23:23 +1000)]
c++: Implement P2115R0 linkage changes for unnamed unscoped enums [PR120503]
We currently list P2115R0 as implemented, but only the modules changes
had been done. This patch implements the linkage changes so that
unnamed unscoped enums will use the name of the first enumerator for
linkage purposes.
This is (strictly speaking) a breaking change: code that previously
relied on unnamed enumerations having internal linkage may see overloads
using those types become exposed and clash with other functions in a
different TU that have been similarly exposed. As such this feature is
only implemented for C++20.
No ABI flag warning is provided, partly because C++20 is still an
experimental standard, but also because any affected functions could not
have been part of an ABI until this change anyway.
A number of testcases that are testing for behaviour of no-linkage types
are adjusted to use an enumeration with no values, so that the pre-C++20
and post-C++20 behaviour is equivalently tested.
In terms of implementation, I had originally considered adjusting the
DECL_NAME of the enum, as with 'name_unnamed_type', but this ended up
being more complicated as it had unwanted interactions with the existing
modules streaming and with name lookup and diagnostic messages. This
patch instead uses a new function to determine this case.
The standard says that ([dcl.enum] p11) such an enum "...is denoted, for
linkage purposes, by its underlying type and its first enumerator", so
we need to add a new mangling production as well to handle this.
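A minimal sketch of the change (simplified from the testcases):

  // In C++20 this unnamed enum has linkage, denoted by its underlying
  // type and its first enumerator "Red"; previously it had no linkage.
  enum { Red, Green };
  void f (decltype (Red));   // can now be declared in multiple TUs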
PR c++/120503
PR c++/120824
gcc/cp/ChangeLog:
* cp-tree.h (TYPE_UNNAMED_P): Adjust for enums with enumerators
for linkage purposes.
(enum_with_enumerator_for_linkage_p): Declare.
* decl.cc (name_unnamed_type): Adjust assertions to handle enums
with enumerators for linkage purposes.
(grokdeclarator): Use a typedef name for enums with enumerators
for linkage purposes.
(enum_with_enumerator_for_linkage_p): New function.
(finish_enum_value_list): Reset type linkage for enums with
enumerators for linkage purposes.
* mangle.cc (write_unnamed_enum_name): New function.
(write_unqualified_name): Handle enums with enumerators for
linkage purposes.
* tree.cc (decl_linkage): Fixup unnamed enums.
gcc/testsuite/ChangeLog:
* g++.dg/abi/mangle32.C: Remove enumerator list.
* g++.dg/cpp0x/linkage2.C: Likewise.
* g++.dg/ext/vector26.C: Likewise.
* g++.dg/other/anon3.C: Likewise.
* g++.dg/abi/mangle83.C: New test.
* g++.dg/modules/enum-15_a.C: New test.
* g++.dg/modules/enum-15_b.C: New test.
clang++ apparently added a SFINAE-friendly __builtin_structured_binding_size
trait to return the structured binding size (or to error, outside of SFINAE
contexts, if a type doesn't have a structured binding size).
The expansion statement patch already anticipated this by adding a
complain argument to cp_finish_decomp.
The following patch implements it.
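A sketch of the trait in use (hedged; exact SFINAE behavior aside):

  struct P { int x; long y; };
  // The trait yields the number of bindings the type decomposes into.
  static_assert (__builtin_structured_binding_size (P) == 2);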
2025-08-15 Jakub Jelinek <jakub@redhat.com>
gcc/
* doc/extend.texi (Type Traits): Document
__builtin_structured_binding_size.
gcc/cp/
* cp-trait.def (STRUCTURED_BINDING_SIZE): New unary trait.
* cp-tree.h (finish_structured_binding_size): Declare.
* semantics.cc (trait_expr_value): Handle
CPTK_STRUCTURED_BINDING_SIZE.
(finish_structured_binding_size): New function.
(finish_trait_expr): Handle CPTK_RANK and CPTK_TYPE_ORDER
in the switch instead of just doing break; for those and
ifs at the end to handle them. Handle CPTK_STRUCTURED_BINDING_SIZE.
* pt.cc (tsubst_expr): Likewise.
* constraint.cc (diagnose_trait_expr): Likewise.
* decl.cc (get_tuple_size): Use mce_true for maybe_const_value.
(cp_decomp_size): Diagnose incomplete types not just if
processing_template_decl, and use error_at instead of pedwarn.
If btype is NULL, just return 0 instead of diagnosing an error.
gcc/testsuite/
* g++.dg/cpp26/expansion-stmt15.C: Expect different diagnostics
for zero size destructuring expansion statement.
* g++.dg/ext/builtin-structured-binding-size1.C: New test.
* g++.dg/ext/builtin-structured-binding-size2.C: New test.
* g++.dg/ext/builtin-structured-binding-size3.C: New test.
* g++.dg/ext/builtin-structured-binding-size4.C: New test.
Jakub Jelinek [Fri, 15 Aug 2025 20:37:42 +0000 (22:37 +0200)]
c++: Add testcases for the defarg part of P1766R1 [PR121552]
The following patch adds some testcases for the default argument (function
and template) part of the paper, making sure we diagnose multiple defargs
in the same TU and when visible in modules and DTRT when some aren't visible
and some are visible and they are equal. Not testing when they are
different since that is IFNDR.
2025-08-15 Jakub Jelinek <jakub@redhat.com>
PR c++/121552
* g++.dg/parse/defarg21.C: New test.
* g++.dg/template/defarg24.C: New test.
* g++.dg/modules/default-arg-4_a.C: New test.
* g++.dg/modules/default-arg-4_b.C: New test.
* g++.dg/modules/default-arg-5_a.C: New test.
* g++.dg/modules/default-arg-5_b.C: New test.
Jakub Jelinek [Fri, 15 Aug 2025 20:36:18 +0000 (22:36 +0200)]
c++: Implement C++20 P1766R1 - Mitigating minor modules maladies [PR121552]
The following patch attempts to implement the
C++20 P1766R1 - Mitigating minor modules maladies
paper.
For the diagnostics required in the paper, clang++ a few years ago
introduced the -Wnon-c-typedef-for-linkage pedwarn, and the following patch
does that too.
The paper was accepted as a DR, so the patch enables the warning
also for C++98; dunno whether it might not be better to do it only
for C++11 onwards.
The paper is also about differences in default arguments of functions
in different TUs and in modules; I think within the same TU we diagnose
it correctly (maybe I should add some testcase) and perhaps we can try
something with modules as well. But in different TUs it is IFNDR.
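A shape the new pedwarn flags (a hedged sketch of the paper's rule, not a
committed testcase):

  // The typedef name is used for linkage purposes, but the class is
  // not C-compatible because of the member function.
  typedef struct
  {
    void f ();
  } A;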
2025-08-15 Jakub Jelinek <jakub@redhat.com>
PR c++/121552
gcc/
* doc/invoke.texi (-Wno-non-c-typedef-for-linkage): Document.
gcc/c-family/
* c.opt (Wnon-c-typedef-for-linkage): New option.
* c.opt.urls: Regenerate.
gcc/cp/
* decl.cc: Implement C++20 P1766R1 - Mitigating minor modules maladies.
(diagnose_non_c_class_typedef_for_linkage,
maybe_diagnose_non_c_class_typedef_for_linkage): New functions.
(name_unnamed_type): Call
maybe_diagnose_non_c_class_typedef_for_linkage.
gcc/testsuite/
* g++.dg/cpp2a/typedef1.C: New test.
* g++.dg/debug/dwarf2/typedef5.C: Add -Wno-non-c-typedef-for-linkage
to dg-options.
* g++.dg/inherit/typeinfo1.C: Add -Wno-non-c-typedef-for-linkage
to dg-additional-options.
* g++.dg/parse/ctor2.C: Likewise.
* g++.dg/ext/anon-struct9.C: Add -Wno-non-c-typedef-for-linkage to
dg-options.
* g++.dg/ext/visibility/anon11.C: Add -Wno-non-c-typedef-for-linkage
to dg-additional-options.
* g++.dg/lto/pr69137_0.C: Add -Wno-non-c-typedef-for-linkage
to dg-lto-options.
* g++.dg/other/anon8.C: Add -Wno-non-c-typedef-for-linkage
to dg-additional-options.
* g++.dg/template/pr84973.C: Likewise.
* g++.dg/template/pr84973-2.C: Likewise.
* g++.dg/template/pr84973-3.C: Likewise.
* g++.dg/abi/anon2.C: Likewise.
* g++.dg/abi/anon3.C: Likewise.
* g++.old-deja/g++.oliva/linkage1.C: Likewise.
Jakub Jelinek [Fri, 15 Aug 2025 20:34:59 +0000 (22:34 +0200)]
c++: Fix default argument parsing in non-comma variadic methods [PR121539]
While the non-comma variadic functions/methods were deprecated in C++26,
they are still valid, and they are valid without deprecation in C++98 to
C++23.
We parse default arguments followed by ...) correctly outside of classes or
for out-of-class definitions of methods, but inside classes, I think since
the C++11 support in GCC 4.9 or so, we consider ... to be a part of the
default argument and error on it.
I think a default argument can't validly contain a pack expansion
that ends the expression with ..., so I think we can simply handle
...) at depth 0 as not being part of the default argument.
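The shape that now parses again (a hedged sketch):

  struct S
  {
    // The ...) at depth 0 now ends the cached default argument.
    void f (int a = 1 ...) {}
  };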
2025-08-15 Jakub Jelinek <jakub@redhat.com>
PR c++/121539
* parser.cc (cp_parser_cache_defarg): Set done to true for
CPP_ELLIPSIS followed by CPP_CLOSE_PAREN in !nsdmi at depth 0.
Jakub Jelinek [Fri, 15 Aug 2025 20:31:27 +0000 (22:31 +0200)]
c++: Warn on #undef/#define of remaining cpp.predefined macros [PR120778]
We already warn on #undef or pedwarn on #define (but not on #define
after #undef) of some builtin macros mentioned in cpp.predefined.
The C++26 P2843R3 paper changes it from (compile time) undefined behavior
to ill-formed. The following patch arranges for warning (for #undef)
and pedwarn (on #define) for the remaining cpp.predefined macros.
The __cpp_* feature test macros are handled only for C++20, which added
some of them to cpp.predefined; in earlier C++ versions they were just an
extension and I think we don't need to diagnose anything for pedantic
diagnostics. The __STDCPP_* and __cplusplus macros are handled for all
C++ versions where they appeared.
Like the earlier posted -Wkeyword-macro diagnostics (which are emitted
regardless of whether the identifier is defined as a macro or not; obviously,
most likely none of the keywords are defined as macros initially), this
one also warns on #undef when a macro isn't defined or on a later #define
after #undef.
2025-08-15 Jakub Jelinek <jakub@redhat.com>
PR preprocessor/120778
PR target/121520
gcc/c-family/
* c-cppbuiltin.cc (c_cpp_builtins): Implement C++26 DR 2581. Add
cpp_define_warn lambda and use it as well as cpp_warn where needed.
In the if (c_dialect_cxx ()) block with __cpp_* predefinitions add
cpp_define lambda. Formatting fixes.
gcc/c/
* c-decl.cc (c_init_decl_processing): Use cpp_warn instead of
cpp_lookup and NODE_WARN bit setting.
gcc/cp/
* lex.cc (cxx_init): Remove warn_on lambda. Use cpp_warn instead of
cpp_lookup and NODE_WARN bit setting or warn_on.
gcc/testsuite/
* g++.dg/DRs/dr2581-1.C: New test.
* g++.dg/DRs/dr2581-2.C: New test.
* c-c++-common/cpp/pr92296-2.c: Expect warnings also on defining
special macros after undefining them.
libcpp/
* include/cpplib.h (struct cpp_options): Add
suppress_builtin_macro_warnings member.
(cpp_warn): New inline functions.
* init.cc (cpp_create_reader): Clear suppress_builtin_macro_warnings.
(cpp_init_builtins): Call cpp_warn on __cplusplus, __STDC__,
__STDC_VERSION__, __STDC_MB_MIGHT_NEQ_WC__ and
__STDCPP_STRICT_POINTER_SAFETY__ when appropriate.
* directives.cc (do_undef): Warn on undefining NODE_WARN macros if
not cpp_keyword_p. Don't emit any NODE_WARN related diagnostics
if CPP_OPTION (pfile, suppress_builtin_macro_warnings).
(cpp_define, _cpp_define_builtin, cpp_undef): Temporarily set
CPP_OPTION (pfile, suppress_builtin_macro_warnings) around
run_directive calls.
* macro.cc (_cpp_create_definition): Warn on defining NODE_WARN
macros if they weren't previously defined and not cpp_keyword_p.
Ignore NODE_WARN for diagnostics if
CPP_OPTION (pfile, suppress_builtin_macro_warnings).
Robert Dubner [Thu, 31 Jul 2025 11:45:26 +0000 (07:45 -0400)]
real: Eliminate access to uninitialized memory.
When compiling this program with gcobol:
    identification division.
    program-id. prog.
    data division.
    working-storage section.
    01 val pic v9(5) value .001.
    procedure division.
    display val
    goback.
the rounding up of .99999...9999 to 1.000...0000 causes a read of the
first byte of the output buffer before it has been initialized. Although
harmless, it generates a valgrind warning. The following change clears
that warning.
gcc/ChangeLog:
* real.cc (real_to_decimal_for_mode): Set str[0] to known value.
__builtin_round() fails to save/restore the FP exception flags around the FP
compare insn, which can potentially clobber them.
Worth noting that the fflags restore bracketing is slightly different
from the glibc implementation. Both FLT and FCVT can potentially clobber
fflags. In the code GCC generates, even if the branch is not taken and FCVT
is not executed, FLT is still executed; thus FSFLAGS is placed AFTER the
label 'L3'. In the glibc implementation FLT can't clobber fflags due to an
early NaN check, so FSFLAGS can be moved under the branch, before the label.
As a complement to the previous commit in fixincludes
(b1f9ab40cbcc6ecd53a2be3e01052cee096e1a00): for the MacOSX12.3 SDK, it
is necessary to also bypass the stdio_va_list fix. The same bypass is
used, namely the inclusion of <_stdio.h>.
fixincludes/ChangeLog:
* fixincl.x: Regenerate.
* inclhack.def (stdio_va_list): Skip on recent darwin.
Qing Zhao [Thu, 14 Aug 2025 20:27:20 +0000 (20:27 +0000)]
Generate a call to a .ACCESS_WITH_SIZE for a FAM with counted_by attribute only when it's read from.
Currently, we generate a call to a .ACCESS_WITH_SIZE for a FAM with counted_by
attribute for every component_ref that corresponds to such an object.
Actually, such .ACCESS_WITH_SIZE calls are useless when they are generated
for a write site or an address-taken site.
In this patch, we only generate a call to .ACCESS_WITH_SIZE for a FAM with
counted_by attribute when it's a read.
gcc/c/ChangeLog:
* c-tree.h (handle_counted_by_for_component_ref): New prototypes of
build_component_ref and handle_counted_by_for_component_ref.
* c-parser.cc (c_parser_postfix_expression): Call the new prototypes
of build_component_ref and handle_counted_by_for_component_ref,
update comments.
* c-typeck.cc (default_function_array_read_conversion): Likewise.
(convert_lvalue_to_rvalue): Likewise.
(default_conversion): Likewise.
(handle_counted_by_p): Update comments.
(handle_counted_by_for_component_ref): Delete one argument.
(build_component_ref): Delete one argument. Delete the call to
handle_counted_by_for_component_ref completely.
(build_array_ref): Generate call to .ACCESS_WITH_SIZE for array.
Qing Zhao [Thu, 14 Aug 2025 20:25:55 +0000 (20:25 +0000)]
Use the counted_by attribute of pointers in array bound checker.
The current array bound checker only instruments ARRAY_REF, and the INDEX
information is the 2nd operand of the ARRAY_REF.
When extending the array bound checker to pointer references with
counted_by attributes, the hardest part is to get the INDEX of the
corresponding array ref from the offset computation expression of
the pointer ref, i.e., given an OFFSET expression and the ELEMENT_SIZE,
get the index expression from the OFFSET.
For example, given
  OFFSET:
    ((long unsigned int) m * (long unsigned int) SAVE_EXPR <n>) * 4
  ELEMENT_SIZE:
    (sizetype) SAVE_EXPR <n> * 4
we get the index as (long unsigned int) m.
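A source shape producing such an offset (a hedged reconstruction; the
committed tests use counted_by pointers):

  /* Indexing row m of a pointer to an n-element int row: the byte offset
     is m*n*4 with element size n*4, so the checker divides the element
     size back out to recover the index m.  */
  void
  f (int n, int m, int (*p)[n])
  {
    p[m][0] = 0;
  }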
gcc/c-family/ChangeLog:
* c-gimplify.cc (is_address_with_access_with_size): New function.
(ubsan_walk_array_refs_r): Instrument an INDIRECT_REF whose base
address is .ACCESS_WITH_SIZE or an address computation whose base
address is .ACCESS_WITH_SIZE.
* c-ubsan.cc (ubsan_instrument_bounds_pointer_address): New function.
(struct factor_t): New structure.
(get_factors_from_mul_expr): New function.
(get_index_from_offset): New function.
(get_index_from_pointer_addr_expr): New function.
(is_instrumentable_pointer_array_address): New function.
(ubsan_array_ref_instrumented_p): Change prototype.
Handle MEM_REF in addition to ARRAY_REF.
(ubsan_maybe_instrument_array_ref): Handle MEM_REF in addition
to ARRAY_REF.
gcc/testsuite/ChangeLog:
* gcc.dg/ubsan/pointer-counted-by-bounds-2.c: New test.
* gcc.dg/ubsan/pointer-counted-by-bounds-3.c: New test.
* gcc.dg/ubsan/pointer-counted-by-bounds-4.c: New test.
* gcc.dg/ubsan/pointer-counted-by-bounds-5.c: New test.
* gcc.dg/ubsan/pointer-counted-by-bounds.c: New test.
Qing Zhao [Thu, 14 Aug 2025 20:24:18 +0000 (20:24 +0000)]
Use the counted_by attribute of pointers in builtin-object-size. No need to change anything in the middle end. Add the testcase for PR120929.
gcc/testsuite/ChangeLog:
* gcc.dg/pointer-counted-by-4-char.c: New test.
* gcc.dg/pointer-counted-by-4-float.c: New test.
* gcc.dg/pointer-counted-by-4-struct.c: New test.
* gcc.dg/pointer-counted-by-4-union.c: New test.
* gcc.dg/pointer-counted-by-4.c: New test.
* gcc.dg/pointer-counted-by-5.c: New test.
* gcc.dg/pointer-counted-by-6.c: New test.
* gcc.dg/pointer-counted-by-7.c: New test.
* gcc.dg/pr120929.c: New test.
Qing Zhao [Thu, 14 Aug 2025 20:22:20 +0000 (20:22 +0000)]
Extend "counted_by" attribute to pointer fields of structures. Convert a pointer reference with counted_by attribute to .ACCESS_WITH_SIZE. Fix PR120929.
The counted_by attribute here specifies that "array2" is an array pointed
to by the pointer field, and its number of elements is given by the field
"count2" in the same structure.
In order to fix PR120929, we agreed on the following solution:
for a pointer field with counted_by attribute:
struct S {
int n;
int *p __attribute__((counted_by(n)));
} *f;
when generating the call to .ACCESS_WITH_SIZE for f->p, instead of
generating
  *.ACCESS_WITH_SIZE (&f->p, &f->n, ...)
we should generate
  .ACCESS_WITH_SIZE (f->p, &f->n, ...)
i.e., the return type and the type of the first argument of the call
are the original pointer type in this version.
However, this code generation might bring undefined behavior into the
application if the call to .ACCESS_WITH_SIZE is generated for a pointer
field reference that is written to.
In order to generate a call to .ACCESS_WITH_SIZE for the pointer
reference f->p, a new GIMPLE statement tmp1 = f->p is necessary to pass the
value of the pointer f->p to the call. For a write such as
f->p = malloc (size), this new GIMPLE statement is the one that brings UB
into the application, since the value of f->p is not initialized yet when
it is assigned to "tmp1".
When .ACCESS_WITH_SIZE is later expanded to its first argument, the store
ends up targeting the temporary rather than f->p. As a result, f->p will
NOT be set correctly to the pointer returned by malloc (size).
Due to this potential issue, we need to selectively generate the call to
.ACCESS_WITH_SIZE for f->p according to whether it's a read or a write.
We only generate the call to .ACCESS_WITH_SIZE for f->p when it's a read
in the C FE.
gcc/c-family/ChangeLog:
* c-attribs.cc (handle_counted_by_attribute): Accept counted_by
attribute for pointer fields.
gcc/c/ChangeLog:
* c-decl.cc (verify_counted_by_attribute): Change the 2nd argument
to a vector of fields with counted_by attribute. Verify all fields
in this vector.
(finish_struct): Collect all the fields with counted_by attribute
to a vector and pass this vector to verify_counted_by_attribute.
* c-tree.h (handle_counted_by_for_component_ref): New prototype of
handle_counted_by_for_component_ref.
* c-parser.cc (c_parser_postfix_expression): Call the new prototype
of handle_counted_by_for_component_ref.
* c-typeck.cc (default_function_array_read_conversion): Only generate
call to .ACCESS_WITH_SIZE for a pointer field when it's a read.
(convert_lvalue_to_rvalue): Likewise.
(default_conversion): Likewise.
(handle_counted_by_p): New routine.
(check_counted_by_attribute): New routine.
(build_counted_by_ref): Handle pointers with counted_by.
(build_access_with_size_for_counted_by): Handle pointers with counted_by.
(handle_counted_by_for_component_ref): Add one more argument.
(build_component_ref): Call the new prototype of
handle_counted_by_for_component_ref.
gcc/ChangeLog:
* doc/extend.texi: Extend counted_by attribute to pointer fields in
structures. Add one more requirement to pointers with counted_by
attribute.
gcc/testsuite/ChangeLog:
* gcc.dg/flex-array-counted-by.c: Update test.
* gcc.dg/pointer-counted-by-1.c: New test.
* gcc.dg/pointer-counted-by-2.c: New test.
* gcc.dg/pointer-counted-by-3.c: New test.
* gcc.dg/pointer-counted-by-8.c: New test.
* gcc.dg/pointer-counted-by-9.c: New test.
* gcc.dg/pointer-counted-by.c: New test.
Umesh Kalappa [Fri, 15 Aug 2025 13:35:40 +0000 (07:35 -0600)]
RISC-V: MIPS prefetch extension for MIPS RV64 P8700, which can be enabled with xmipscbop.
Addressed the review comments and tested with "runtest --tool gcc --target_board='riscv-sim/-march=rv64gc_zba_zbb_zbc_zbs/-mabi=lp64/-mcmodel=medlow' riscv.exp", and for 32-bit too.
The lint warnings for riscv-ext.opt can be ignored.
gcc/ChangeLog:
* config/riscv/riscv-ext-mips.def (DEFINE_RISCV_EXT):
Added mips prefetch extension.
* config/riscv/riscv-ext.opt: Generated file.
* config/riscv/riscv.md (prefetch):
Added mips prefetch address operand constraint.
* config/riscv/constraints.md: Added mips specific constraint.
* config/riscv/predicates.md (prefetch_operand):
Updated for mips nine bits offset.
* config/riscv/riscv.cc (riscv_prefetch_offset_address_p):
Legitimate address with offset for prefetch check.
* config/riscv/riscv-protos.h: Likewise.
* config/riscv/riscv.h:
Macros to support for mips cached type.
* doc/riscv-ext.texi: Updated for mips prefetch.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/mipsprefetch.c: Test file for mips.pref.
RISC-V: Allow errors to be suppressed when parsing architectures
One of Alfie's FMV patches adds a hook that, in some cases,
is used to silently query a target_version (with no diagnostics
expected). In the review, I'd suggested handling this using
a location_t *, with null meaning "suppress diagnostics".
This patch tries to propagate that through the RISC-V parsing code.
I realise this isn't very elegant, sorry.
I think riscv_compare_version_priority should also logically suppress
diagnostics, since it's supposed to be a pure query function. (From
that point of view, advocating for this change for Alfie's patch might
have been a bit unfair.)
gcc/
* config/riscv/riscv-protos.h
(riscv_process_target_version_attr): Change location_t argument
to location_t *.
* config/riscv/riscv-subset.h
(riscv_subset_list::riscv_subset_list): Change location_t argument
to location_t *.
(riscv_subset_list::parse): Likewise.
(riscv_subset_list::set_loc): Likewise.
(riscv_minimal_hwprobe_feature_bits): Likewise.
(riscv_subset_list::m_loc): Change type to location_t.
* common/config/riscv/riscv-common.cc
(riscv_subset_list::riscv_subset_list): Change location_t argument
to location_t *.
(riscv_subset_list::add): Suppress diagnostics when m_loc is null.
(riscv_subset_list::parsing_subset_version): Likewise.
(riscv_subset_list::parse_profiles): Likewise.
(riscv_subset_list::parse_base_ext): Likewise.
(riscv_subset_list::parse_single_std_ext): Likewise.
(riscv_subset_list::check_conflict_ext): Likewise.
(riscv_subset_list::parse_single_multiletter_ext): Likewise.
(riscv_subset_list::parse): Change location_t argument to location_t *.
(riscv_subset_list::set_loc): Likewise.
(riscv_minimal_hwprobe_feature_bits): Likewise.
(riscv_parse_arch_string): Update call accordingly.
* config/riscv/riscv-target-attr.cc
(riscv_target_attr_parser::m_loc): Change type to location_t *.
(riscv_target_attr_parser::riscv_target_attr_parser): Change
location_t argument to location_t *.
(riscv_process_one_target_attr): Likewise.
(riscv_process_target_attr): Likewise.
(riscv_process_target_version_attr): Likewise.
(riscv_target_attr_parser::parse_arch): Suppress diagnostics when
m_loc is null.
(riscv_target_attr_parser::handle_arch): Likewise.
(riscv_target_attr_parser::handle_cpu): Likewise.
(riscv_target_attr_parser::handle_tune): Likewise.
(riscv_target_attr_parser::handle_priority): Likewise.
(riscv_option_valid_attribute_p): Update call accordingly.
(riscv_option_valid_version_attribute_p): Likewise.
* config/riscv/riscv.cc (parse_features_for_version): Add a
location_t * argument.
(dispatch_function_versions): Update call accordingly.
(riscv_compare_version_priority): Likewise, suppressing diagnostics.
All macOS SDKs since at least macOS 10.9, and until macOS 10.12
(inclusive), feature these lines in <stdio.h>:
/* DO NOT REMOVE THIS COMMENT: fixincludes needs to see:
* __gnuc_va_list and include <stdarg.h> */
The clear intent (and effect) was to bypass gcc’s stdio_stdarg_h
fixinclude.
However, since macOS 10.13, these lines have been moved to <_stdio.h>,
which is itself included at the top of <stdio.h>. The unintended
consequence is that the stdio_stdarg_h fixinclude is now applied to
macOS <stdio.h>, where it is not needed. This useless fixinclude makes
the compiler more fragile and less portable.
A previous attempt to skip the stdio_stdarg_h fix entirely had to be
reverted, since it broke some very old macOS versions. The new fix is
to bypass the fix based on the detection of <_stdio.h> inclusion, which
is more robust.
fixincludes/ChangeLog:
* fixincl.x: Regenerate.
* inclhack.def (stdio_stdarg_h): Skip on darwin.
Jakub Jelinek [Thu, 14 Aug 2025 20:30:45 +0000 (22:30 +0200)]
c++: Fix up build_cplus_array_type [PR121524]
The following testcase is miscompiled since my r15-3046 change
to properly apply std attributes after closing ] for arrays to the
array type.
Array type is not a class type, so when cplus_decl_attribute is
called on the ARRAY_TYPE, it doesn't do ATTR_FLAG_TYPE_IN_PLACE.
Though, for alignas/gnu::aligned/deprecated/gnu::unavailable/gnu::unused
attributes the handlers of those attributes for non-ATTR_FLAG_TYPE_IN_PLACE
on types call build_variant_type_copy and modify some flags on the new
variant type. They also usually don't clear *no_add_attrs, so the caller
then checks if the attributes are present on the new type and if not, calls
build_type_attribute_variant.
On the following testcase, it results in the B::foo type being properly
32-byte aligned.
The problem happens later when we build_cplus_array_type for C::a.
elt_type is T (a typedef; using works likewise), and we get as m the
main variant type with unsigned int element type; but because elt_type
is different, build_cplus_array_type searches the TYPE_NEXT_VARIANT chain
to find if there isn't already a useful ARRAY_TYPE to reuse.
It checks for NULL TYPE_NAME, NULL TYPE_ATTRIBUTES and the right TREE_TYPE.
Unfortunately this is not good enough: build_variant_type_copy above created
a variant type on which it modified TYPE_USER_ALIGN and TYPE_ALIGN, but
TYPE_ATTRIBUTES is still NULL; only the build_type_attribute_variant call
later adds attributes.
The problem is that the intermediate type is found in the TYPE_NEXT_VARIANT
chain and reused.
The following patch adds conditions to prevent problems with the affected
attributes (except gnu::unused; I think whether TREE_USED is set or not
shouldn't prevent sharing). In particular, if TYPE_USER_ALIGN is not
set on the variant, it wasn't user realigned; if it is set, the patch
verifies that it is set because the elt_type has been user aligned and
that TYPE_ALIGN is the expected one. For deprecated it punts if the flag
is set, and likewise for gnu::unavailable.
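As a hedged illustration (hypothetical code, not the actual PR121524
testcase; the names T, B and C are invented), the situation looks roughly
like this, where C::a must not pick up the over-aligned intermediate
variant created for B::foo:
  using T = unsigned int;

  struct B {
    T foo[4] [[gnu::aligned (32)]];  // creates an aligned array variant
  };

  struct C {
    T a[4];  // must not reuse B::foo's intermediate variant
  };

  // Expected after the fix:
  static_assert (alignof (decltype (B::foo)) == 32, "");
  static_assert (alignof (decltype (C::a)) == alignof (T), "");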
2025-08-14 Jakub Jelinek <jakub@redhat.com>
PR c++/121524
* tree.cc (build_cplus_array_type): Don't reuse variant type
if it has TREE_DEPRECATED or TREE_UNAVAILABLE flags set or,
unless elt_type has TYPE_USER_ALIGN set and TYPE_ALIGN is
TYPE_ALIGN of elt_type, TYPE_USER_ALIGN is not set.
Jeff Law [Thu, 14 Aug 2025 20:15:40 +0000 (14:15 -0600)]
[PR target/119275][RISC-V] Avoid calling gen_lowpart in cases where it would ICE
So this is a minor bug in the riscv move expanders. They have a special
case for extraction from vector objects which assumes it can use
gen_lowpart unconditionally. That's not always the case.
We can just bypass that special code for cases where we can't use gen_lowpart
and let the more generic code run: if gen_lowpart_common indicates we've got
a case that can't be handled, we skip the special extraction code.
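A minimal sketch of that guard using GCC's existing rtl API (an
illustrative fragment only, not the actual riscv_legitimize_move hunk;
dest, src and the use of word_mode are placeholders):
  /* gen_lowpart_common returns NULL_RTX when it cannot handle the
     case, unlike gen_lowpart, which would ICE.  */
  rtx low = gen_lowpart_common (word_mode, src);
  if (low == NULL_RTX)
    /* Bail out of the special vector-extract path and let the
       standard expander paths handle the move.  */
    return false;
  emit_move_insn (dest, low);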
Tested on riscv64-elf and riscv32-elf. Waiting for pre-commit CI to do its
thing.
PR target/119275
gcc/
* config/riscv/riscv.cc (riscv_legitimize_move): Avoid calling
gen_lowpart for cases where it'll fail. Just use standard expander
paths for those cases.
gcc/testsuite/
* gcc.target/riscv/pr119275.c: New test.
Since the cris port was added to gcc it has passed --em=criself
to gas, as an abbreviation for --emulation=criself. Starting with
binutils-2.45 that causes a hard error in gas due to ambiguity with
another option.
Fixed by replacing the abbreviation with the complete option.
Tested by building a cross to cris-elf with binutils-2.45, which
failed before but now succeeds.
gcc/
PR target/121336
* config/cris/cris.h: Do not abbreviate --emulation.
Signed-off-by: Mikael Pettersson <mikpelinux@gmail.com>
powerpc: Add missing modes to P9 if_then_elses [PR121501]
These patterns had one (if_then_else ...) nested within another.
The outer if_then_else had SImode, which means that the "then"
and "else" should also be SImode (unless they're const_ints).
However, the inner if_then_else was modeless, which led to an
assertion failure when trying to take a subreg of it.
Andrew Pinski [Wed, 13 Aug 2025 06:31:15 +0000 (23:31 -0700)]
forwprop: Limit alias walk in some cases [PR121474]
https://gcc.gnu.org/pipermail/gcc-patches/2025-August/692091.html
pointed out:
'''
Oh, as we now do alias walks in forwprop maybe we should make this
conditional and do
this not for all pass instances, since it makes forwprop possibly a lot slower?
'''
This patch limits the walk in a few different ways.
First, only allow a full walk in the first two forwprop instances (the one
before inlining and the one just after inlining). The other two forwprop
instances are less likely to find any extra zero prop, so limit them so
there is no walk.
There is an exception to the rule though: skipping over clobbers is still
allowed, since those will not make the walk take long and, looking at
benchmarks, are the only place where forwprop3/4 would find a zero prop.
Second, allow a full walk only if flag_expensive_optimizations is true.
This limits the walk for -O1, since flag_expensive_optimizations is only
turned on at -O2+.
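A hypothetical sketch of the resulting policy (names invented for
illustration; the actual tree-ssa-forwprop.cc logic differs in detail):
  /* Hypothetical helper, not the actual GCC code.  */
  static bool
  walk_step_allowed_p (bool full_walk, unsigned steps, gimple *stmt)
  {
    /* Skipping over clobbers is always cheap, even in forwprop3/4.  */
    if (gimple_clobber_p (stmt))
      return true;
    /* Only forwprop1/2, and only at -O2+, pay for a full walk.  */
    if (!full_walk || !flag_expensive_optimizations)
      return false;
    return steps < 128;  /* arbitrary illustrative cap */
  }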
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/121474
gcc/ChangeLog:
* passes.def: Update forwprop1/2 to pass full_walk as true.
* tree-ssa-forwprop.cc (optimize_aggr_zeroprop): Add new argument
full_walk. Take full_walk into account; skipping clobbers at the
end of the limit can always be done.
(simplify_builtin_call): Add new argument, full_walk.
Update call to optimize_aggr_zeroprop.
(pass_forwprop): Add m_full_walk field.
(pass_forwprop::set_pass_param): Update for m_full_walk.
(pass_forwprop::execute): Update call to simplify_builtin_call
and optimize_aggr_zeroprop.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Andrew Pinski [Tue, 10 Jun 2025 07:11:58 +0000 (00:11 -0700)]
forwprop: Copy prop aggregates into args
This implements simple copy propagation of aggregates into
arguments of function calls. This can reduce the number of copies
done. Just like removing an extra copy in general, this can and
will help out SRA, since we might not need to do a full scalarization
of the aggregate anymore.
This is the simplest form of this copy prop of aggregates into function arguments.
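A hedged illustration of the transformation (not the actual test in
copy-prop-aggregate-arg-1.c; S, use and f are invented names):
  struct S { int a[16]; };
  void use (struct S);

  void
  f (struct S *p)
  {
    struct S tmp = *p;  /* aggregate copy ...  */
    use (tmp);          /* ... becomes use (*p), so the copy is dead
                           and SRA never needs to scalarize tmp.  */
  }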
Changes since v1:
* v2: Merge in the changes of r16-3160-g2fe432175ef135.
Move the checks for assignment and call statements into optimize_agr_copyprop
rather than having them in optimize_agr_copyprop_1 and optimize_agr_copyprop_arg.
gcc/ChangeLog:
* tree-ssa-forwprop.cc (optimize_agr_copyprop_1): New function split out of ...
(optimize_agr_copyprop): Here. Also try calling optimize_agr_copyprop_arg.
(optimize_agr_copyprop_arg): New function.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/copy-prop-aggregate-arg-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
In commit r16-2316-gc6676092318, patterns were mistakenly introduced
which should instead have been merged as alternatives into existing zero
extend patterns.
While at it, generalize the vec_extract patterns and also allow
registers for the index. A subsequent patch will add
register+immediate support.
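A hedged sketch of the kind of code affected (not the actual
vlgv-zero-extend-2.c test): a zero-extended element extraction with a
register index should now match a single vec_extract pattern.
  typedef unsigned int v4si __attribute__ ((vector_size (16)));

  unsigned long
  extract (v4si v, int i)
  {
    return v[i];  /* vec_extract with a register index, zero extended */
  }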
gcc/ChangeLog:
* config/s390/s390.md: Merge movdi<mode>_zero_extend_A and
movsi<mode>_zero_extend_A into zero_extendsidi2 and
zero_extendhi<mode>2_z10 and
zero_extend<HQI:mode><GPR:mode>2_extimm.
* config/s390/vector.md (*movdi<mode>_zero_extend_A): Remove.
(*movsi<mode>_zero_extend_A): Remove.
(*movdi<mode>_zero_extend_B): Move to vec_extract patterns and
rename to *vec_extract<mode>_zero_extend.
(*movsi<mode>_zero_extend_B): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/s390/vector/vlgv-zero-extend-1.c: Require target
s390_mvx.
* gcc.target/s390/vector/vlgv-zero-extend-2.c: New test.
testsuite: Fix asm-hard-reg-error-3.c for arm [PR121511]
This test is about register pairs. On arm, a long long is accepted in
thumb mode in any of the registers 0-6, whereas in arm mode it is restricted
to even register pairs. Thus, in order to trigger the error even if gcc is
configured with --with-mode=thumb, add the option -marm.
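A hedged illustration of the underlying restriction, using a local
register variable rather than the hard-register asm constraints the actual
test exercises: with -marm the odd register r1 cannot hold a 64-bit value,
while thumb mode accepts any of r0-r6.
  long long
  f (long long x)
  {
    register long long r asm ("r1") = x;  /* rejected with -marm */
    asm volatile ("" : "+r" (r));
    return r;
  }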
The earlier commit "x86: Add preserve_none and update
no_caller_saved_registers attributes" accidentally allowed MMX/80387
instructions in functions with the no_caller_saved_registers attribute.
Update ix86_set_current_function to properly check if MMX and 80387 are
enabled.
gcc/
PR target/121540
* config/i386/i386-options.cc (ix86_set_current_function):
Properly check if MMX and 80387 are enabled.
Jeff Law [Wed, 13 Aug 2025 23:16:41 +0000 (17:16 -0600)]
[RISC-V][PR target/121531] Cover missing insn types in p400 and p600 scheduler models
So the usual problem: DFAs without full coverage. I took the output of Kito's
checker and used that to construct a dummy reservation for the p400 and p600
sifive models.
Tested on riscv32-elf and riscv64-elf with no regressions.
Pushing to the trunk once pre-commit CI gives the green light.
The testcase in question currently doesn't compile because the 'j' in the
capture isn't visible in the trailing return type. With these proposals,
the 'j' will be in a lambda scope which spans the trailing return type, so
the test will compile.
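A hedged illustration of the kind of code meant here (hypothetical, not
the actual testcase):
  int x = 42;
  // OK with this patch: 'j' is in scope in the trailing return type.
  auto f = [j = x] () -> decltype (j) { return j; };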
This oughtn't be difficult but decltype and other issues made this patch
much more challenging.
We have to push the explicit captures before going into
_lambda_declarator_opt because that is what parses the trailing return
type. Yet we can't build any captures until after _lambda_body ->
start_lambda_function which creates the lambda's operator(), without
which we can't build a proxy, but _lambda_body happens only after
parsing the declarator. This patch works around it by creating a fake
operator() and adding it to the capture and then removing it when we
have the real operator().
Another thing is that in "-> decltype(j)" we don't have the right
current_function_decl yet. If current_lambda_expr gives us a lambda,
we know this decltype appertains to a lambda. But we have to know if we
are in a parameter-declaration-clause: as per [expr.prim.id.unqual]/4.4,
if we are, we shouldn't be adding "const". The new LAMBDA_EXPR_CONST_QUAL_P
flag tracks this. But it doesn't handle nested lambdas yet, specifically,
[expr.prim.id.unqual]/14.
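A hedged example of the distinction LAMBDA_EXPR_CONST_QUAL_P tracks
(hypothetical, not from the testsuite): in a non-mutable lambda, a copy
capture named in the trailing return type is const-qualified, but within
the parameter-declaration-clause it is not.
  int x = 0;
  auto g = [x] (decltype ((x)) p)  // in the parameters: int&
           -> decltype ((x))       // in the trailing return type: const int&
  { return x; };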
I don't think this patch changes behavior for the tests in
"capture-default with [=]" as the paper promises; clang++ behaves the
same as gcc with this patch.
PR c++/102610
gcc/cp/ChangeLog:
* cp-tree.h (LAMBDA_EXPR_CONST_QUAL_P): Define.
(maybe_add_dummy_lambda_op): Declare.
(remove_dummy_lambda_op): Declare.
(push_capture_proxies): Adjust.
* lambda.cc (build_capture_proxy): No longer static. New early_p
parameter. Use it.
(add_capture): Adjust the call to build_capture_proxy.
(resolvable_dummy_lambda): Check DECL_LAMBDA_FUNCTION_P.
(push_capture_proxies): New.
(start_lambda_function): Use it.
* name-lookup.cc (check_local_shadow): Give an error for
is_capture_proxy.
(cp_binding_level_descriptor): Add lambda-scope.
(begin_scope) <case sk_lambda>: New case.
* name-lookup.h (enum scope_kind): Add sk_lambda.
(struct cp_binding_level): Widen kind.
* parser.cc (cp_parser_lambda_expression): Create a new (lambda) scope
after the lambda-introducer.
(cp_parser_lambda_declarator_opt): Set LAMBDA_EXPR_CONST_QUAL_P.
Create a dummy operator() if needed. Inject the captures into the
lambda scope. Remove the dummy operator().
(make_dummy_lambda_op): New.
(maybe_add_dummy_lambda_op): New.
(remove_dummy_lambda_op): New.
* pt.cc (tsubst_lambda_expr): Begin/end a lambda scope. Push the
capture proxies. Build/remove a dummy operator() if needed. Set
LAMBDA_EXPR_CONST_QUAL_P.
* semantics.cc (parsing_lambda_declarator): New.
(outer_var_p): Also consider captures as outer variables if in a lambda
declarator.
(process_outer_var_ref): Reset containing_function when
parsing_lambda_declarator.
(finish_decltype_type): Process decls in the lambda-declarator as well.
Look at LAMBDA_EXPR_CONST_QUAL_P unless we have an xobj function.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/lambda/lambda-decltype3.C: Remove xfail.
* g++.dg/warn/Wshadow-19.C: Add -Wpedantic. Adjust a dg-warning.
* g++.dg/warn/Wshadow-6.C: Adjust expected diagnostics.
* g++.dg/cpp23/lambda-scope1.C: New test.
* g++.dg/cpp23/lambda-scope2.C: New test.
* g++.dg/cpp23/lambda-scope3.C: New test.
* g++.dg/cpp23/lambda-scope4.C: New test.
* g++.dg/cpp23/lambda-scope4b.C: New test.
* g++.dg/cpp23/lambda-scope5.C: New test.
* g++.dg/cpp23/lambda-scope6.C: New test.
* g++.dg/cpp23/lambda-scope7.C: New test.
* g++.dg/cpp23/lambda-scope8.C: New test.
* g++.dg/cpp23/lambda-scope9.C: New test.