OpenMP/Fortran: Revamp handling of labels in metadirectives [PR122369,PR122508]
When a label is matched in the first statement after the end of a metadirective
body, it is bound to the associated region. However, this prevents it from being
referenced elsewhere.
This patch fixes it by rebinding such labels to the outer region. It also
ensures that labels defined in an outer region can be referenced in a
metadirective body.
PR fortran/122369
PR fortran/122508
gcc/fortran/ChangeLog:
* gfortran.h (gfc_rebind_label): Declare new function.
* parse.cc (parse_omp_metadirective_body): Rebind labels to the outer
region. Maintain a vector of metadirective regions.
(gfc_parse_file): Initialise it.
* parse.h (GFC_PARSE_H): Declare it.
* symbol.cc (gfc_get_st_label): Look for existing labels in outer
metadirective regions.
(gfc_rebind_label): Define new function.
(gfc_define_st_label): Accept duplicate labels in metadirective body.
(gfc_reference_st_label): Accept shared DO termination labels in
metadirective body.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/pr122369-1.f90: New test.
* gfortran.dg/gomp/pr122369-2.f90: New test.
* gfortran.dg/gomp/pr122369-3.f90: New test.
* gfortran.dg/gomp/pr122369-4.f90: New test.
* gfortran.dg/gomp/pr122508-1.f90: New test.
* gfortran.dg/gomp/pr122508-2.f90: New test.
Bob Duff [Wed, 20 Aug 2025 18:07:14 +0000 (14:07 -0400)]
Ada: Fix visibility bug related to target name
This patch fixes the following bug:
If the right-hand side of an assignment contains a target name
(i.e. "@"), and also contains a reference to a user-defined operator
that is directly visible because of a "use type" clause on a renaming of
the package where the operator is declared, the compiler gives an
incorrect error saying that the renamed package is not visible.
It turns out that setting Entity of resolved nodes is unnecessary
and wrong; the fix is to simply remove that code.
gcc/ada/ChangeLog:
PR ada/118208
* exp_ch5.adb
(Expand_Assign_With_Target_Names.Replace_Target):
Remove code setting Entity to Empty.
* sinfo.ads (Has_Target_Names):
Improve comment: add "@" to clarify what "target name"
means, and remove the content-free phrase "and must
be expanded accordingly."
Nathaniel Shead [Thu, 16 Oct 2025 11:51:23 +0000 (22:51 +1100)]
c++: Don't constrain template visibility using no-linkage variables [PR122253]
When finding the minimal visibility of a template, any reference to a
dependent automatic variable will cause the instantiation to be marked
as having internal linkage. However, when processing the template decl we
don't yet know whether that should actually be the case, as a given
instantiation may not require referencing the local decl in its
mangling.
This patch fixes the issue by checking for no-linkage decls first, in
which case we just constrain using the type of the entity. We can't use
a check for lk_external/lk_internal in the other cases, as
instantiations referring to internal types can still have external
linkage as determined by the language, but should still constrain the
visibility of any declarations that refer to them.
PR c++/122253
gcc/cp/ChangeLog:
* decl2.cc (min_vis_expr_r): Don't mark no-linkage declarations
as VISIBILITY_ANON.
gcc/testsuite/ChangeLog:
* g++.dg/modules/internal-16.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
(cherry picked from commit f062a6b7985fcee82e02b626aada4e0824850bd0)
Jeff Law [Sat, 1 Nov 2025 14:30:41 +0000 (08:30 -0600)]
[PR rtl-optimization/122321][RISC-V] Bounds check another access to ira_reg_equiv array
So another case where we're indexing into the ira_reg_equiv array without
checking bounds. I sincerely hope we're not playing whack-a-mole here, but two
failures in a couple months for the same core problem is worrisome.
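The following hedged C sketch shows the general shape of the fix. `struct equiv_table` and `update_equiv_checked` are hypothetical stand-ins and are not GCC's `ira_reg_equiv` data structures. The sketch assumes the table was sized before later pseudos were created. Under that assumption, any REGNO at or past its length has no entry and must not be written.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a per-register equivalence table that was
   sized when it was created; registers created later have
   REGNO >= len and therefore no entry.  */
struct equiv_table
{
  int len;      /* Number of valid entries.  */
  int *values;  /* values[regno] is valid for 0 <= regno < len.  */
};

/* Update the entry for REGNO only if REGNO lies inside the table.
   Returns true if the table was updated.  */
bool
update_equiv_checked (struct equiv_table *t, int regno, int value)
{
  if (regno < 0 || regno >= t->len)
    return false;  /* No entry to update: skip instead of writing out of bounds.  */
  t->values[regno] = value;
  return true;
}
```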
Bootstrapped and regression tested on x86_64 and riscv64 as well as run through
all the embedded targets in my tester without regressions.
PR rtl-optimization/122321
gcc/
* lra-constraints.cc (update_equiv): Make sure REGNO is in
ira_reg_equiv before trying to update ira_reg_equiv.
gcc/testsuite/
* gcc.target/riscv/rvv/autovec/pr122321.c: New test.
Andrew Pinski [Wed, 15 Oct 2025 16:59:25 +0000 (09:59 -0700)]
riscv: Fix gimple folding of the vset* intrinsics [PR122270]
The problem here is that when the backend folds the vset intrinsics,
it tries to keep the lhs of the new statement the same as that of the old statement,
due to the check in gsi_replace. The problem is that, with a MEM_REF lhs, vset::fold was
unsharing the lhs of the replacement statement and using the original lhs in the other new statement.
This meant the check in gsi_replace would fail.
This fixes that oversight by switching around which statement gets the unshared
version.
Note that the comment in vset::fold was already correct; the code just did not match it:
/* Replace the call with two statements: a copy of the full tuple
to the call result, followed by an update of the individual vector.
The fold routines expect the replacement statement to have the
same lhs as the original call, so return the copy statement
rather than the field update. */
Changes since v1:
* v2: Fix testcase.
PR target/122270
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc (vset::fold): Use the
unshare_expr for the statement that will be added separately rather than
the one which will be used for the replacement.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/pr122270-1.c: New test.
AVR: target/122527 -- Don't use __load_N to load from __flash1.
This patch fixes a case where a 3-byte or 4-byte load from __flash1
uses __load_3/4 to read the value, which is wrong.
This only occurred when the device has ELPM but not ELPMx (avr31).
PR target/122527
gcc/
* config/avr/avr.cc (avr_load_libgcc_p): Return false if
the address-space is not ADDR_SPACE_FLASH.
(avr_out_lpm_no_lpmx [addr=REG]): Handle sizes of 3 and 4 bytes.
AVR: PR122505 - Fix bloated mulpsi3 in the wake of hacking around PR118012.
Since the PR118012 work-around patch, there is an SImode insn also for
the non-MUL case, but there is no mulpsi3. This makes the middle-end
use the mulsi3 insn for 24-bit multiplications like in:
__uint24 mul24 (__uint24 a, __uint24 b)
{
return a * b;
}
The patch just allows the mulpsi3 insn for the non-MUL case, except for
AVR_TINY which passes the 2nd argument on the stack so no insn can be used.
The change might be beneficial even in the absence of PR118012 because
the __mulpsi3 footprint is leaner than a libcall.
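`__uint24` is only available on AVR. The portable C sketch below shows what the 24-bit multiply above has to compute. It emulates `__uint24` with `uint32_t` plus a mask, which is an assumption made purely for illustration. Only the low 24 bits of the product are kept, so a full 32-bit `mulsi3` does more work than the operation needs.

```c
#include <stdint.h>

#define UINT24_MASK 0xFFFFFFu

/* Emulate "__uint24 a * b" on hosts without __uint24: operands and
   product are reduced modulo 2^24.  Wrap-around of the 32-bit unsigned
   product is harmless because 2^24 divides 2^32.  */
uint32_t
mul24_emulated (uint32_t a, uint32_t b)
{
  return ((a & UINT24_MASK) * (b & UINT24_MASK)) & UINT24_MASK;
}
```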
PR tree-optimization/118012
PR tree-optimization/122505
gcc/
* config/avr/avr.md (mulpsi3): Also allow the insn condition
in the case where avropt_pr118012 && !AVR_TINY.
(*mulpsi3): Handle split for the !AVR_HAVE_MUL case.
(*mulpsi3-nomul.libgcc_split, *mulpsi3-nomul.libgcc): New insns.
Nathaniel Shead [Sun, 26 Oct 2025 11:27:33 +0000 (22:27 +1100)]
c++/modules: Track all static class variables [PR122421]
The linker error in the PR is caused because when a static is defined
out of the class body, it doesn't yet have a definition and so
read_var_def (which would otherwise have noted it) never gets called.
This instead moves the responsibility for noting class-scope variables
to read_class_def.
PR c++/122421
gcc/cp/ChangeLog:
* module.cc (trees_in::read_var_def): Don't handle class-scope
variables anymore.
(trees_in::read_class_def): Handle them here instead.
gcc/testsuite/ChangeLog:
* g++.dg/modules/inst-6_a.C: New test.
* g++.dg/modules/inst-6_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
(cherry picked from commit fd5c057c2d01346d69119f88ca94debf27842e4e)
Harald Anlauf [Fri, 24 Oct 2025 19:33:08 +0000 (21:33 +0200)]
Fortran: IS_CONTIGUOUS and pointers to non-contiguous targets [PR114023]
PR fortran/114023
gcc/fortran/ChangeLog:
* trans-expr.cc (gfc_trans_pointer_assignment): Always set dtype
when remapping a pointer. For unlimited polymorphic LHS use
elem_len from RHS.
* trans-intrinsic.cc (gfc_conv_is_contiguous_expr): Extend inline
generated code for IS_CONTIGUOUS for pointer arguments to detect
when span differs from the element size.
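The extra check can be pictured with a simplified rank-1 descriptor in C. The struct and field names are hypothetical and do not reproduce gfortran's descriptor layout. For example, a pointer to the real parts of a complex array has unit stride. Its span (the size of a complex) still exceeds its element length (the size of a real), so the data is not contiguous.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified rank-1 descriptor (not gfortran's layout).
   SPAN is the byte distance between consecutive stored elements,
   ELEM_LEN the byte size of one element.  */
struct desc1
{
  size_t elem_len;
  size_t span;
  ptrdiff_t stride;  /* In units of SPAN.  */
  ptrdiff_t extent;
};

bool
is_contiguous_1d (const struct desc1 *d)
{
  if (d->extent <= 1)
    return true;  /* Zero or one element is trivially contiguous.  */
  /* Unit stride alone is not enough: the elements must also be packed.  */
  return d->stride == 1 && d->span == d->elem_len;
}
```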
Harald Anlauf [Thu, 23 Oct 2025 19:21:04 +0000 (21:21 +0200)]
Fortran: fix TRANSFER of subarray component references [PR122386]
Commit r16-518 introduced a change that fixed inquiry references of complex
arrays as arguments to the TRANSFER intrinsic by forcing a temporary. However,
the solution taken turned out not to be generalizable to component
references of nested derived-type arrays. A better way is to revert that
patch and force the generation of a temporary when the SOURCE expression is
not a simply-contiguous array.
PR fortran/122386
gcc/fortran/ChangeLog:
* dependency.cc (gfc_ref_needs_temporary_p): Revert r16-518.
* trans-intrinsic.cc (gfc_conv_intrinsic_transfer): Force temporary
for SOURCE not being a simply-contiguous array.
gcc/testsuite/ChangeLog:
* gfortran.dg/transfer_array_subref_2.f90: New test.
Tamar Christina [Mon, 27 Oct 2025 17:55:38 +0000 (17:55 +0000)]
vect: Fix operand swapping on complex multiplication detection [PR122408]
For
SUBROUTINE a( j, b, c, d )
!GCC$ ATTRIBUTES noinline :: a
COMPLEX*16 b
COMPLEX*16 c( * ), d( * )
DO k = 1, j
c( k ) = - b * CONJG( d( k ) )
END DO
END
we incorrectly generate .IFN_COMPLEX_MUL instead of .IFN_COMPLEX_MUL_CONJ.
The issue happens because in the call to vect_validate_multiplication the
operand vectors are passed by reference, and so the stripping of the NEGATE_EXPR
after matching modifies the input vector. If validation fails we flip the
operands and try again. But we've already stripped the negates, and so if we
match we would match a normal multiply.
This fixes the API by marking the operands as const and instead passing an explicit
output vec that's to be used. This also reduces the number of copies we were
doing.
With this we now correctly detect .IFN_COMPLEX_MUL_CONJ. Weirdly enough I
couldn't reproduce this with any C example because they get reassociated
differently and always succeed on the first attempt. Fortran is easy to
trigger, though, so new Fortran tests were added.
gcc/ChangeLog:
PR tree-optimization/122408
* tree-vect-slp-patterns.cc (vect_validate_multiplication): Cleanup and
document interface.
(complex_mul_pattern::matches, complex_fms_pattern::matches): Update to
new interface.
gcc/testsuite/ChangeLog:
PR tree-optimization/122408
* gfortran.target/aarch64/pr122408_1.f90: New test.
* gfortran.target/aarch64/pr122408_2.f90: New test.
Jinyang He [Wed, 29 Oct 2025 08:07:35 +0000 (16:07 +0800)]
LoongArch: Only allow valid binary op when optimize conditional move
Optimizing `if (cond) dest op= 1 << shift` into
`dest op= (cond ? 1 : 0) << shift` is wrong when `dest op 0 != dest`,
as is the case for `and`, `mul` or `div`.
Also, in this optimization `mul` and `div` are optimized to shifts.
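The condition can be checked with plain C. This is an illustrative sketch, not the LoongArch expander, and the function names are invented for the example. The branchless rewrite matches the original only when `dest op 0 == dest`. That holds for IOR but not for AND.

```c
#include <stdbool.h>

/* Original form: DEST is modified only when COND holds.  */
unsigned
cond_ior (unsigned dest, bool cond, unsigned shift)
{
  if (cond)
    dest |= 1u << shift;
  return dest;
}

/* Branchless form: equivalent for IOR because dest | 0 == dest.  */
unsigned
branchless_ior (unsigned dest, bool cond, unsigned shift)
{
  dest |= (cond ? 1u : 0u) << shift;
  return dest;
}

/* Original form with AND.  */
unsigned
cond_and (unsigned dest, bool cond, unsigned shift)
{
  if (cond)
    dest &= 1u << shift;
  return dest;
}

/* Branchless form with AND: NOT equivalent, because dest & 0 == 0,
   so DEST is clobbered even when COND is false.  */
unsigned
branchless_and (unsigned dest, bool cond, unsigned shift)
{
  dest &= (cond ? 1u : 0u) << shift;
  return dest;
}
```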
gcc/ChangeLog:
* config/loongarch/loongarch.cc
(loongarch_expand_conditional_move): Only allow valid binary
op when optimize conditional move.
Guo Jie [Wed, 29 Oct 2025 08:38:54 +0000 (16:38 +0800)]
LoongArch: Standard instruction template fnmam4 correction
The current implementation of the fnmam4 instruction template requires
the third source operand to be assigned the same hard register as the
target operand, but no such constraint appears in the instruction manual
or in the standard template definitions. The unnecessary constraint
generates additional data dependencies and extra instructions.
Patrick Palka [Tue, 14 Oct 2025 16:56:23 +0000 (12:56 -0400)]
c++: mem-initializer-id qualified name lookup is type-only [PR122192]
Since a mem-initializer needs to be able to initialize any base class,
lookup for which is type-only, we in turn need to make mem-initializer-id
qualified name lookup type-only too.
PR c++/122192
gcc/cp/ChangeLog:
* parser.cc (cp_parser_mem_initializer_id): Pass class_type
instead of typename_type to cp_parser_class_name in the
nested-name-specifier case.
gcc/testsuite/ChangeLog:
* g++.dg/template/dependent-base6.C: Verify mem-initializer-id
qualified name lookup is type-only too.
Patrick Palka [Fri, 10 Oct 2025 14:25:25 +0000 (10:25 -0400)]
c++: base-specifier name lookup is type-only [PR122192]
The r13-6098 change to make TYPENAME_TYPE no longer always ignore
non-type bindings needs another exception: base-specifiers that are
represented as TYPENAME_TYPE, for which lookup must be type-only (by
[class.derived.general]/2). This patch fixes this by giving such
TYPENAME_TYPEs a tag type of class_type rather than typename_type so
that we treat them like elaborated-type-specifiers (another type-only
lookup situation).
PR c++/122192
gcc/cp/ChangeLog:
* decl.cc (make_typename_type): Document base-specifier as
another type-only lookup case.
* parser.cc (cp_parser_class_name): Propagate tag_type to
make_typename_type instead of hardcoding typename_type.
(cp_parser_base_specifier): Pass class_type instead of
typename_type as tag_type to cp_parser_class_name.
OpenMP/C++: Fix label mangling in metadirective body [PR122378]
Testcase c-c++-common/gomp/attrs-metadirective-2.c failed in C++ when
OFFLOAD_TARGET_NAMES=nvptx-none. That was caused by label mangling being applied
to the use but not to the declaration.
This is now fixed by mangling the declaration as well.
PR c++/122378
gcc/cp/ChangeLog:
* parser.cc (cp_parser_label_declaration): Mangle label declaration in a
metadirective region.
OpenMP: Fix bogus diagnostics with intervening code [PR121452]
The introduction in r14-3488-ga62c8324e7e31a of OMP_STRUCTURED_BLOCK (to
diagnose invalid intervening code) caused a regression rejecting the valid use
of the Fortran CONTINUE statement to end a collapsed loop.
This patch fixes the incorrect error checking in the OMP lowering pass. It also
fixes a check in the Fortran front end that erroneously rejects a similar
statement in an ordered loop.
gcc/fortran/ChangeLog:
* openmp.cc (resolve_omp_do): Allow CONTINUE as end statement of a
perfectly nested loop.
gcc/ChangeLog:
* omp-low.cc (check_omp_nesting_restrictions): Accept an
OMP_STRUCTURED_BLOCK in a collapsed simd region and in an ordered loop.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/pr121452-1.c: New test.
* c-c++-common/gomp/pr121452-2.c: New test.
* gfortran.dg/gomp/pr121452-1.f90: New test.
* gfortran.dg/gomp/pr121452-2.f90: New test.
* gfortran.dg/gomp/pr121452-3.f90: New test.
Nathaniel Shead [Sat, 18 Oct 2025 12:43:14 +0000 (23:43 +1100)]
c++/modules: Use containing type as key for all member lambdas [PR122310]
The ICE in the linked PR occurs because we first stream the lambda type
before its keyed decl has been streamed, but the key decl's type depends
on the lambda. And so when streaming the key decl to check for an
existing decl to merge with, merging the key decl itself crashes because
its type has only been partially streamed.
This patch fixes the issue by generalising the existing FIELD_DECL
handling to any class member using the outermost containing TYPE_DECL as
its key type. This way we can guarantee that the key decl has been
streamed before the lambda type is otherwise needed.
PR c++/122310
gcc/cp/ChangeLog:
* module.cc (get_keyed_decl_scope): New function.
(trees_out::get_merge_kind): Use it.
(trees_out::key_mergeable): Use it.
(maybe_key_decl): Key to the containing type for all members.
gcc/testsuite/ChangeLog:
* g++.dg/modules/lambda-12.h: New test.
* g++.dg/modules/lambda-12_a.H: New test.
* g++.dg/modules/lambda-12_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
(cherry picked from commit 8212abbeffa69f143808e126f40c67f3eb7e7844)
Jakub Jelinek [Mon, 27 Oct 2025 16:43:17 +0000 (17:43 +0100)]
phiopt: Fix up DEBUG_EXPR_DECL creation in spaceship_replacement [PR122394]
The following testcase ICEs in gcc 15 (and is at least latent in 12-14 too),
because the DEBUG_EXPR_DECL has incorrect mode. It has
TREE_TYPE (orig_use_lhs) type, but TYPE_MODE (type) rather than
TYPE_MODE (TREE_TYPE (orig_use_lhs)) where the two types are sometimes
the same, but sometimes different (same if !has_cast_debug_uses, different
otherwise).
There wouldn't be this issue at all if the code used the proper API to create
the DEBUG_EXPR_DECL, which takes care of everything. This is the sole
spot that doesn't use that API.
This doesn't affect the trunk because the code has been removed and replaced with
different stuff after the libstdc++ ABI change in r16-3474.
Before r15-5557 the mode was always wrong because this was done only
for has_cast_debug_uses. The bug was introduced with r12-5490.
Enough archeology. While it could be fixed by changing the second
SET_DECL_MODE argument, I think it is better to use build_debug_expr_decl.
2025-10-27 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/122394
* tree-ssa-phiopt.cc (spaceship_replacement): Use
build_debug_expr_decl instead of manually building DEBUG_EXPR_DECL
and getting SET_DECL_MODE wrong.
Jeff Law [Mon, 13 Oct 2025 20:33:10 +0000 (14:33 -0600)]
[RISC-V][PR target/120674] Avoid division by zero in dwarf emitter when vector is not enabled
This is a RISC-V specific failure in the dwarf2 emitter. When vector is not
enabled riscv_convert_vector_chunks sets the riscv_vector_chunks poly_int to
[1, 0].
riscv_dwarf_poly_indeterminite_value pulls out that 0 coefficient and uses it
as FACTOR, triggering a divide by zero here:
> /* Add COEFF * ((REGNO / FACTOR) - BIAS) to the value:
> add COEFF * (REGNO / FACTOR) now and subtract
> COEFF * BIAS from the final constant part. */
> constant -= coeff * bias;
> add_loc_descr (&ret, new_reg_loc_descr (regno, 0));
> if (coeff % factor == 0)
> coeff /= factor;
> else
> {
> int amount = exact_log2 (factor);
> gcc_assert (amount >= 0);
> add_loc_descr (&ret, int_loc_descriptor (amount));
> add_loc_descr (&ret, new_loc_descr (DW_OP_shr, 0, 0));
> }
Per Robin's recommendation this patch adjusts
riscv_dwarf_poly_indeterminite_value to never set FACTOR to 0, but instead
detect this case and adjust its value to 1.
That fixes the ICE and looks good across the board in my tester. Waiting on
pre-commit CI, of course.
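Below is a hedged C sketch of the guard and of the quoted division logic. `choose_factor`, `exact_log2_int` and `scale_coeff` are hypothetical helpers, not functions in riscv.cc or dwarf2out.cc. They only mirror the arithmetic in the quoted snippet, with a zero coefficient mapped to a FACTOR of 1 so that `coeff % factor` is always defined.

```c
/* Hypothetical: pick FACTOR from the poly_int coefficient, mapping the
   "vector disabled" coefficient of 0 to 1 to avoid a division by zero.  */
int
choose_factor (int coeff1)
{
  return coeff1 == 0 ? 1 : coeff1;
}

/* Return log2 (X) if X is a positive power of two, else -1.  */
int
exact_log2_int (int x)
{
  if (x <= 0 || (x & (x - 1)) != 0)
    return -1;
  int n = 0;
  while (x > 1)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Mirror of the quoted logic: divide COEFF by FACTOR when exact,
   otherwise leave COEFF alone and report the right-shift amount that
   would be emitted as DW_OP_shr in *SHIFT (0 when no shift is needed).  */
int
scale_coeff (int coeff, int factor, int *shift)
{
  *shift = 0;
  if (coeff % factor == 0)
    return coeff / factor;
  *shift = exact_log2_int (factor);
  return coeff;
}
```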
PR target/120674
gcc/
* config/riscv/riscv.cc (riscv_dwarf_poly_indeterminite_value): Do not
set FACTOR to zero, for that case use one instead.
Tamar Christina [Fri, 17 Oct 2025 14:43:04 +0000 (15:43 +0100)]
AArch64: Extend intrinsics framework to account for merging predications without gp [PR121604]
In PR121604 it was noted that the SVE intrinsics infrastructure currently
assumes that, for any predicated operation, the GP is the first argument
of type svbool_t or, for a unary merging operation, the second argument.
However, there are intrinsics like pmov_lane that take an svbool_t which is
not a GP.
There are also instructions like BRKB that work only on predicates, so the
framework incorrectly takes the first operand to be the GP, whereas it
actually supplies the inactive lanes.
However, during apply_predication we do have the information about where the GP
is. This patch reorganizes the code to record this information in the
function_instance so that folders have access to it.
For outliers like pmov_lane we can now override whether the intrinsic
is considered to have a GP.
gcc/ChangeLog:
PR target/121604
* config/aarch64/aarch64-sve-builtins-shapes.cc (apply_predication):
Store gp_index.
(struct pmov_to_vector_lane_def): Mark instruction as having no GP.
* config/aarch64/aarch64-sve-builtins.h (function_instance::gp_value,
function_instance::inactive_values, function_instance::gp_index,
function_shape::has_gp_argument_p): New.
* config/aarch64/aarch64-sve-builtins.cc (gimple_folder::fold_pfalse):
Simplify code and use GP helpers.
gcc/testsuite/ChangeLog:
PR target/121604
* gcc.target/aarch64/sve/pr121604_brk.c: New test.
* gcc.target/aarch64/sve2/pr121604_pmov.c: New test.
LIU Hao [Sat, 25 Oct 2025 09:19:34 +0000 (17:19 +0800)]
x86-64: Use `movsxd` to perform SI-to-DI extension in Intel syntax
Although there's no possibility of ambiguity, the Intel manual says the mnemonic
for DWORD-to-QWORD sign-extension operation should be MOVSXD. Some assemblers
(GNU AS, NASM) also overload MOVSX, but some others don't accept MOVSX (LLVM,
MASM, YASM in NASM mode) and require MOVSXD.
This mnemonic was introduced in r0-34259-g123bf9e3f4056d in 2001, and has not
been updated ever since.
gcc/ChangeLog:
PR target/119079
* config/i386/i386.md: Use `movsxd` to perform SI-to-DI extension in Intel
syntax.
Harald Anlauf [Thu, 9 Oct 2025 16:43:22 +0000 (18:43 +0200)]
Fortran: fix "unstable" interfaces of external procedures [PR122206]
In the testcase repeated invocations of a function showed an apparently
unstable interface. This was caused by trying to guess an (inappropriate)
interface of the external procedure after processing of the procedure
arguments in gfc_conv_procedure_call. The mis-guessed interface showed up
in subsequent uses of the procedure symbol in gfc_conv_procedure_call. The
solution is to check for an existing interface of an external procedure
before trying to wildly guess based on just the actual arguments.
PR fortran/122206
gcc/fortran/ChangeLog:
* trans-types.cc (gfc_get_function_type): Do not clobber an
existing procedure interface.
Ada: Fix other instances of incorrect String lower bound in gnatlink
This also reverts an unintentional change introduced by the initial fix.
gcc/ada/
PR ada/81087
* gnatlink.adb (Is_Prefix): Move around, streamline and return false
when the prefix is not strict.
(Gnatlink): Fix other instances of incorrect lower bound assumption.
Linsen Zhou [Fri, 17 Oct 2025 03:05:04 +0000 (11:05 +0800)]
tree-object-size.cc: Fix assert constant offset in check_for_plus_in_loops [PR122012]
After commit 51b85dfeb19652bf3e0aaec08828ba7cee1e641c, when the
pointer offset is a variable in the loop, the object size of the
pointer may also need to be reexamined.
This made the gcc_assert in check_for_plus_in_loops fail.
gcc/ChangeLog:
PR tree-optimization/122012
* tree-object-size.cc (check_for_plus_in_loops): Skip check
for the variable offset
gcc/testsuite/ChangeLog:
PR tree-optimization/122012
* gcc.dg/torture/pr122012.c: New test.
Andrew Stubbs [Mon, 20 Oct 2025 14:57:41 +0000 (14:57 +0000)]
libgomp: fine-grained pinned memory allocator
This patch introduces a new custom memory allocator for use with pinned
memory (in the case where the Cuda allocator isn't available). In future,
this allocator will also be used for Managed Memory. Both memories are
incompatible with the system malloc because allocated memory cannot share a
page with memory allocated for other purposes.
This means that small allocations will no longer consume an entire page of
pinned memory. Unfortunately, it also means that pinned memory pages will
never be unmapped (although they may be reused). This isn't a technical
limitation; the "free" algorithm could be extended in future, if needed.
The implementation is not perfect; there are various corner cases (especially
related to extending onto new pages) where allocations and reallocations may
be sub-optimal, but it should still be a step forward in support for small
allocations.
I considered using libmemkind's "fixed" memory but rejected it for three
reasons: 1) libmemkind may not always be present at runtime, 2) there's no
currently documented means to extend a "fixed" kind one page at a time
(although the code appears to have an undocumented function that may do the
job, and/or extending libmemkind to support the MAP_LOCKED mmap flag with its
regular kinds would be straightforward), 3) Managed Memory benefits from
having the metadata located in different memory and using an external
implementation makes it hard to guarantee this.
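Here is a minimal first-fit sketch of the idea in C. All names are hypothetical, and this is not the contents of simple-allocator.c. Pinned pages are registered once and small requests are carved out of them. Freed blocks go back on a free list and are never unmapped. The free-list nodes come from malloc, so the bookkeeping lives outside the managed region, in the spirit of point 3) above. Adjacent free blocks are not coalesced; a production allocator would likely need to do so.

```c
#include <stddef.h>
#include <stdlib.h>

/* Free-list node.  Allocated with malloc, so metadata never lives
   inside the managed (e.g. pinned) memory.  */
struct chunk
{
  char *base;
  size_t size;
  struct chunk *next;
};

struct simple_ctx
{
  struct chunk *free_list;
};

#define SIMPLE_ALIGN 16

static size_t
round_up (size_t n)
{
  return (n + SIMPLE_ALIGN - 1) & ~(size_t) (SIMPLE_ALIGN - 1);
}

/* Hand a region (e.g. one freshly pinned page) to the allocator.  */
int
simple_register (struct simple_ctx *ctx, void *base, size_t size)
{
  struct chunk *c = malloc (sizeof *c);
  if (!c)
    return -1;
  c->base = base;
  c->size = size;
  c->next = ctx->free_list;
  ctx->free_list = c;
  return 0;
}

/* First fit: carve SIZE bytes off the front of the first large-enough
   chunk, so many small allocations can share one page.  */
void *
simple_alloc (struct simple_ctx *ctx, size_t size)
{
  size = round_up (size ? size : 1);
  for (struct chunk **pp = &ctx->free_list; *pp; pp = &(*pp)->next)
    {
      struct chunk *c = *pp;
      if (c->size >= size)
        {
          void *p = c->base;
          c->base += size;
          c->size -= size;
          if (c->size == 0)
            {
              *pp = c->next;  /* Chunk used up: drop its node.  */
              free (c);
            }
          return p;
        }
    }
  return NULL;  /* A real allocator would register another page here.  */
}

/* Return a block to the free list (the caller supplies its size).
   Memory is reused but never unmapped.  */
int
simple_free (struct simple_ctx *ctx, void *p, size_t size)
{
  return simple_register (ctx, p, round_up (size ? size : 1));
}
```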
libgomp/ChangeLog:
* Makefile.am (libgomp_la_SOURCES): Add simple-allocator.c.
* Makefile.in: Regenerate.
* basic-allocator.c: Mention simple-allocator in the comment.
* config/linux/allocator.c: Include unistd.h.
(pin_ctx): New variable.
(ctxlock): New variable.
(linux_init_pin_ctx): New function.
(linux_memspace_alloc): Use simple-allocator for pinned memory.
(linux_memspace_free): Likewise.
(linux_memspace_realloc): Likewise.
* libgomp.h (gomp_simple_alloc_init_context): New prototype.
(gomp_simple_alloc_register_memory): New prototype.
(gomp_simple_alloc): New prototype.
(gomp_simple_free): New prototype.
(gomp_simple_realloc): New prototype.
* libgomp.texi: Update pinned memory trait documentation.
* testsuite/libgomp.c/alloc-pinned-8.c: New test.
* simple-allocator.c: New file.
Andrew Stubbs [Tue, 14 Oct 2025 11:22:05 +0000 (11:22 +0000)]
libgomp, nvptx: Cuda pinned memory
Use Cuda to pin memory, instead of Linux mlock, when available.
There are two advantages: firstly, this gives a significant speed boost for
NVPTX offloading, and secondly, it side-steps the usual OS ulimit/rlimit
setting.
The design adds a device independent plugin API for allocating pinned memory,
and then implements it for NVPTX. At present, the other supported devices do
not have equivalent capabilities (or requirements).
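The device-independent hook design can be sketched in C as a table of optional function pointers with a host fallback. `struct device_hooks`, `pinned_alloc` and the fake hooks are hypothetical. The real entry points are the `GOMP_OFFLOAD_page_locked_host_alloc`/`_free` plugin functions listed in the ChangeLog, and their signatures are not reproduced here. The `fallback` argument stands in for the mlock-based path.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical plugin hook table.  A NULL hook means the device cannot
   provide page-locked host memory.  */
struct device_hooks
{
  bool (*page_locked_host_alloc) (void **ptr, size_t size);
  bool (*page_locked_host_free) (void *ptr);
};

enum pin_source { PIN_NONE, PIN_DEVICE, PIN_FALLBACK };

/* Prefer the device hook; if it is missing or fails, use FALLBACK.
   *SRC records where the memory came from so the matching free routine
   can be chosen later.  */
void *
pinned_alloc (const struct device_hooks *dev, size_t size,
              void *(*fallback) (size_t), enum pin_source *src)
{
  void *p = NULL;
  if (dev && dev->page_locked_host_alloc
      && dev->page_locked_host_alloc (&p, size))
    {
      *src = PIN_DEVICE;
      return p;
    }
  p = fallback (size);
  *src = p ? PIN_FALLBACK : PIN_NONE;
  return p;
}

/* Stand-in device hooks for demonstration only (plain malloc/free).  */
bool
fake_device_alloc (void **ptr, size_t size)
{
  *ptr = malloc (size);
  return *ptr != NULL;
}

bool
fake_device_free (void *ptr)
{
  free (ptr);
  return true;
}
```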
libgomp/ChangeLog:
* config/linux/allocator.c: Include assert.h.
(using_device_for_page_locked): New variable.
(linux_memspace_alloc): Add init0 parameter. Support device pinning.
(linux_memspace_calloc): Set init0 to true.
(linux_memspace_free): Support device pinning.
(linux_memspace_realloc): Support device pinning.
(MEMSPACE_ALLOC): Set init0 to false.
* libgomp-plugin.h
(GOMP_OFFLOAD_page_locked_host_alloc): New prototype.
(GOMP_OFFLOAD_page_locked_host_free): Likewise.
* libgomp.h (gomp_page_locked_host_alloc): Likewise.
(gomp_page_locked_host_free): Likewise.
(struct gomp_device_descr): Add page_locked_host_alloc_func and
page_locked_host_free_func.
* libgomp.texi: Adjust the docs for the pinned trait.
* plugin/plugin-nvptx.c
(GOMP_OFFLOAD_page_locked_host_alloc): New function.
(GOMP_OFFLOAD_page_locked_host_free): Likewise.
* target.c (device_for_page_locked): New variable.
(get_device_for_page_locked): New function.
(gomp_page_locked_host_alloc): Likewise.
(gomp_page_locked_host_free): Likewise.
(gomp_load_plugin_for_device): Add page_locked_host_alloc and
page_locked_host_free.
* testsuite/libgomp.c/alloc-pinned-1.c: Change expectations for NVPTX
devices.
* testsuite/libgomp.c/alloc-pinned-2.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-3.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-4.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-5.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-6.c: Likewise.
r16-4540-g80af807e52e4f4 exposed a bug in two testcases where the declaration of
local labels was wrongly commented out. That caused "duplicate label" errors.
Uncommenting the declarations fixes it.
PR middle-end/122378
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/attrs-metadirective-2.c: Uncomment local label
declaration.
* c-c++-common/gomp/metadirective-2.c: Likewise.
OpenMP: Handle non-executable directives in intervening code [PR120180,PR122306]
OpenMP 6 permits non-executable directives in intervening code; this commit adds
support for a sensible subset, namely metadirectives, nothing, assume, and
'error at(compilation)'.
Also handle the special case where a metadirective can be resolved at parse time
to 'omp nothing'.
This fixes a build issue that affects 10 out of 12 SPECaccel benchmarks.
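The following illustrative C loop nest is the kind of code this change targets. The intervening code between the two collapsed loops contains only the non-executable `nothing` directive. This is a sketch. Whether a given GCC release accepts it depends on the version and on compiling with `-fopenmp`. Without that flag the pragmas are ignored and the loops run serially.

```c
/* Fill A, an N x M row-major array, with i + j.  The only intervening
   code between the collapsed loops is the 'nothing' directive.  */
void
fill_grid (int n, int m, int *a)
{
#pragma omp for collapse(2)
  for (int i = 0; i < n; i++)
    {
#pragma omp nothing
      for (int j = 0; j < m; j++)
        a[i * m + j] = i + j;
    }
}
```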
gcc/c/ChangeLog:
* c-parser.cc (c_parser_pragma): Accept a subset of non-executable
OpenMP directives in intervening code.
(c_parser_omp_error): Reject 'error at(execution)' in intervening code.
(c_parser_omp_metadirective): Return early if only one selector matches
and it resolves to 'omp nothing'.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_metadirective): Return early if only one
selector matches and it resolves to 'omp nothing'.
(cp_parser_omp_error): Reject 'error at(execution)' in intervening code.
(cp_parser_pragma): Accept a subset of non-executable OpenMP directives
as intervening code.
gcc/fortran/ChangeLog:
* gfortran.h (enum gfc_exec_op): Add EXEC_OMP_FIRST_OPENMP_EXEC and
EXEC_OMP_LAST_OPENMP_EXEC.
* openmp.cc (gfc_match_omp_context_selector): Remove static. Remove
checks on score. Add cleanup. Remove checks on trait properties.
(gfc_match_omp_context_selector_specification): Remove static. Adjust
calls to gfc_match_omp_context_selector.
(gfc_match_omp_declare_variant): Adjust call to
gfc_match_omp_context_selector_specification.
(match_omp_metadirective): Likewise.
(icode_code_error_callback): Reject all statements except
'assume' and 'metadirective'.
(gfc_resolve_omp_context_selector): New function.
(resolve_omp_metadirective): Skip metadirectives whose context selectors
can be statically resolved to false. Replace metadirective by its body
if only 'nothing' remains.
(gfc_resolve_omp_declare): Call gfc_resolve_omp_context_selector for
each variant.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/imperfect1.c: Adjust dg-error.
* c-c++-common/gomp/imperfect4.c: Likewise.
* c-c++-common/gomp/pr120180.c: Move to...
* c-c++-common/gomp/pr120180-1.c: ...here. Remove dg-error.
* g++.dg/gomp/attrs-imperfect1.C: Adjust dg-error.
* g++.dg/gomp/attrs-imperfect4.C: Likewise.
* gfortran.dg/gomp/declare-variant-2.f90: Adjust dg-error.
* gfortran.dg/gomp/declare-variant-20.f90: Likewise.
* c-c++-common/gomp/pr120180-2.c: New test.
* g++.dg/gomp/pr120180-1.C: New test.
* gfortran.dg/gomp/pr120180-1.f90: New test.
* gfortran.dg/gomp/pr120180-2.f90: New test.
* gfortran.dg/gomp/pr122306-1.f90: New file.
* gfortran.dg/gomp/pr122306-2.f90: New file.
Jakub Jelinek [Wed, 22 Oct 2025 11:11:52 +0000 (13:11 +0200)]
c++: Fix up RAW_DATA_CST handling in braced_list_to_string [PR122302]
The following testcase is miscompiled, because a RAW_DATA_CST tree
node is shared by multiple CONSTRUCTORs and when the braced_list_to_string
function changes one to extend the RAW_DATA_CST over the single preceding
and single succeeding INTEGER_CST, it changes the RAW_DATA_CST in
the other CONSTRUCTOR where the elts around it are still present.
Fixed by tweaking a copy of it instead, like we handle it in other spots.
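The aliasing problem and the copy-before-modify fix can be modelled in C. `struct raw_data`, `struct ctor`, `widen_in_place` and `widen_copy` are hypothetical miniatures, not GCC's tree nodes. Here `widen_copy` plays the role of calling copy_node before adjusting RAW_DATA_POINTER and RAW_DATA_LENGTH.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical miniature of a RAW_DATA_CST: a window onto a byte buffer.  */
struct raw_data
{
  const unsigned char *ptr;
  size_t len;
};

/* Hypothetical miniature of a CONSTRUCTOR slot holding such a node.  */
struct ctor
{
  struct raw_data *elt;
};

/* Buggy pattern: widen the window in place to absorb one preceding and
   one following byte.  If ELT is shared, every other owner changes too.  */
void
widen_in_place (struct ctor *c)
{
  c->elt->ptr -= 1;
  c->elt->len += 2;
}

/* Fixed pattern: copy the node first (the analogue of copy_node), then
   modify only the copy.  Returns 0 on success, -1 on allocation failure.  */
int
widen_copy (struct ctor *c)
{
  struct raw_data *copy = malloc (sizeof *copy);
  if (!copy)
    return -1;
  *copy = *c->elt;
  copy->ptr -= 1;
  copy->len += 2;
  c->elt = copy;
  return 0;
}
```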
2025-10-22 Jakub Jelinek <jakub@redhat.com>
PR c++/122302
* c-common.cc (braced_list_to_string): Call copy_node on RAW_DATA_CST
before changing RAW_DATA_POINTER and RAW_DATA_LENGTH on it.
* g++.dg/cpp0x/pr122302.C: New test.
* g++.dg/cpp/embed-27.C: New test.
Haochen Jiang [Tue, 21 Oct 2025 03:21:45 +0000 (11:21 +0800)]
i386: Correct cpu codename value for unknown model number
There are several changes for features enabled on cpus. r16-1666 disabled
CLDEMOTE on clients. r16-2224 removed Key locker since Panther Lake and
Clearwater forest. r16-4436 disabled PREFETCHI on Panther Lake.
As a result, the value that host_detect_local_cpu guesses for an unknown
model number no longer matched the enabled features. Correct the
logic according to the features enabled.
This patch will also be backported to GCC 14 and GCC 15.
gcc/ChangeLog:
* config/i386/driver-i386.cc (host_detect_local_cpu): Correct
the logic for unknown model number cpu guess value.
The insn failed to match the pattern (aarch64-sve2.md):
;; Predicated binary operations with no reverse form, merging with zero.
;; At present we don't generate these patterns via a cond_* optab,
;; so there's no correctness requirement to handle merging with an
;; independent value.
(define_insn_and_rewrite "*cond_<sve_int_op><mode>_z"
[(set (match_operand:SVE_FULL_I 0 "register_operand")
(unspec:SVE_FULL_I
[(match_operand:<VPRED> 1 "register_operand")
(unspec:SVE_FULL_I
[(match_operand 5)
(unspec:SVE_FULL_I
[(match_operand:SVE_FULL_I 2 "register_operand")
(match_operand:SVE_FULL_I 3 "register_operand")]
SVE2_COND_INT_BINARY_NOREV)]
UNSPEC_PRED_X)
(match_operand:SVE_FULL_I 4 "aarch64_simd_imm_zero")]
UNSPEC_SEL))]
"TARGET_SVE2"
{@ [ cons: =0 , 1 , 2 , 3 ]
[ &w , Upl , 0 , w ] movprfx\t%0.<Vetype>, %1/z, %0.<Vetype>\;<sve_int_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>
[ &w , Upl , w , w ] movprfx\t%0.<Vetype>, %1/z, %2.<Vetype>\;<sve_int_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>
}
"&& !CONSTANT_P (operands[5])"
{
operands[5] = CONSTM1_RTX (<VPRED>mode);
}
[(set_attr "movprfx" "yes")]
)
because operands[3] and operands[4] were both expanded into the same register
operand containing a zero vector by define_expand "@cond_<sve_int_op><mode>".
This patch fixes the ICE by making a case distinction in
function_expander::use_cond_insn that uses add_fixed_operand if
fallback_arg == CONST0_RTX (mode), and otherwise add_input_operand (which was
previously the default and allowed the expansion of the zero-vector
fallback_arg to a register operand).
The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
Alex Coplan pointed out in the bugzilla ticket that this ICE goes back
to GCC 10. Shall we backport?
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
Co-authored-by: Richard Sandiford <rdsandiford@googlemail.com>
gcc/
PR target/121599
* config/aarch64/aarch64-sve-builtins.cc
(function_expander::use_cond_insn): Use add_fixed_operand if
fallback_arg == CONST0_RTX (mode).
gcc/testsuite/
PR target/121599
* gcc.target/aarch64/sve2/pr121599.c: New test.
Jonathan Wakely [Thu, 25 Sep 2025 16:23:28 +0000 (17:23 +0100)]
libstdc++: Fix unsafe comma operators in <random> [PR122062]
This fixes a 'for' loop in std::piecewise_linear_distribution that
increments two iterators with a comma operator between them, making it
vulnerable to evil overloads of the comma operator.
It also changes a 'for' loop used by some other distributions, even
though those are only used with std::vector<double>::iterator and so
won't find any overloaded commas.
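A minimal sketch of the hazard and of the void-cast fix, assuming a hypothetical iterator type EvilIter that overloads operator, (the names EvilIter, evil_comma_calls, dot_unsafe and dot_safe are illustrative only; this is not the code in random.tcc). A plain "++a, ++b" picks up the user's overload, while casting each operand to void forces the built-in comma, since no user-defined operator can take void operands.

```cpp
#include <cassert>
#include <vector>

// Hypothetical iterator wrapper that also overloads the comma operator.
struct EvilIter {
  std::vector<double>::iterator it;
  EvilIter& operator++() { ++it; return *this; }
  double operator*() const { return *it; }
  bool operator!=(const EvilIter& o) const { return it != o.it; }
};

// Counts how often the overloaded comma is called.
int evil_comma_calls = 0;

// The "evil" overload: selected whenever both operands are EvilIter lvalues.
EvilIter& operator,(EvilIter& lhs, EvilIter&) {
  ++evil_comma_calls;
  return lhs;
}

// Dot product that increments both iterators with a plain comma:
// this calls the user-defined operator, above on every iteration.
double dot_unsafe(EvilIter a, EvilIter a_end, EvilIter b) {
  double sum = 0.0;
  for (; a != a_end; ++a, ++b)
    sum += *a * *b;
  return sum;
}

// Same loop with each operand cast to void: only the built-in comma applies.
double dot_safe(EvilIter a, EvilIter a_end, EvilIter b) {
  double sum = 0.0;
  for (; a != a_end; (void)++a, (void)++b)
    sum += *a * *b;
  return sum;
}
```

Here the overload only counts calls, but a real one could do anything, which is why the library code should not depend on which comma operator is found.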
libstdc++-v3/ChangeLog:
PR libstdc++/122062
* include/bits/random.tcc (__detail::__normalize): Use void cast
for operands of comma operator.
(piecewise_linear_distribution): Likewise.
* testsuite/26_numerics/random/piecewise_linear_distribution/cons/122062.cc:
New test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Reviewed-by: Hewill Kang <hewillk@gmail.com>
(cherry picked from commit 11ce485bcffac0db005d77e100420535e54d0aa5)
Eric Botcazou [Mon, 20 Oct 2025 09:21:21 +0000 (11:21 +0200)]
Ada: Fix spurious warning for renaming of component of VFA record
This is a regression present on the mainline and all active branches: the
compiler gives a spurious "is not referenced" warning for the renaming of
a component of a Volatile_Full_Access record.
gcc/ada/
PR ada/107536
* exp_ch2.adb (Expand_Renaming): Mark the entity as referenced.
gcc/testsuite/
* gnat.dg/renaming18.adb: New test.
Alex Coplan [Mon, 13 Oct 2025 13:41:09 +0000 (13:41 +0000)]
aarch64, testsuite: Add -fchecking to test options [PR121772]
I noticed while testing a backport of the PR121772 fix to GCC 13 that
the test wasn't triggering the ICE as expected with the unpatched
compiler.
This turned out to be because the ICE is a checking ICE, and we
configure by default with --enable-checking=release on the branches.
Additionally, I hadn't noticed when doing the backports to 15 and 14
since there we still ICE later on in emit_move_insn even if we don't
catch the invalid gimple with checking.
I'm not too sure why the 13 branch doesn't see the emit_move_insn ICE,
but it's somewhat irrelevant - the important thing is that adding
-fchecking to the options makes the test fail as expected with an
unpatched compiler (i.e. with a gimple checking failure), even on
release branches.
I considered applying this patch to just the release branches, but
figured that trunk will at some point itself become a release branch, so
it seems to make most sense just to apply it everywhere.
I've checked that the test still passes with this patch, and still fails
if I revert the PR121772 fix.
gcc/testsuite/ChangeLog:
PR tree-optimization/121772
* gcc.target/aarch64/torture/pr121772.c: Add -fchecking to
dg-options.
Jason Merrill [Wed, 20 Aug 2025 03:15:20 +0000 (23:15 -0400)]
c++: pointer to auto member function [PR120757]
Here r13-1210 correctly changed &A<int>::foo to not be considered
type-dependent, but tsubst_expr of the OFFSET_REF got confused trying to
tsubst a type that involved auto. Fixed by getting the type from the
member rather than tsubst.
PR c++/120757
gcc/cp/ChangeLog:
* pt.cc (tsubst_expr) [OFFSET_REF]: Don't tsubst the type.
Jakub Jelinek [Thu, 9 Oct 2025 16:06:39 +0000 (18:06 +0200)]
gimplify: Fix up side-effect handling in 2nd __builtin_c[lt]zg argument [PR122188]
The patch from yesterday made me think about side-effects in the second
argument of __builtin_c[lt]zg. When we change
__builtin_c[lt]zg (x, y)
when y is not INTEGER_CST into
x ? __builtin_c[lt]zg (x) : y
with x evaluated only once, the side effects in y are omitted unless x is
0. That looks undesirable; we should evaluate the side effects in y
unconditionally.
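The difference can be modelled at source level. In this sketch (assumptions: clz32 is a hypothetical portable stand-in for the builtin, and fallback() stands for a second argument y with a side effect), the first lowering only runs y when x is 0, while the fixed one evaluates y first and then selects.

```cpp
#include <cassert>

// Hypothetical stand-in for counting leading zeros of a nonzero 32-bit value.
int clz32(unsigned x) {
  int n = 0;
  for (unsigned mask = 0x80000000u; !(x & mask); mask >>= 1)
    ++n;
  return n;
}

// Models a second argument y whose evaluation has a side effect.
int side_effects = 0;
int fallback() { ++side_effects; return 32; }

// Lowering before the fix: y is only evaluated when x is 0.
int clzg_conditional(unsigned x) {
  return x ? clz32(x) : fallback();
}

// Lowering after the fix: y is evaluated unconditionally, then selected.
int clzg_unconditional(unsigned x) {
  int y = fallback();
  return x ? clz32(x) : y;
}
```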
2025-10-09 Jakub Jelinek <jakub@redhat.com>
PR c/122188
* c-gimplify.cc (c_gimplify_expr): Also gimplify the second operand
before the COND_EXPR and use in COND_EXPR result of gimplification.
Jakub Jelinek [Wed, 8 Oct 2025 07:58:41 +0000 (09:58 +0200)]
gimplify: Fix up __builtin_c[lt]zg gimplification [PR122188]
The following testcase ICEs during gimplification.
The problem is that save_expr sometimes doesn't create a SAVE_EXPR but
returns the original complex tree (COND_EXPR) and the code then uses that
tree in 2 different spots without unsharing. As this is done during
gimplification, the tree was not unshared when the whole body was unshared,
and because gimplification is destructive, the first time we gimplify it we
destroy it and the second time we try to gimplify it we ICE on it.
Now, we could replace one use of a with unshare_expr (a), but because this
is a gimplification hook, I think it is easier to just gimplify the argument
than to create a save_expr; then we know it satisfies is_gimple_val, so it
has no side effects and can safely be used twice. That argument
would be the first thing to gimplify after return GS_OK anyway, so it
doesn't change argument sequencing etc.
2025-10-08 Jakub Jelinek <jakub@redhat.com>
PR c/122188
* c-gimplify.cc (c_gimplify_expr): Gimplify CALL_EXPR_ARG (*expr_p, 0)
instead of calling save_expr on it.
Jakub Jelinek [Mon, 6 Oct 2025 07:46:48 +0000 (09:46 +0200)]
stmt: Handle %cc[name] in resolve_asm_operand_names [PR122133]
Last year I extended the asm template syntax in inline asm to support
%cc0 etc., apparently the first two-letter generic operand modifier.
As the following testcase shows, I forgot to tweak the [foo] handling
for it though. As final.cc will error on any % ISALPHA not followed by
digit (with the exception of % c c digit), I think we can safely handle
this for any 2 letters in between % and [, instead of hardcoding it for
now only for %cc[ and changing it again next time we add something
two-letter.
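A rough sketch of the idea, assuming a hypothetical resolve_names helper (not GCC's resolve_asm_operand_names): accept zero, one or two alphabetic modifier letters between % and [, look the bracketed name up, and replace it with the operand number while keeping the letters, so %cc[in] becomes %cc1.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Sketch: rewrite "%[name]", "%c[name]" or "%cc[name]" (0-2 modifier letters)
// into the operand number, keeping the letters. Unknown names are copied as-is.
std::string resolve_names(const std::string& tmpl,
                          const std::vector<std::string>& names) {
  std::string out;
  for (std::size_t i = 0; i < tmpl.size(); ++i) {
    if (tmpl[i] != '%') {
      out += tmpl[i];
      continue;
    }
    // "%%" is a literal percent sign: copy both characters.
    if (i + 1 < tmpl.size() && tmpl[i + 1] == '%') {
      out += "%%";
      ++i;
      continue;
    }
    // Skip up to two alphabetic modifier letters after '%'.
    std::size_t j = i + 1;
    int letters = 0;
    while (letters < 2 && j < tmpl.size() &&
           std::isalpha(static_cast<unsigned char>(tmpl[j]))) {
      ++j;
      ++letters;
    }
    std::size_t close = std::string::npos;
    if (j < tmpl.size() && tmpl[j] == '[')
      close = tmpl.find(']', j);
    std::size_t k = names.size();
    if (close != std::string::npos) {
      const std::string name = tmpl.substr(j + 1, close - j - 1);
      for (k = 0; k < names.size() && names[k] != name; ++k) {}
    }
    if (k == names.size()) {
      out += '%';  // not a known named operand: copy '%' and carry on
      continue;
    }
    out += tmpl.substr(i, j - i);  // '%' plus any modifier letters
    out += std::to_string(k);
    i = close;  // the loop's ++i moves past ']'
  }
  return out;
}
```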
2025-10-06 Jakub Jelinek <jakub@redhat.com>
PR middle-end/122133
* stmt.cc (resolve_asm_operand_names): Handle % and 2 letters followed
by open square.
Jakub Jelinek [Sat, 4 Oct 2025 15:06:16 +0000 (17:06 +0200)]
widening_mul: Reset flow sensitive info in maybe_optimize_guarding_check [PR122104]
In PR95852 I added an optimization that, in addition to pattern
recognizing r = x * y; r / x != y or r = x * y; r / x == y
as .MUL_OVERFLOW or its negation, also recognizes
r = x * y; x && (r / x != y) or r = x * y; !x || (r / x == y)
by optimizing the guarding condition to always true/false.
The problem with that is that some value ranges recorded for
the SSA_NAMEs in the formerly conditional, now unconditional
basic block can be invalid.
This patch fixes it by calling reset_flow_sensitive_info_in_bb
if we optimize the guarding condition.
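Why the guard can be dropped, shown as a small sketch with unsigned 32-bit operands (overflow_guarded and overflow_wide are hypothetical names): when x is 0 nothing can overflow, so x && (r / x != y) is exactly the overflow flag. Once the guard is folded to true, code that used to run only under x != 0 runs unconditionally, so any range info derived from x != 0 there may no longer hold.

```cpp
#include <cassert>
#include <cstdint>

// Source-level idiom: overflow check guarded against division by zero.
bool overflow_guarded(std::uint32_t x, std::uint32_t y) {
  std::uint32_t r = x * y;  // wraps modulo 2^32
  return x && (r / x != y);
}

// Reference: compute the full product in 64 bits and compare.
bool overflow_wide(std::uint32_t x, std::uint32_t y) {
  return static_cast<std::uint64_t>(x) * y > UINT32_MAX;
}
```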
2025-10-04 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/122104
* tree-ssa-math-opts.cc (maybe_optimize_guarding_check): Call
reset_flow_sensitive_info_in_bb on bb when optimizing out the
guarding condition.
Avinash Jayakar [Mon, 13 Oct 2025 09:47:45 +0000 (15:17 +0530)]
match.pd: Do not canonicalize division by power 2 for {ROUND, CEIL}_DIV
Canonicalization of unsigned division by a power of 2 only applies to
{TRUNC,FLOOR,EXACT}_DIV, so remove the same pattern for {CEIL,ROUND}_DIV,
which was added in a previous commit.
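A quick illustration of why a plain right shift is only valid for truncating (and, for unsigned values, flooring) division by 2^k. The helper names are hypothetical, k is assumed to be below 32, and round_div_pow2 assumes ties round away from zero.

```cpp
#include <cassert>
#include <cstdint>

// TRUNC/FLOOR division of an unsigned value by 2^k: a plain right shift.
std::uint32_t trunc_div_pow2(std::uint32_t x, unsigned k) { return x >> k; }

// CEIL division by 2^k: round up when any discarded low bit is set.
std::uint32_t ceil_div_pow2(std::uint32_t x, unsigned k) {
  std::uint32_t low = x & ((std::uint32_t{1} << k) - 1);
  return (x >> k) + (low != 0);
}

// ROUND division by 2^k (assumed ties away from zero): round up when the
// discarded part is at least half of the divisor.
std::uint32_t round_div_pow2(std::uint32_t x, unsigned k) {
  if (k == 0)
    return x;
  std::uint32_t half = std::uint32_t{1} << (k - 1);
  std::uint32_t low = x & ((std::uint32_t{1} << k) - 1);
  return (x >> k) + (low >= half);
}
```

For example, 5 / 4 gives 1 when truncating but 2 when rounding up, so rewriting a CEIL_DIV into a shift would change the result.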
Robin Dapp [Tue, 7 Oct 2025 13:18:27 +0000 (07:18 -0600)]
RISC-V: Detect wrap in shuffle_series_pattern [PR121845].
In shuffle_series_pattern we use series_p to determine if the permute
mask is a simple series. This didn't take into account that series_p
also returns true for e.g. {0, 3, 2, 1} where the step is 3 and the
indices form a series modulo 4.
We emit
vid + vmul
in order to synthesize a series. To be correct in all cases we would
still need a vrem afterwards, which does not seem worth it.
This patch adds the modulo for VLA permutes and punts if we wrap around
for VLS permutes. I'm not certain whether we will ever see a wrapping
VLA series (we certainly haven't so far in the test suite), but as we have
now observed a VLS one, it seems conservatively correct to apply the modulo
to the indices.
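The {0, 3, 2, 1} case can be checked with a small model (function names are hypothetical): a vid + vmul style series computes base + i * step with no wrap-around, which only matches the mask once the indices are reduced modulo the number of elements. A wrap check like series_wraps is one way to decide when to punt.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Models "vid + vmul": element i gets base + i * step, with no wrap-around.
std::vector<unsigned> series_no_mod(unsigned base, unsigned step, std::size_t n) {
  std::vector<unsigned> v(n);
  for (std::size_t i = 0; i < n; ++i)
    v[i] = static_cast<unsigned>(base + i * step);
  return v;
}

// Same series with every index reduced modulo the element count n.
std::vector<unsigned> series_mod(unsigned base, unsigned step, std::size_t n) {
  std::vector<unsigned> v(n);
  for (std::size_t i = 0; i < n; ++i)
    v[i] = static_cast<unsigned>((base + i * step) % n);
  return v;
}

// True if the last (largest) index of the series reaches n, i.e. it wraps.
bool series_wraps(unsigned base, unsigned step, std::size_t n) {
  return n > 0 && base + (n - 1) * step >= n;
}
```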
Regtested on rv64gcv_zvl512b.
PR target/121845
gcc/ChangeLog:
* config/riscv/riscv-v.cc (shuffle_series_patterns):
Modulo indices for VLA and punt when wrapping for VLS.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr121845.c: New test.