Arthur Cohen [Mon, 16 Dec 2024 12:01:13 +0000 (13:01 +0100)]
gccrs: hir: Add LangItem paths to PathPattern class
gcc/rust/ChangeLog:
* hir/tree/rust-hir-path.h: Adapt PathPattern to accept lang-item paths.
* hir/tree/rust-hir-path.cc: Assert we are dealing with a segmented path, create lang-item
constructors.
* hir/tree/rust-hir.cc (PathPattern::convert_to_simple_path): Likewise.
Dylan Gardner [Thu, 29 Aug 2024 11:43:42 +0000 (04:43 -0700)]
gccrs: Infer crate name after file opening
Fixes #3129.
gcc/rust/ChangeLog:
* rust-session-manager.cc (Session::handle_crate_name): Remove
crate name inference
(Session::compile_crate): Add crate name inference and error if
inferred name is empty. Remove CompileOptions::get_instance ()
that returned a local copy of the options. Rename
crate_name_changed to crate_name_found to match semantics.
(rust_crate_name_validation_test): Test inferring ".rs" name
* rust-session-manager.h: Modify handle_crate_name definition to
include filename.
gccrs: Add captures for ClosureExprInnerTyped with nr2
Captures were only processed for regular ClosureExprInner.
gcc/rust/ChangeLog:
* resolve/rust-late-name-resolver-2.0.cc (Late::visit): Add
ClosureExprInnerTyped visit implementation.
(add_captures): Add a function to avoid code duplication.
* resolve/rust-late-name-resolver-2.0.h: Add function prototype.
The compiler was still relying on NR1 for closure captures when using nr2
even though the resolver was not used and thus its state was empty.
gcc/rust/ChangeLog:
* resolve/rust-late-name-resolver-2.0.cc (Late::visit): Add environment
collection.
* resolve/rust-late-name-resolver-2.0.h: Add function prototype.
* resolve/rust-name-resolver.cc (Resolver::get_captures): Add assertion
to prevent NR2 usage with nr1 capture functions.
* typecheck/rust-hir-type-check-expr.cc (TypeCheckExpr::visit): Use
nr2 captures.
* util/rust-hir-map.cc (Mappings::add_capture): Add function to
register capture for a given closure.
(Mappings::lookup_captures): Add a function to lookup all captures
available for a given closure.
* util/rust-hir-map.h: Add function prototypes.
Owen Avery [Tue, 21 Jan 2025 22:02:35 +0000 (17:02 -0500)]
gccrs: Check for type paths nr2.0 can't handle yet
Some of our tests only work with name resolution 2.0 because the latter
misinterprets type paths. This change should cause the compiler to error out
if it would otherwise misinterpret a type path. A fix for type path
resolution isn't included in this commit, since doing so would make it
harder to track the meaningfulness of test regressions.
gcc/rust/ChangeLog:
* resolve/rust-late-name-resolver-2.0.cc
(Late::visit): Error out if a type path has multiple segments,
as we currently ignore every segment except the last.
Nr2 did not emit the correct error message for break identifier "rust".
gcc/rust/ChangeLog:
* resolve/rust-late-name-resolver-2.0.cc (Late::visit): Add "rust"
identifier detection akin to nr1.
(funny_ice_finalizer): Copy ICE finalizer from nr1.
* resolve/rust-late-name-resolver-2.0.h: Add funny_error member
context state.
* Make-lang.in: Add new translation unit for new ice finalizer.
* resolve/rust-ast-resolve-expr.cc: Move ice
finalizer to its own file.
* resolve/rust-ice-finalizer.cc: New file.
* resolve/rust-ice-finalizer.h: New file.
gcc/testsuite/ChangeLog:
* rust/compile/nr2/exclude: Remove break-rust3.rs from exclude list.
Arthur Cohen [Thu, 16 Jan 2025 16:10:02 +0000 (17:10 +0100)]
gccrs: typecheck: Add basic handling for applying auto trait bounds
gcc/rust/ChangeLog:
* hir/rust-ast-lower-item.cc (ASTLoweringItem::visit): Register auto traits in mappings.
* util/rust-hir-map.cc (Mappings::insert_auto_trait): New.
(Mappings::get_auto_traits): New.
* util/rust-hir-map.h: Declare them.
* typecheck/rust-tyty-bounds.cc (TypeBoundsProbe::scan): Add auto trait bounds when
scanning.
gcc/testsuite/ChangeLog:
* rust/compile/nr2/exclude: Some parts of nr2.0 can't handle auto traits yet.
* rust/compile/auto_traits3.rs: Removed in favor of...
* rust/compile/auto_traits2.rs: ...this one.
* rust/compile/auto_traits4.rs: New test.
Arthur Cohen [Thu, 16 Jan 2025 15:55:56 +0000 (16:55 +0100)]
gccrs: typecheck: Separate assemble_builtin_candidate in two
This paves the way for adding trait bounds that aren't necessarily Sized.
gcc/rust/ChangeLog:
* typecheck/rust-tyty-bounds.cc (TypeBoundsProbe::add_trait_bound): New function.
* typecheck/rust-hir-type-bounds.h: Declare it.
(TypeBoundsProbe::assemble_builtin_candidate): Call into add_trait_bound.
Jonathan Wakely [Fri, 21 Mar 2025 22:49:44 +0000 (22:49 +0000)]
libstdc++: Ensure that std::vector<bool> allocator has bool value_type
This is the subject of LWG 4228 which notes that libstdc++ doesn't
enforce this requirement. That's just a bug because I forgot to add it
to vector<bool> when adding it elsewhere.
For consistency with the other containers we should not allow incorrect
allocator types for strict -std=c++NN modes, but it is very late to make
that change for GCC 15 so this only enables the assertion for C++20
(where it's required). For GCC 16 we can enable it for strict modes too.
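The requirement itself is easy to state as a compile-time check. The following is a minimal sketch, assuming a hypothetical helper trait `allocator_matches` that is not part of libstdc++; it only mirrors the kind of comparison a container can make between its element type and the allocator's value_type, and is not the assertion added to stl_bvector.h.

```cpp
#include <memory>
#include <type_traits>

// Hypothetical trait, for illustration only: true when the allocator's
// value_type is exactly the container's element type.
template<typename T, typename Alloc>
struct allocator_matches
  : std::is_same<typename Alloc::value_type, T>
{ };

// An allocator for bool satisfies the requirement for vector<bool>.
static_assert(allocator_matches<bool, std::allocator<bool>>::value,
              "std::allocator<bool> has value_type bool");

// An allocator for another type does not.
static_assert(!allocator_matches<bool, std::allocator<int>>::value,
              "std::allocator<int> has the wrong value_type");
```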
libstdc++-v3/ChangeLog:
* include/bits/stl_bvector.h (vector<bool, A>): Enforce the
C++20 requirement that the allocator's value_type matches the
container.
* testsuite/23_containers/vector/bool/cons/from_range.cc: Fix
incorrect allocator type.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Richard Earnshaw [Mon, 24 Mar 2025 11:22:05 +0000 (11:22 +0000)]
arm: testsuite: tighten scan-assembler in unaligned-memcpy-4.c
The scan-assembler-not pattern in this test was too broad and matched
the 'unaligned' from the .file directive from the file name. Tighten it
to require a leading comment character.
Haochen Jiang [Mon, 24 Mar 2025 06:24:39 +0000 (14:24 +0800)]
i386: Raise deprecate warning for -mavx10.1-256/512 and -mevex512 while add -mavx10.1 back with 512 bit alias
When the AVX10.1 options were added in GCC 14, E-core was supposed to
support up to 256 bit vector width, and P-core up to 512 bit vector
width. Therefore, we added the avx10.1-256 and avx10.1-512 options to the
compiler, since there would be real platforms with 256 bit only support.
At the same time, so that old platforms could also compile a 256 bit only
binary, we introduced -mno-evex512 to disable 512 bit vectors.
However, all future platforms will now support 512 bit vector width,
on both P-core and E-core, so there is no longer any need to split the
option by vector width. Therefore, we will remove them in this patch.
Unlike the AVX10.2 options, the AVX10.1 options have been in a major
release, so we have to raise a deprecation warning in GCC 15 and remove
them in GCC 16. At the same time, to align with the avx10.2 options, we
add the just-removed avx10.1 option back, with a warning that mentions
its behavior change.
gcc/ChangeLog:
* common/config/i386/cpuinfo.h
(get_available_features): Change to FEATURE_AVX10_1.
* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_AVX10_1_512_SET): Renamed to ...
(OPTION_MASK_ISA2_AVX10_1_SET): ... this.
(OPTION_MASK_ISA2_AVX10_2_SET): Use renamed macro.
(OPTION_MASK_ISA2_AVX10_1_UNSET): Ditto.
(ix86_handle_option): Ditto.
(processor_alias_table): Use P_PROC_AVX10_1.
* common/config/i386/i386-cpuinfo.h
(enum feature_priority): Rename from AVX10_1_512 to AVX10_1.
(enum processor_features): Ditto.
* common/config/i386/i386-isas.h: Add avx10.1.
* config/i386/driver-i386.cc
(host_detect_local_cpu): Use renamed enum.
* config/i386/i386-c.cc
(ix86_target_macros_internal): Rename to avx10.1.
* config/i386/i386-isa.def (AVX10_1_512): Rename to ...
(AVX10_1): ... this.
* config/i386/i386-options.cc (isa2_opts): Rename to avx10.1.
(ix86_valid_target_attribute_inner_p): Add avx10.1.
(ix86_option_override_internal): Rename to AVX10_1.
Revise warnings to mention behavior change for option
combination in GCC 16.
* config/i386/i386.h (PTA_DIAMONDRAPIDS): Use AVX10_1.
* config/i386/i386.opt: Add avx10.1.
Add deprecate warnings for mevex512 and mavx10.1-256/512.
* config/i386/i386.opt.urls: Add avx10.1.
* doc/extend.texi: Ditto.
* doc/sourcebuild.texi: Ditto.
Haochen Jiang [Mon, 24 Mar 2025 06:24:36 +0000 (14:24 +0800)]
i386: Remove avx10.2-256 and avx10.2-512 options
When the AVX10.2 options were added in GCC 15, E-core was supposed to
support up to 256 bit vector width, and P-core up to 512 bit vector
width. Therefore, we added the avx10.2-256 and avx10.2-512 options to the
compiler, since there would be real platforms with 256 bit only support.
However, all future platforms will now support 512 bit vector width,
on both P-core and E-core, so there is no longer any need to split the
option by vector width. Therefore, we will remove them in this patch.
Haochen Jiang [Mon, 24 Mar 2025 06:24:35 +0000 (14:24 +0800)]
i386: Adjust AVX10.2 testcases options
Before we change the AVX10.2 options in GCC 15, we need to adjust
all related test options to -mavx10.2 to avoid breakage, since
-mavx10.2 is now 512 bit and will be the final option we will
use. This will also be a one-time change to the options of these
tests.
Haochen Jiang [Mon, 24 Mar 2025 06:23:40 +0000 (14:23 +0800)]
i386: Remove 256 bit rounding for AVX10.2 saturation convert instructions
Since we will support 512 bit on both P-core and E-core, 256 bit
rounding is no longer that useful: the rounding feature is now
available directly on E-core, so there is no need to use 256-bit
rounding as a workaround. This patch removes 256 bit rounding in the
AVX10.2 satcvt intrins.
Haochen Jiang [Mon, 24 Mar 2025 06:23:37 +0000 (14:23 +0800)]
i386: Remove 256 bit rounding for AVX10.2 minmax and convert instructions
Since we will support 512 bit on both P-core and E-core, 256 bit
rounding is no longer that useful: the rounding feature is now
available directly on E-core, so there is no need to use 256-bit
rounding as a workaround. This patch removes 256 bit rounding in the
AVX10.2 minmax and convert intrins.
Nathaniel Shead [Sat, 22 Mar 2025 12:04:12 +0000 (23:04 +1100)]
c++/modules: Fix explicit instantiations and gnu_inlines [PR119154]
My change in r15-8012 for PR c++/119154 exposed a bug with explicit
instantiation declarations. The change cleared DECL_INTERFACE_KNOWN for
all vague-linkage entities, including explicit instantiations. When we
then perform lazy loading at EOF (due to processing deferred function
bodies), expand_or_defer_fn ends up calling import_export_decl which
will error because DECL_INTERFACE_KNOWN is still unset but no definition
is available in the file, violating some assertions.
It turns out that for function templates marked inline we would not
respect an 'extern template' imported in general, either; this patch
fixes both of these issues by always treating explicit instantiations as
external, and so marking DECL_INTERFACE_KNOWN eagerly.
For an explicit instantiation declaration we don't want to emit the body
of the function as it must be emitted in a different TU anyway. And for
explicit instantiation definitions we similarly know that it will have
been emitted in the interface TU we streamed it in from, so there's
no need to emit it.
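For readers less familiar with the two forms of explicit instantiation discussed above, here is a minimal single-file sketch. The template name `twice` is hypothetical and the example does not involve modules; it only shows the syntax of an explicit instantiation declaration ('extern template') and an explicit instantiation definition for an inline function template.

```cpp
// A function template marked inline (constexpr here only so that the
// result can be checked at compile time).
template<typename T>
inline constexpr T twice(T x) { return x + x; }

// Explicit instantiation declaration: states that twice<int> is
// explicitly instantiated in some other translation unit, so this one
// does not need to emit an out-of-line copy.
extern template int twice<int>(int);

// Explicit instantiation definition: twice<long> is emitted in this
// translation unit.
template long twice<long>(long);
```

In the modules case described above, the importer is in the position of the first form: the body is known to be emitted elsewhere.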
The same error can happen with lazy-loaded gnu_inlines at EOF; in some
cases they'll be marked DECL_COMDAT and pass through the vague_linkage_p
check anyway. This patch reworks the handling of gnu_inlines to ensure
that both DECL_INTERFACE_KNOWN is always correctly set and that
importing a gnu_inline function over the top of an existing forward
declaration works correctly.
The other case that duplicate_decls handles (importing a regular
definition over the top of a gnu_inline function) doesn't seem like
something we need to handle specially in modules; we'll just use the
existing gnu_inline function and rely on the guarantee that there is a
single non-inline function definition provided elsewhere.
PR c++/119154
gcc/cp/ChangeLog:
* decl2.cc (vague_linkage_p): Revert gnu_linkage handling.
* module.cc (importer_interface): New enumeration.
(get_importer_interface): New function.
(trees_out::core_bools): Use it to determine interface.
(trees_in::is_matching_decl): Propagate gnu_inline handling onto
existing forward declarations.
(trees_in::read_var_def): Also note explicit instantiation
definitions of variable templates to be emitted.
gcc/testsuite/ChangeLog:
* g++.dg/modules/pr119154_a.C: Move to...
* g++.dg/modules/gnu-inline-1_a.C: ...here, and add decl.
* g++.dg/modules/pr119154_b.C: Move to...
* g++.dg/modules/gnu-inline-1_b.C: ...here, and add check.
* g++.dg/modules/gnu-inline-1_c.C: New test.
* g++.dg/modules/gnu-inline-1_d.C: New test.
* g++.dg/modules/gnu-inline-2_a.C: New test.
* g++.dg/modules/gnu-inline-2_b.C: New test.
* g++.dg/modules/extern-tpl-3_a.C: New test.
* g++.dg/modules/extern-tpl-3_b.C: New test.
* g++.dg/modules/extern-tpl-4_a.H: New test.
* g++.dg/modules/extern-tpl-4_b.C: New test.
* g++.dg/modules/extern-tpl-4_c.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Jason Merrill <jason@redhat.com>
Sandra Loosemore [Thu, 13 Mar 2025 22:48:09 +0000 (22:48 +0000)]
Doc: Rearrange remaining top-level sections in extend.texi [PR42270]
This is part of an incremental effort to make the chapter on GCC
extensions better organized by grouping/rearranging sections by topic.
gcc/ChangeLog
PR other/42270
* doc/extend.texi (Nonlocal Gotos): Group with other built-ins
sections.
(Constructing Calls): Likewise.
(Pragmas): Move earlier in the section, before the built-ins docs.
(Thread-Local): Likewise.
(OpenMP): Likewise.
(OpenACC): Likewise.
Sandra Loosemore [Thu, 13 Mar 2025 14:52:10 +0000 (14:52 +0000)]
Doc: Add "Aggregate Types" sectioning to extend.texi [PR42270]
This is part of an incremental effort to make the chapter on GCC
extensions better organized by grouping/rearranging sections by topic.
gcc/ChangeLog
PR other/42270
* doc/extend.texi (Aggregate Types): New section.
(Variable Length): Make it a subsection of the above.
(Zero Length): Likewise.
(Empty Structures): Likewise.
(Flexible Array Members in Unions): Likewise.
(Flexible Array Members alone in Structures): Likewise.
(Unnamed Fields): Likewise.
(Cast to Union): Likewise.
(Subscripting): Likewise.
(Initializers): Likewise.
(Compound Literals): Likewise.
(Designated Inits): Likewise.
Iain Buclaw [Sun, 23 Mar 2025 11:57:27 +0000 (12:57 +0100)]
d: Fix ICE type variant differs by TYPE_PACKED [PR117621]
Introduced by r13-1104-gf4c3ce32fa54c1, which had an accidental self
assignment of TYPE_PACKED when it should have been assigned to the
type's variants.
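The shape of the fix can be sketched as follows, assuming a hypothetical `type_node` struct with a simple variant chain; the real GCC tree nodes and macros are different, and this is not the code in types.cc. The point is only that the flag is copied from the main type to each variant rather than assigned to itself. The sketch needs C++14 or later because of the loops in constexpr functions.

```cpp
// Hypothetical stand-in for a type node with a chain of variants.
struct type_node
{
  bool packed;
  type_node *next_variant;
};

// Copy the packed flag from the main type to every variant, so that
// no variant differs from the main type in this property.
constexpr void propagate_packed(type_node &main_type)
{
  for (type_node *t = main_type.next_variant; t != nullptr;
       t = t->next_variant)
    t->packed = main_type.packed;
}

// Small self-check: a variant that starts out unpacked picks up the
// flag of its packed main type.
constexpr bool variant_is_packed_after_propagation()
{
  type_node variant{false, nullptr};
  type_node main_type{true, &variant};
  propagate_packed(main_type);
  return variant.packed;
}
```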
PR d/117621
gcc/d/ChangeLog:
* types.cc (finish_aggregate_type): Propagate TYPE_PACKED to variants.
libgfortran/intrinsics: Fix build for targets with int32_t=long int
Without this, after r15-8650-g94fa9f4d27bac5, you'll see,
for targets where GFC_INTEGER_4 alias int32_t is a typedef
of long int (beware of artificially broken lines):
/x/gcc/libgfortran/intrinsics/reduce.c:269:1: error: conflicting types for 'reduce_scalar_c'; have 'void(void *, index_type, parray *, void (*)(void *, void *, void *), int *, gfc_array_l4 *, void *, void *, index_type, index_type)' {aka 'void(void *, long int, parray *, void (*)(void *, void *, void *), int *, gfc_array_l4 *, void *, void *, long int, long int)'}
269 | reduce_scalar_c (void *res,
| ^~~~~~~~~~~~~~~
[...] excessive error message verbiage deleted
/x/gcc/libgfortran/intrinsics/reduce.c: In function 'reduce_scalar_c':
/x/gcc/libgfortran/intrinsics/reduce.c:283:35: error: passing argument 4 of 'reduce' from incompatible pointer type [-Wincompatible-pointer-types]
283 | reduce (&ret, array, operation, dim, mask, identity, ordered);
| ^~~
| |
| int *
/x/gcc/libgfortran/intrinsics/reduce.c:41:24: note: expected 'GFC_INTEGER_4 *' {aka 'long int *'} but argument is of type 'int *'
41 | GFC_INTEGER_4 *dim,
| ~~~~~~~~~~~~~~~^~~
make[3]: *** [Makefile:4678: intrinsics/reduce.lo] Error 1
libgfortran:
* intrinsics/reduce.c (reduce_scalar_c): Correct type of parameter DIM.
Georg-Johann Lay [Sat, 15 Mar 2025 19:53:52 +0000 (20:53 +0100)]
AVR: target/119421 Better optimize some bit operations.
There are occasions where knowledge about nonzero bits makes some
optimizations possible. For example,
Rd |= Rn << Off
can be implemented as
SBRC Rn, 0
ORI Rd, 1 << Off
when Rn in { 0, 1 }, i.e. nonzero_bits (Rn) == 1. This patch adds some
patterns that exploit nonzero_bits() in some combiner patterns.
As insn conditions are not supposed to contain nonzero_bits(), the patch
splits such insns right after pass insn combine.
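The equivalence behind the SBRC/ORI sequence can be checked in plain C++. This is a minimal sketch with hypothetical function names; it models 8-bit registers as std::uint8_t and is not code from the AVR backend. It relies on the same precondition as the text: Rn is either 0 or 1.

```cpp
#include <cstdint>

// Plain form: Rd |= Rn << Off.
constexpr std::uint8_t or_shifted(std::uint8_t rd, std::uint8_t rn,
                                  unsigned off)
{
  return static_cast<std::uint8_t>(rd | (rn << off));
}

// Form matching SBRC Rn,0 / ORI Rd,1<<Off: when bit 0 of Rn is clear
// the OR is skipped, otherwise a constant mask is ORed in.
constexpr std::uint8_t or_bit_if_set(std::uint8_t rd, std::uint8_t rn,
                                     unsigned off)
{
  return (rn & 1u) ? static_cast<std::uint8_t>(rd | (1u << off)) : rd;
}
```

For Rn outside { 0, 1 } the two functions can differ, which is why the patterns need the nonzero_bits information.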
PR target/119421
gcc/
* config/avr/avr.opt (-muse-nonzero-bits): New option.
* config/avr/avr-protos.h (avr_nonzero_bits_lsr_operands_p): New.
(make_avr_pass_split_nzb): New.
* config/avr/avr.cc (avr_nonzero_bits_lsr_operands_p): New function.
(avr_rtx_costs_1): Return costs for the new insns.
* config/avr/avr.md (nzb): New insn attribute.
(*nzb=1.<code>...): New insns to better support some bit
operations for <code> in AND, IOR, XOR.
* config/avr/avr-passes.def (avr_pass_split_nzb): Insert pass
after combine.
* config/avr/avr-passes.cc (avr_pass_data_split_nzb): New pass data.
(avr_pass_split_nzb): New pass.
(make_avr_pass_split_nzb): New function.
* common/config/avr/avr-common.cc (avr_option_optimization_table):
Enable -muse-nonzero-bits for -O2 and higher.
* doc/invoke.texi (AVR Options): Document -muse-nonzero-bits.
gcc/testsuite/
* gcc.target/avr/torture/pr119421-sreg.c: New test.
Georg-Johann Lay [Sat, 22 Mar 2025 14:19:39 +0000 (15:19 +0100)]
AVR: libgcc: Properly exclude object files for AVRrc.
There are many objects / functions that are not available on AVRrc,
the reduced core. The old way to exclude some objects for AVRrc
did not work properly since it tested for MULTIFLAGS.
This does not work for, say MULTIFLAGS = "-mmcu=avrtiny -mdouble=64".
This patch uses $(findstring avrtiny,$(MULTIDIR)) in the condition.
libgcc/
* config/avr/t-avr (LIB1ASMFUNCS, LIB2FUNCS_EXCLUDE):
Properly handle avrtiny.
libgcc/config/avr/libf7/
* t-libf7 (libgcc-objects): Only add objects when building
for non-AVRrc.
Georg-Johann Lay [Fri, 21 Mar 2025 13:29:13 +0000 (14:29 +0100)]
AVR: Add attribute "used" for code in .initN and .finiN sections.
Code in .initN and .finiN sections is never called since these
sections are special and part of the startup and shutdown code, respectively.
This patch adds attribute "used" so they won't be optimized out.
gcc/
* config/avr/avr.cc (avr_attrs_section_name): New function.
(avr_insert_attributes): Add "used" attribute to functions
in .initN and .finiN.
Patrick Palka [Sat, 22 Mar 2025 14:15:52 +0000 (10:15 -0400)]
c++: structural equality and partially inst typedef [PR119379]
Complex alias templates (and their dependent specializations) always use
structural equality because we need to treat them as transparent in some
contexts but not others. Structural-ness however wasn't being preserved
during partial instantiation, which for the below testcase leads to the
checking ICE
same canonical type node for different types 'S<int>::P<U>' and 'pair<int, U>'
when comparing those two types with comparing_dependent_aliases set
(from alias_ctad_tweaks).
This patch fixes this by making us preserve structural-ness for
partially instantiated typedefs in general.
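To make the type names in the diagnostic easier to picture, here is a sketch of a member alias template with the same shape. It is a hypothetical reconstruction based only on the names 'S<int>::P<U>' and 'pair<int, U>'; it is not the new testcase and is not claimed to reproduce the ICE or to be a complex alias template in the compiler's sense.

```cpp
#include <type_traits>

// Hypothetical two-member pair, standing in for the 'pair' named in
// the diagnostic.
template<typename T, typename U>
struct pair
{
  T first;
  U second;
};

// S<int>::P is obtained by partially instantiating the member alias
// template: T is replaced by int while U stays a template parameter.
template<typename T>
struct S
{
  template<typename U>
  using P = pair<T, U>;
};

// Once U is supplied, the alias names the same type as the pair.
static_assert(std::is_same<S<int>::P<char>, pair<int, char>>::value,
              "S<int>::P<char> is pair<int, char>");
```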
PR c++/119379
gcc/cp/ChangeLog:
* pt.cc (tsubst_decl) <case TYPE_DECL>: Preserve structural-ness
of a partially instantiated typedef.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/class-deduction-alias24.C: New test.
Iain Buclaw [Sat, 22 Mar 2025 09:49:06 +0000 (10:49 +0100)]
d: Improve UFCS/property error message
Improves on the speller suggestions for UFCS by using the location of
the suggested symbol, and considering that local functions aren't
eligible for UFCS instead of making a nonsensical suggestion, such as
"no property foo, did you mean foo?".
Jakub Jelinek [Sat, 22 Mar 2025 07:39:38 +0000 (08:39 +0100)]
Fix up some further cases of missing or extraneous spaces in diagnostics
Given the recent PR119406 I've tried to grep for concatenated string
literals without space at the end of one line and at the start of next line,
unless it was obviously intentional.
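The class of bug being searched for looks like the following sketch. The message text is made up for illustration and does not come from any of the files touched here.

```cpp
// Adjacent string literals are concatenated with nothing in between,
// so a missing space at the line break glues two words together.
constexpr char missing_space[] = "unable to open"
                                 "the file";

// Writing the space at the end of the first piece fixes the message.
constexpr char with_space[] = "unable to open "
                              "the file";
```

The first array holds "unable to openthe file"; the second holds the intended text.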
Furthermore, I've then looked through gcc.pot looking for 2 adjacent spaces
and looking back if that wasn't the case of "something "
" with spaces at both sides".
Here is the result from that.
I think just the c.opt change needs an explanation, the "" in the
description is simply eaten up somewhere during the option processing and
gcc -v --help before this patch was displaying
-Wdeprecated-literal-operator Warn about deprecated space between and suffix in a user-defined literal operator.
2025-03-22 Jakub Jelinek <jakub@redhat.com>
gcc/
* gimplify.cc (warn_switch_unreachable_and_auto_init_r): Add missing
space in the middle of diagnostics.
* tree-vect-stmts.cc (vectorizable_load): Add missing space in the
middle of debug dump message.
* sym-exec/sym-exec-state.cc (state::check_args_compatibility):
Likewise.
gcc/c-family/
* c.opt (Wdeprecated-literal-operator): Use \"\" rather than ""
in option description.
gcc/fortran/
* resolve.cc (resolve_procedure_expression): Remove extraneous space
from the middle of diagnostics.
Iain Sandoe [Fri, 21 Mar 2025 15:14:19 +0000 (15:14 +0000)]
cobol, driver: Handle targets without HAVE_LD_STATIC_DYNAMIC.
This fixes a typo where libraries were not added for targets without
HAVE_LD_STATIC_DYNAMIC.
It also adds the libraries in this case;
typically, a target without HAVE_LD_STATIC_DYNAMIC can take the
-static-libgcobol and use that to drive a spec substitution viz:
%{static-libgcobol:%:replace-outfile(-lgcobol libgcobol.a%s)}
which needs both the library and -static-libgcobol to be present
in the driver output.
gcc/cobol/ChangeLog:
* gcobolspec.cc (add_arg_lib): Fix typo.
(lang_specific_driver): Arrange to append both -lgcobol
and -static-libgcobol for targets without
HAVE_LD_STATIC_DYNAMIC.
Tobias Burnus [Fri, 21 Mar 2025 23:36:44 +0000 (00:36 +0100)]
libgomp.fortran/get-mapped-ptr-1.f90: Use -6 for non-conf dev number
This is a fix for the GOMP_interop commit r15-8654-g99e2906ae255fc that
added GOMP_DEVICE_DEFAULT_OMP_61 alias omp_default_device, which is a
conforming device number. But that test used -5 as check for a
non-conforming device number.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/get-mapped-ptr-1.f90: Use -6
not -5 as non-conforming device number.
Tobias Burnus [Fri, 21 Mar 2025 20:39:42 +0000 (21:39 +0100)]
libgomp/plugin: Add initial interop support to nvptx + gcn
The interop directive operates on an opaque object that represents a
foreign runtime. This commit adds support for
this to the two offloading plugins.
For nvptx, it supports cuda, cuda_driver and hip; the latter is AMD's
version of CUDA which for Nvidia devices boils down to normal CUDA.
Thus, at the end for this limited use, cuda/cuda_driver/hip are all
the same - and for plugin-nvptx.c, they differ only in terms of
what gets fr_id, fr_name and get_interop_type_desc return.
For gcn, it supports hip and hsa.
Regarding get-mapped-ptr-1.c: That's actually a fix for the
GOMP_interop commit r15-8654-g99e2906ae255fc that added
GOMP_DEVICE_DEFAULT_OMP_61 alias omp_default_device, which is
a conforming device number. But that test used -5 as check for a
non-conforming device number.
libgomp/ChangeLog:
* plugin/plugin-gcn.c (_LIBGOMP_PLUGIN_INCLUDE): Define.
(struct hsa_runtime_fn_info): Add two queue functions.
(hipError_t, hipCtx_t, hipStream_s, hipStream_t): New types.
(struct hip_runtime_fn_info): New.
(hip_runtime_lib, hip_fns): New global vars.
(init_environment_variables): Handle hip_runtime_lib.
(init_hsa_runtime_functions): Load the two queue functions.
(init_hip_runtime_functions, GOMP_OFFLOAD_interop,
GOMP_OFFLOAD_get_interop_int, GOMP_OFFLOAD_get_interop_ptr,
GOMP_OFFLOAD_get_interop_str,
GOMP_OFFLOAD_get_interop_type_desc): New.
* plugin/plugin-nvptx.c (_LIBGOMP_PLUGIN_INCLUDE): Define.
(GOMP_OFFLOAD_interop, GOMP_OFFLOAD_get_interop_int,
GOMP_OFFLOAD_get_interop_ptr, GOMP_OFFLOAD_get_interop_str,
GOMP_OFFLOAD_get_interop_type_desc): New.
* testsuite/libgomp.c/interop-fr-1.c: New test.
* testsuite/libgomp.c-c++-common/get-mapped-ptr-1.c: Use -6
not -5 as non-conforming device number.
Jakub Jelinek [Fri, 21 Mar 2025 19:26:00 +0000 (20:26 +0100)]
lra, v2: emit caller-save register spills before call insn [PR116028]
Here is an updated version of Surya's PR116028 fix from August, which got
reverted because it caused bootstrap failures on aarch64, later bootstrap
comparison errors there too, and problems on other targets as well.
Original description:
LRA emits insns to save caller-save registers in the
inheritance/splitting pass. In this pass, LRA builds EBBs (Extended
Basic Block) and traverses the insns in the EBBs in reverse order from
the last insn to the first insn. When LRA sees a write to a pseudo (that
has been assigned a caller-save register), and there is a read following
the write, with an intervening call insn between the write and read,
then LRA generates a spill immediately after the write and a restore
immediately before the read. The spill is needed because the call insn
will clobber the caller-save register.
If there is a write insn and a call insn in two separate BBs but
belonging to the same EBB, the spill insn gets generated in the BB
containing the write insn. If the write insn is in the entry BB, then
the spill insn that is generated in the entry BB prevents shrink wrap
from happening. This is because the spill insn references the stack
pointer and hence the prolog gets generated in the entry BB itself.
This patch ensures that the spill insn is generated before the call insn
instead of after the write. This also ensures that the spill occurs
only in the path containing the call.
The changes compared to the first r15-2810 version are:
1) the reason for aarch64 miscompilations and later on bootstrap comparison
issues as can be seen on the pr118615.c testcase in the patch was that
when curr_insn is a JUMP_INSN or some cases of CALL_INSNs,
split_if_necessary is called with before_p true and if it is successful,
the code set use_insn = PREV_INSN (curr_insn); instead of use_insn =
curr_insn; and that use_insn is then what is passed to
add_next_usage_insn; now, if the patch decides to emit the save
instruction(s) before the first call after curr_insn in the ebb rather
than before the JUMP_INSN/CALL_INSN, PREV_INSN (curr_insn) is some random
insn before it, not anything related to the split_reg actions.
If it is e.g. a DEBUG_INSN in one case vs. some unrelated other insn
otherwise, that can affect further split_reg within the same function
2) as suggested by Surya in PR118615, it makes no sense to try to change
behavior if the first call after curr_insn is in the same bb as curr_insn
3) split_reg is actually called sometimes from within inherit_in_ebb but
sometimes from elsewhere; trying to use whatever last call to
inherit_in_ebb saw last is a sure way to run into wrong-code issues,
so instead of clearing the rtx var at the start of inherit_in_ebb it is
now cleared at the end of it
4) calling the var latest_call_insn was weird, inherit_in_ebb walks the ebb
backwards, so what the var contains is the first call insn within the
ebb (after curr_insn)
5) the patch was using
lra_process_new_insns (PREV_INSN (latest_call_insn), NULL, save,
"Add save<-reg");
to emit the save insn before latest_call_insn. That feels quite weird
given that lra_process_new_insns has explicit support for adding stuff
before some insn or after some insn, adding something before some
insn doesn't really need to be done as addition after PREV_INSN
6) some formatting nits + new testcase + removal of xfail even on arm32
Bootstrapped/regtested on x86_64-linux/i686-linux (my usual
--enable-checking=yes,rtl,extra builds), aarch64-linux (normal default
bootstrap) and our distro scratch build
({x86_64,i686,aarch64,powerpc64le,s390x}-linux --enable-checking=release
LTO profiledbootstrap/regtest), I think Sam James tested on 32-bit arm
too.
On aarch64-linux this results in
-FAIL: gcc.dg/pr10474.c scan-rtl-dump pro_and_epilogue "Performing shrink-wrapping"
I admit I don't know the code well nor understood everything it is doing.
I have some concerns:
1) I wonder if there is a guarantee that first_call_insn if non-NULL will be
always in between curr_insn and usage_insn when call_save_p; I'd hope
yes because if usage_insn is before first_call_insn in the ebb,
presumably it wouldn't need to find call save regs because the range
wouldn't cross any calls
2) I wonder whether it wouldn't be better instead of inserting the saves
before first_call_insn insert it at the start of the bb containing that
call (after labels of course); emitting it right before a call could
mislead code looking for argument slot initialization of the call
3) even when avoiding the use_insn = PREV_INSN (curr_insn);, I wonder
if it is ok to use use_insn equal to curr_insn rather than the insns
far later where we actually inserted it, but primarily because I don't
understand the code much; I think for the !before_p case it is doing
similar thing on a shorter distance, the saves were emitted after
curr_insn and we record it on curr_insn
2025-03-21 Surya Kumari Jangala <jskumari@linux.ibm.com>
Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/116028
PR rtl-optimization/118615
* lra-constraints.cc (first_call_insn): New variable.
(split_reg): Spill register before first_call_insn if call_save_p
and the call is in a different bb in the ebb.
(split_if_necessary): Formatting fix.
(inherit_in_ebb): Set first_call_insn when handling a CALL_INSN.
For successful split_if_necessary with before_p, only change
use_insn if it emitted any new instructions before curr_insn.
Clear first_call_insn before returning.
* gcc.dg/ira-shrinkwrap-prep-1.c: Remove xfail for powerpc.
* gcc.dg/pr10474.c: Remove xfail for powerpc and arm.
* gcc.dg/pr118615.c: New test.
OpenMP: 'interop' construct - add ME support + target-independent libgomp
This patch partially enables use of the OpenMP interop construct by adding
middle end support, mostly in the omplower pass, and in the target-independent
part of the libgomp runtime. It follows up on previous patches for C, C++ and
Fortran front ends support. The full interop feature requires another patch to
enable foreign runtime support in libgomp plugins.
gcc/ChangeLog:
* builtin-types.def
(BT_FN_VOID_INT_INT_PTR_PTR_PTR_INT_PTR_INT_PTR_UINT_PTR): New.
* gimple-low.cc (lower_stmt): Handle GIMPLE_OMP_INTEROP.
* gimple-pretty-print.cc (dump_gimple_omp_interop): New function.
(pp_gimple_stmt_1): Handle GIMPLE_OMP_INTEROP.
* gimple.cc (gimple_build_omp_interop): New function.
(gimple_copy): Handle GIMPLE_OMP_INTEROP.
* gimple.def (GIMPLE_OMP_INTEROP): Define.
* gimple.h (gimple_build_omp_interop): Declare.
(gimple_omp_interop_clauses): New function.
(gimple_omp_interop_clauses_ptr): Likewise.
(gimple_omp_interop_set_clauses): Likewise.
(gimple_return_set_retval): Handle GIMPLE_OMP_INTEROP.
* gimplify.cc (gimplify_scan_omp_clauses): Handle OMP_CLAUSE_INIT,
OMP_CLAUSE_USE and OMP_CLAUSE_DESTROY.
(gimplify_omp_interop): New function.
(gimplify_expr): Replace sorry with call to gimplify_omp_interop.
* omp-builtins.def (BUILT_IN_GOMP_INTEROP): Define.
* omp-low.cc (scan_sharing_clauses): Handle OMP_CLAUSE_INIT,
OMP_CLAUSE_USE and OMP_CLAUSE_DESTROY.
(scan_omp_1_stmt): Handle GIMPLE_OMP_INTEROP.
(lower_omp_interop_action_clauses): New function.
(lower_omp_interop): Likewise.
(lower_omp_1): Handle GIMPLE_OMP_INTEROP.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_clause_destroy): Make addressable.
(c_parser_omp_clause_init): Make addressable.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_clause_init): Make addressable.
gcc/fortran/ChangeLog:
* trans-openmp.cc (gfc_trans_omp_clauses): Make OMP_CLAUSE_DESTROY and
OMP_CLAUSE_INIT addressable.
* types.def (BT_FN_VOID_INT_INT_PTR_PTR_PTR_INT_PTR_INT_PTR_UINT_PTR):
New.
Jason Merrill [Thu, 20 Mar 2025 16:57:15 +0000 (12:57 -0400)]
ipa: target clone and mangling alias [PR114992]
Since the mangling of the second lambda changed (previously we counted all
lambdas, now we only count lambdas with the same signature), we
generate_mangling_alias for handler<lambda2> for backward compatibility.
Since handler is COMDAT, resolve_alias puts the alias in the same comdat
group as handler itself. Then create_dispatcher_calls tries to add the
alias to the same comdat group as the dispatcher, but it's already in a
same_comdat_group, so we ICE.
It seems like we're just missing a remove_from_same_comdat_group before
add_to_same_comdat_group.
PR c++/114992
gcc/ChangeLog:
* multiple_target.cc (create_dispatcher_calls):
remove_from_same_comdat_group before add_to_same_comdat_group.
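
The ordering problem described above can be illustrated with a toy model.
This is only a sketch under stated assumptions, not the code in
multiple_target.cc: `struct node`, `remove_from_group`, `add_to_group`,
`move_to_group` and `group_size` are hypothetical names, and the ring of
`same_comdat_group` links is a simplified stand-in for the real symbol table.
The point it shows is that adding a node that is still linked into another
ring trips the assertion, so the node has to be unlinked first.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a symbol-table node; only the ring link is modelled. */
struct node {
  struct node *same_comdat_group; /* next member of the ring, or NULL */
};

/* Unlink N from its ring; a ring left with one member is dissolved. */
static void remove_from_group(struct node *n)
{
  if (!n->same_comdat_group)
    return;
  struct node *prev = n->same_comdat_group;
  while (prev->same_comdat_group != n)
    prev = prev->same_comdat_group;
  if (n->same_comdat_group == prev)
    prev->same_comdat_group = NULL;
  else
    prev->same_comdat_group = n->same_comdat_group;
  n->same_comdat_group = NULL;
}

/* Link N into OLD's ring.  N must not already be in a ring; this is the
   condition that fails when the removal step is skipped. */
static void add_to_group(struct node *n, struct node *old)
{
  assert(n->same_comdat_group == NULL);
  assert(n != old);
  n->same_comdat_group = old;
  if (!old->same_comdat_group)
    old->same_comdat_group = n;
  else {
    struct node *last = old->same_comdat_group;
    while (last->same_comdat_group != old)
      last = last->same_comdat_group;
    last->same_comdat_group = n;
  }
}

/* Move N into OLD's ring using the remove-then-add order. */
static void move_to_group(struct node *n, struct node *old)
{
  remove_from_group(n);
  add_to_group(n, old);
}

/* Count the members of N's ring (1 if N is not in any ring). */
static size_t group_size(const struct node *n)
{
  size_t count = 1;
  if (!n->same_comdat_group)
    return count;
  for (const struct node *p = n->same_comdat_group; p != n;
       p = p->same_comdat_group)
    count++;
  return count;
}
```

In this model, calling `add_to_group` directly on an alias that already
shares a ring with its handler aborts on the first assertion, while
`move_to_group` leaves the handler alone and the alias grouped with the
dispatcher.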
Paul Thomas [Fri, 21 Mar 2025 16:20:21 +0000 (16:20 +0000)]
Fortran: Implement the F2018 reduce intrinsic [PR85836]
2025-03-21 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/85836
* check.cc (get_ul_from_cst_cl): New function used in
check_operation.
(check_operation): New function used in check_reduce and
check_co_reduce.
(gfc_check_co_reduce): Use it.
(gfc_check_reduce): New function.
(gfc_check_rename): Add prototype for intrinsic with 6 arguments.
* gfortran.h : Add isym id for reduce and prototype for f6.
* intrinsic.cc (do_check): Add another argument expression and use
it in the call to the six argument specific check.
(add_sym_6): New function.
(add_functions): Add the description of the reduce intrinsic and
add it to the intrinsic list.
* intrinsic.h : Add prototypes for gfc_check_reduce and
gfc_resolve_reduce.
* iresolve.cc (generate_reduce_op_wrapper): Generate a wrapper
subroutine for the 'operation' function to enable the library
implementation to be type agnostic and use pointer arithmetic
throughout.
(gfc_resolve_reduce): New function.
* trans-expr.cc (gfc_conv_procedure_call): Add flag for scalar
reduce. Generate a return variable 'sr' for scalar reduce, pass its
address to the library function and return it as the scalar result.
* trans-intrinsic.cc (gfc_conv_intrinsic_function): Array-valued
reduce is called in the same way as reshape. Fall through for call to
the scalar version.
gcc/testsuite/
PR fortran/85836
* gfortran.dg/reduce_1.f90: New test
* gfortran.dg/reduce_2.f90: New test
libgfortran/
PR libfortran/85836
* Makefile.am : Add reduce.c
* Makefile.in : Regenerated
* gfortran.map : Add _gfortran_reduce, _gfortran_reduce_scalar,
_gfortran_reduce_c and _gfortran_reduce_scalar_c to the list.
* intrinsics/reduce.c (reduce, reduce_scalar, reduce_c,
reduce_scalar_c): New functions and prototypes
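
The idea of a type-agnostic reduction driven by an 'operation' wrapper and
pointer arithmetic can be sketched as follows. This is an illustration under
assumptions, not the contents of intrinsics/reduce.c: the names `reduce_op`,
`reduce_generic` and `add_int`, the wrapper signature, and the error
convention are all hypothetical, and the MASK, IDENTITY and ORDERED arguments
of the F2018 intrinsic are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper signature: combine *a and *b and store the result
   in *res.  The wrapper must tolerate res aliasing a. */
typedef void (*reduce_op)(const void *a, const void *b, void *res);

/* Fold COUNT elements of ELEM_SIZE bytes starting at BASE into RESULT.
   The routine never looks at the element type; it only steps through the
   data by byte offsets.  Returns 0 on success, -1 for an empty sequence. */
static int reduce_generic(const void *base, size_t count, size_t elem_size,
                          reduce_op op, void *result)
{
  assert(op != NULL && elem_size > 0);
  if (count == 0)
    return -1;
  const char *p = base;
  memcpy(result, p, elem_size);            /* seed with the first element */
  for (size_t i = 1; i < count; i++)
    op(result, p + i * elem_size, result); /* accumulate in place */
  return 0;
}

/* Example wrapper for an integer sum; both inputs are read before the
   result is written, so in-place accumulation is safe. */
static void add_int(const void *a, const void *b, void *res)
{
  *(int *)res = *(const int *)a + *(const int *)b;
}
```

With these definitions, reducing the four integers 1, 2, 3, 4 through
`add_int` stores 10 in the result. The same `reduce_generic` routine would
serve any element type for which a matching wrapper is supplied.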
Richard Earnshaw [Fri, 21 Mar 2025 15:20:03 +0000 (15:20 +0000)]
arm: testsuite: make unaligned-memcpy-*.c executable tests [PR91614]
These tests have been looking for a very specific instruction sequence
which has the tendency to be fairly unstable as a result. But what is
more interesting is that the tests must not contain instructions
that can't be used for unaligned data, and that the copy is
executed correctly.
So make these tests executable and scan the assembler only to confirm
the absence of instructions that must not be used when the data is not
aligned.
These tests also used to be restricted to targets that support
unaligned accesses (because you get very different code otherwise).
But now that we've made the tests executable and check only for the
absence of problem instructions, just falling back to memcpy *is* an
acceptable implementation. So remove the requirement for unaligned
accesses.
gcc/testsuite:
PR target/91614
* gcc.target/arm/unaligned-memcpy-1.c: Make the test executable.
Only scan for the absence of instructions that cannot access
misaligned data. Remove constraint of having unaligned accesses.
* gcc.target/arm/unaligned-memcpy-2.c: Likewise.
* gcc.target/arm/unaligned-memcpy-3.c: Likewise.
* gcc.target/arm/unaligned-memcpy-4.c: Likewise.
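
The runtime half of such a test could look roughly like the sketch below. It
is not the text of the unaligned-memcpy-*.c files: the buffer names, `LEN`,
`copy_unaligned` and `check_copy` are made up for illustration, the
`noinline` attribute is a GCC/Clang extension, and the assembler scan that
the real tests perform is left out.

```c
#include <assert.h>
#include <string.h>

#define LEN 15

/* Aligned buffers with one spare byte, so that buf + 1 is deliberately
   misaligned for anything wider than a byte access. */
static _Alignas(8) char src_buf[LEN + 1];
static _Alignas(8) char dst_buf[LEN + 1];

/* Kept out of line so the copy is compiled as a unit of its own. */
__attribute__((noinline)) static void copy_unaligned(char *dst,
                                                     const char *src)
{
  memcpy(dst, src, LEN);
}

/* Returns 1 if the misaligned copy produced the expected bytes. */
static int check_copy(void)
{
  for (int i = 0; i < LEN; i++)
    src_buf[1 + i] = (char)('a' + i);
  memset(dst_buf, 0, sizeof dst_buf);

  copy_unaligned(dst_buf + 1, src_buf + 1);

  assert(dst_buf[0] == 0); /* byte before the destination is untouched */
  return memcmp(dst_buf + 1, src_buf + 1, LEN) == 0;
}
```

A check of this shape passes whether the compiler expands the copy inline or
simply calls memcpy, which is why the requirement for unaligned accesses can
be dropped once the test is executable.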
This test is designed to check that if one of the operands is
aligned (but the other isn't) we expand to a sensible sequence and
bypass most of the overhead of doing a memcpy. But on targets without
unaligned accesses, we still end up calling memcpy. It's then a
lottery as to whether the prologue and epilogue code, plus the
set-up for the memcpy itself, generate instructions that match the
scan patterns.
Since in those cases we're not actually testing what the test is looking
for anyway, just skip the test on strict-alignment targets.
Thomas Schwinge [Wed, 19 Mar 2025 11:18:26 +0000 (12:18 +0100)]
C++: Adjust implicit '__cxa_bad_cast' prototype to reality
In 2001 Subversion r40924 (Git commit 52a11cbfcf0cfb32628b6953588b6af4037ac0b6)
"IA-64 ABI Exception Handling", '__cxa_bad_cast' changed from 'void *' to
'void' return type:
--- libstdc++-v3/libsupc++/exception_support.cc
+++ /dev/null
@@ -1,388 +0,0 @@
-[...]
-// Helpers for rtti. Although these don't return, we give them return types so
-// that the type system is not broken.
-extern "C" void *
-__cxa_bad_cast ()
-{
- [...]
-}
-[...]
..., which is in conflict with the library code with 'void' return type:
// BEGIN GLOBAL FUNCTION DECL: __cxa_bad_cast
.visible .func __cxa_bad_cast;
// BEGIN GLOBAL FUNCTION DEF: __cxa_bad_cast
.visible .func __cxa_bad_cast
{
[...]
}
..., and we thus get execution test FAIL for 'g++.dg/rtti/dyncast2.C':
error : Prototype doesn't match for '__cxa_bad_cast' in 'input file 7 at offset 51437', first defined in 'input file 7 at offset 51437'
nvptx-run: cuLinkAddData failed: device kernel image is invalid (CUDA_ERROR_INVALID_SOURCE, 300)
With this patch applied, we get the expected:
// BEGIN GLOBAL FUNCTION DECL: __cxa_bad_cast
-.extern .func (.param .u64 %value_out) __cxa_bad_cast;
+.extern .func __cxa_bad_cast;