Matthieu Longo [Mon, 23 Sep 2024 14:34:57 +0000 (15:34 +0100)]
dwarf2: add hooks for architecture-specific CFIs
Architecture-specific CFI directives are currently declared and processed
alongside the architecture-independent CFI directives in gcc/dwarf2* files.
This approach creates confusion, specifically in the case of DWARF
instructions in the vendor space that use the same instruction code.
Such a clash currently happens between DW_CFA_GNU_window_save (used on
SPARC) and DW_CFA_AARCH64_negate_ra_state (used on AArch64), which both
have the same instruction code 0x2d.
As a result, AArch64 compilers generate a SPARC CFI directive (.cfi_window_save)
instead of .cfi_negate_ra_state, contrary to what is expected in
[DWARF for the Arm 64-bit Architecture (AArch64)](https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst).
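The clash can be illustrated with a minimal C++ sketch. The two opcode constants and the directive strings come from the text above; the `Target` enum and `directive_for` are hypothetical names used only for illustration and are not GCC's hook interface.

```cpp
#include <cassert>
#include <string>

// Both vendor opcodes share the value 0x2d, as described above.
constexpr unsigned DW_CFA_GNU_window_save = 0x2d;
constexpr unsigned DW_CFA_AARCH64_negate_ra_state = 0x2d;

// Hypothetical target tag; GCC dispatches through target hooks instead.
enum class Target { Sparc, AArch64, Other };

// Pick the assembly directive for a vendor opcode.  The opcode alone is
// ambiguous, so the target has to take part in the decision.
inline std::string directive_for(Target target, unsigned opcode)
{
  assert(opcode <= 0x3f);  // extended CFA opcodes occupy the low 6 bits
  if (opcode == 0x2d)
    {
      if (target == Target::Sparc)
        return ".cfi_window_save";
      if (target == Target::AArch64)
        return ".cfi_negate_ra_state";
    }
  return "";  // not a vendor directive known to this sketch
}
```

Because the two constants compare equal, a target-independent switch on the opcode cannot tell the directives apart, which is why the decision is moved to the backend.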
This refactoring does not completely solve the problem, but improves the
situation by moving some of the processing of those directives (more
specifically their output in the assembly) to the backend via 2 target
hooks:
- DW_CFI_OPRND1_DESC: parse the first operand of the directive (if any).
- OUTPUT_CFI_DIRECTIVE: output the CFI directive as a string.
Additionally, this patch also contains a renaming of an enum used for
return address mangling on AArch64.
gcc/ChangeLog:
* config/aarch64/aarch64.cc
(aarch64_output_cfi_directive): New hook for CFI directives.
(aarch64_dw_cfi_oprnd1_desc): Same.
(TARGET_OUTPUT_CFI_DIRECTIVE): Hook for output_cfi_directive.
(TARGET_DW_CFI_OPRND1_DESC): Hook for dw_cfi_oprnd1_desc.
* config/sparc/sparc.cc
(sparc_output_cfi_directive): New hook for CFI directives.
(sparc_dw_cfi_oprnd1_desc): Same.
(TARGET_OUTPUT_CFI_DIRECTIVE): Hook for output_cfi_directive.
(TARGET_DW_CFI_OPRND1_DESC): Hook for dw_cfi_oprnd1_desc.
* coretypes.h
(struct dw_cfi_node): Forward declaration of CFI type from
gcc/dwarf2out.h.
(enum dw_cfi_oprnd_type): Same.
(enum dwarf_call_frame_info): Same.
* doc/tm.texi: Regenerated from doc/tm.texi.in.
* doc/tm.texi.in: Add doc for new target hooks.
* dwarf2cfi.cc
(struct dw_cfi_row): Update the description for window_save
and ra_mangled.
(dwarf2out_frame_debug_cfa_negate_ra_state): Use AArch64 CFI
directive instead of the SPARC one.
(change_cfi_row): Use the right CFI directive's name for RA
mangling.
(output_cfi): Remove explicit architecture-specific CFI
directive DW_CFA_GNU_window_save that falls into default case.
(output_cfi_directive): Use target hook as default.
* dwarf2out.cc (dw_cfi_oprnd1_desc): Use target hook as default.
* dwarf2out.h (enum dw_cfi_oprnd_type): specify underlying type
of enum to allow forward declaration.
(dw_cfi_oprnd1_desc): Call target hook.
(output_cfi_directive): Use dw_cfi_ref instead of struct
dw_cfi_node *.
* hooks.cc
(hook_bool_dwcfi_dwcfioprndtyperef_false): New.
(hook_bool_FILEptr_dwcfiptr_false): New.
* hooks.h
(hook_bool_dwcfi_dwcfioprndtyperef_false): New.
(hook_bool_FILEptr_dwcfiptr_false): New.
* target.def: Documentation for new hooks.
Matthieu Longo [Mon, 23 Sep 2024 14:31:18 +0000 (15:31 +0100)]
Rename REG_CFA_TOGGLE_RA_MANGLE to REG_CFA_NEGATE_RA_STATE
The current name REG_CFA_TOGGLE_RA_MANGLE is not representative of what
it really is, i.e. a register to represent several states, not only a
binary one. Same for dwarf2out_frame_debug_cfa_toggle_ra_mangle.
Matthieu Longo [Mon, 23 Sep 2024 14:03:37 +0000 (15:03 +0100)]
libgcc: hide CIE and FDE data for DWARF architecture extensions behind a handler.
This patch provides a new handler MD_ARCH_FRAME_STATE_T to hide an
architecture-specific structure containing CIE and FDE data related
to DWARF architecture extensions.
Hiding the architecture-specific attributes behind a handler has the
following benefits:
1. isolating those data from the generic ones in _Unwind_FrameState
2. avoiding casts to custom types.
3. preserving typing information when debugging with GDB, and so
facilitating their printing.
This approach required adding a new header md-unwind-def.h, included at
the top of libgcc/unwind-dw2.h, and redirecting to the corresponding
architecture header via a symbolic link.
An obvious drawback is the increase in complexity with macros and
headers. It also caused a split of architecture definitions between
md-unwind-def.h (type definitions used in unwind-dw2.h) and
md-unwind.h (local type definitions and handler implementations).
The naming of md-unwind.h with a .h extension is a bit misleading as
the file is only included in the middle of unwind-dw2.c. Changing
this naming would require modifying other backends, which I
preferred to abstain from. Overall the benefits are worth the added
complexity from my perspective.
libgcc/ChangeLog:
* Makefile.in: New target for symbolic link to md-unwind-def.h
* config.host: New parameter md_unwind_def_header. Set it to
aarch64/aarch64-unwind-def.h for AArch64 targets, or no-unwind.h
by default.
* config/aarch64/aarch64-unwind.h
(aarch64_pointer_auth_key): Move to aarch64-unwind-def.h
(aarch64_cie_aug_handler): Update.
(aarch64_arch_extension_frame_init): Update.
(aarch64_demangle_return_addr): Update.
* configure.ac: New substitute variable md_unwind_def_header.
* unwind-dw2.h (defined): MD_ARCH_FRAME_STATE_T.
* config/aarch64/aarch64-unwind-def.h: New file.
* configure: Regenerate.
* config/no-unwind.h: Updated comment
Matthieu Longo [Mon, 23 Sep 2024 14:03:35 +0000 (15:03 +0100)]
aarch64: skip copy of RA state register into target context
The RA state register is local to a frame, so it should not be copied to
the target frame during the context installation.
This patch adds a new backend handler that checks whether a register
needs to be skipped or not before its installation.
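The idea can be sketched in C++ as follows. This is an illustration under stated assumptions, not the libgcc code: the register index, `frame_local_register_p` and `install_registers` are hypothetical names standing in for the backend handler and the context installation loop.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical register index; the real number is target-defined.
constexpr std::size_t kRaStateRegister = 5;

// Hypothetical stand-in for the backend predicate: true for registers whose
// value only makes sense within the frame that produced it.
inline bool frame_local_register_p(std::size_t reg)
{
  return reg == kRaStateRegister;
}

// Copy `count` registers from the current context into the target context,
// skipping frame-local ones so that the target keeps its own value.
inline void install_registers(const long* current, long* target,
                              std::size_t count)
{
  assert(current != nullptr && target != nullptr);
  for (std::size_t reg = 0; reg < count; ++reg)
    {
      if (frame_local_register_p(reg))
        continue;
      target[reg] = current[reg];
    }
}
```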
libgcc/ChangeLog:
* config/aarch64/aarch64-unwind.h
(MD_FRAME_LOCAL_REGISTER_P): new handler checking whether a register
from the current context needs to be skipped before installation into
the target context.
(aarch64_frame_local_register): Likewise.
* unwind-dw2.c (uw_install_context_1): use MD_FRAME_LOCAL_REGISTER_P.
Matthieu Longo [Mon, 23 Sep 2024 14:03:30 +0000 (15:03 +0100)]
aarch64: store signing key and signing method in DWARF _Unwind_FrameState
This patch is only a refactoring of the existing implementation
of PAuth and return-address signing. The existing behavior is
preserved.
_Unwind_FrameState already contains several pieces of CIE and FDE
information (see the attributes below the comment "The information we care
about from the CIE/FDE" in libgcc/unwind-dw2.h).
The patch aims at moving the information from the DWARF CIE (signing
key stored in the augmentation string) and FDE (the signing
method used) into _Unwind_FrameState alongside the already-stored CIE and
FDE information.
Note: this information has to be saved in frame_state_reg_info
instead of _Unwind_FrameState as it needs to be savable by
DW_CFA_remember_state and restorable by DW_CFA_restore_state, which
both rely on the attribute "prev".
This new information in _Unwind_FrameState simplifies the look-up
of the signing key when the return address is demangled. It also
allows future signing methods to be easily added.
_Unwind_FrameState is not a part of the public API of libunwind,
so the change is backward compatible.
A new architecture-specific handler MD_ARCH_EXTENSION_FRAME_INIT
allows resetting values (if needed) in the frame state and unwind
context before changing the frame state to the caller context.
A new architecture-specific handler MD_ARCH_EXTENSION_CIE_AUG_HANDLER
isolates the architecture-specific augmentation strings in the AArch64
backend, and allows other architectures to reuse augmentation
strings that would have clashed with AArch64 DWARF extensions.
aarch64_demangle_return_addr, DW_CFA_AARCH64_negate_ra_state and
DW_CFA_val_expression cases in libgcc/unwind-dw2-execute_cfa.h
were documented to clarify where the value of the RA state register
is stored (FS and CONTEXT respectively).
libgcc/ChangeLog:
* config/aarch64/aarch64-unwind.h
(AARCH64_DWARF_RA_STATE_MASK): The mask for RA state register.
(aarch64_ra_signing_method_t): The diversifiers used to sign a
function's return address.
(aarch64_pointer_auth_key): The key used to sign a function's
return address.
(aarch64_cie_signed_with_b_key): Deleted as the signing key is
available now in _Unwind_FrameState.
(MD_ARCH_EXTENSION_CIE_AUG_HANDLER): New CIE augmentation string
handler for architecture extensions.
(MD_ARCH_EXTENSION_FRAME_INIT): New architecture-extension
initialization routine for DWARF frame state and context before
execution of DWARF instructions.
(aarch64_context_ra_state_get): Read RA state register from CONTEXT.
(aarch64_ra_state_get): Read RA state register from FS.
(aarch64_ra_state_set): Write RA state register into FS.
(aarch64_ra_state_toggle): Toggle RA state register in FS.
(aarch64_cie_aug_handler): Handle AArch64 augmentation strings.
(aarch64_arch_extension_frame_init): Initialize defaults for the
signing key (PAUTH_KEY_A), and RA state register (RA_no_signing).
(aarch64_demangle_return_addr): Rely on the frame registers and
the signing_key attribute in _Unwind_FrameState.
* unwind-dw2-execute_cfa.h:
Use the right alias DW_CFA_AARCH64_negate_ra_state for __aarch64__
instead of DW_CFA_GNU_window_save.
(DW_CFA_AARCH64_negate_ra_state): Save the signing method in RA
state register. Toggle RA state register without resetting 'how'
to REG_UNSAVED.
* unwind-dw2.c:
(extract_cie_info): Save the signing key in the current
_Unwind_FrameState while parsing the augmentation data.
(uw_frame_state_for): Reset some attributes related to architecture
extensions in _Unwind_FrameState.
(uw_update_context): Move authentication code to AArch64 unwinding.
* unwind-dw2.h (enum register_rule): Give a name to the existing
enum for the register rules, and replace 'unsigned char' by 'enum
register_rule' to facilitate debugging in GDB.
(_Unwind_FrameState): Add a new architecture-extension attribute
to store the signing key.
OpenMP: Fix omp_get_device_from_uid, minor cleanup
In Fortran, omp_get_device_from_uid can also accept substrings, which are
then not NUL terminated. Fixed by introducing a fortran.c wrapper function.
Additionally, in case of a failure the plugin functions now return NULL
instead of failing fatally, so that a fall-back UID is generated.
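The substring problem can be sketched in C++. This is not the actual fortran.c wrapper; `to_c_string` is a hypothetical helper showing why a pointer-plus-length string must be copied before it is handed to code that expects a NUL terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Build a NUL-terminated copy of a Fortran-style string, which arrives as a
// pointer plus an explicit length and may be a slice of a larger buffer.
inline std::string to_c_string(const char* text, std::size_t length)
{
  assert(text != nullptr || length == 0);
  return length != 0 ? std::string(text, length) : std::string();
}
```

Calling `c_str()` on the result yields a buffer that ends exactly after `length` characters, regardless of what follows the slice in the original storage.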
gcc/ChangeLog:
* omp-general.cc (omp_runtime_api_procname): Strip "omp_" from
string; move get_device_from_uid as now a '_' suffix exists.
libgomp/ChangeLog:
* fortran.c (omp_get_device_from_uid_): New function.
* libgomp.map (GOMP_6.0): Add it.
* oacc-host.c (host_dispatch): Init '.uid' and '.get_uid_func'.
* omp_lib.f90.in: Make it used by removing bind(C).
* omp_lib.h.in: Likewise.
* target.c (omp_get_device_from_uid): Ensure the device is initialized.
* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_uid): Add function comment;
return NULL in case of an error.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_uid): Likewise.
* testsuite/libgomp.fortran/device_uid.f90: Update to test substrings.
The target dependent mlra option was designed to be able to quickly
switch between LRA and reload. The reload register allocator step is
scheduled for retirement, thus, remove the functionality of mlra,
keeping it for backward compatibility.
PR target/113954
gcc/ChangeLog:
* config/arc/arc.cc (TARGET_LRA_P): Always return true.
(arc_lra_p): Remove.
* config/arc/arc.h (TARGET_LRA): Remove.
* config/arc/arc.opt (mlra): Change it to do nothing.
* doc/invoke.texi (mlra): Update option description.
Simon Martin [Mon, 16 Sep 2024 11:45:32 +0000 (13:45 +0200)]
c++: Don't crash when mangling member with anonymous union or template type [PR100632, PR109790]
We currently crash upon mangling members that have an anonymous union or
a template operator type.
The problem is that before calling write_unqualified_name,
write_member_name asserts that it has a declaration whose DECL_NAME is
an identifier node that is not that of an operator. This is wrong:
- In PR100632, it's an anonymous union declaration, hence a 0 DECL_NAME
- In PR109790, it's a legitimate template declaration for an operator
(this was accepted up to GCC 10)
This assert was added via r11-6301, to be sure that we do write the "on"
marker for operator members.
This patch removes that assert and instead
- Lets members with an anonymous union type go through
- For operators, adds the missing "on" marker for ABI versions greater
than the highest usable with GCC 10
PR c++/109790
PR c++/100632
gcc/cp/ChangeLog:
* mangle.cc (write_member_name): Handle members whose type is an
anonymous union member. Write missing "on" marker for operators
when ABI version is at least 16.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/decltype83.C: New test.
* g++.dg/cpp0x/decltype83a.C: New test.
* g++.dg/cpp1y/lambda-ice3.C: New test.
* g++.dg/cpp1y/lambda-ice3a.C: New test.
* g++.dg/cpp2a/nontype-class67.C: New test.
Simon Martin [Wed, 18 Sep 2024 10:35:27 +0000 (12:35 +0200)]
c++: Don't ICE due to artificial constructor parameters [PR116722]
The following code triggers an ICE
=== cut here ===
class base {};
class derived : virtual public base {
public:
  template<typename Arg> constexpr derived(Arg) {}
};
int main() {
  derived obj(1.);
}
=== cut here ===
The problem is that cxx_bind_parameters_in_call ends up attempting to
convert a REAL_CST (the first non artificial parameter) to INTEGER_TYPE
(the type of the __in_chrg parameter), which ICEs.
This patch changes cxx_bind_parameters_in_call to return early if it's
called with a *structor that has an __in_chrg or __vtt_parm parameter
since the expression won't be a constant expression.
Note that in the test case, the constructor is not constexpr-suitable,
however it's OK since it's a template according to my read of paragraph
(3) of [dcl.constexpr].
PR c++/116722
gcc/cp/ChangeLog:
* constexpr.cc (cxx_bind_parameters_in_call): Leave early for
{con,de}structors of classes with virtual bases.
Richard Biener [Mon, 23 Sep 2024 08:30:32 +0000 (10:30 +0200)]
tree-optimization/116810 - out-of-bound access to matches[]
The following makes sure to apply forced splitting of groups for
forced single-lane SLP only when the group being analyzed has more
than one lane. This avoids an out-of-bound access to matches[].
PR tree-optimization/116810
* tree-vect-slp.cc (vect_build_slp_instance): Only force
splitting for group_size > 1.
Richard Biener [Mon, 23 Sep 2024 09:05:37 +0000 (11:05 +0200)]
tree-optimization/116796 - virtual LC SSA broken after unrolling
When the unroller unloops loops it tracks whether it changes any
nesting relationship of remaining loops but when scanning a loop's
preheader it fails to pass down the LC-SSA-invalidated bitmap, losing
the fact that an unrolled formerly inner loop can now be placed on
an exit of its outer loop. The following fixes that.
PR tree-optimization/116796
* cfgloopmanip.cc (fix_loop_placements): Get LC-SSA-invalidated
bitmap and pass it on.
(remove_path): Pass LC-SSA-invalidated to fix_loop_placements.
Tamar Christina [Mon, 23 Sep 2024 10:45:43 +0000 (11:45 +0100)]
middle-end: Insert invariant instructions before the gsi [PR116812]
The new invariant statements should be inserted before the current
statement and not after. This goes fine 99% of the time but when the
current statement is a gcond the control flow gets corrupted.
The following restricts the elementwise SLP vectorization to the
single-lane case which is the reason I enabled it to avoid regressions
with non-SLP. The PR shows that multi-lane SLP loads with elementwise
accesses require work; I'll open a new bug to track this for the
future.
PR tree-optimization/116791
* tree-vect-stmts.cc (get_group_load_store_type): Only
fall back to elementwise access for single-lane SLP, restore
hard failure mode for other cases.
gcn/mkoffload.cc: Re-add fprintf for #include of stdlib.h/stdbool.h
In commit r15-3629-g508ef585243d4674d06b0737bfe8769fc18f824f, #embed
was added and the fprintf '#include' lines that were no longer required
were removed, overlooking that with -mstack-size=, the generated
configure_stack_size uses 'setenv' and 'true'.
gcc/ChangeLog:
* config/gcn/mkoffload.cc (process_asm): (Re)add the fprintf
lines for stdlib.h/stdbool.h inclusion if gcn_stack_size is used.
Pan Li [Sat, 21 Sep 2024 14:30:18 +0000 (22:30 +0800)]
Genmatch: Fix ICE for binary phi cfg mismatching [PR116795]
This patch fixes an ICE when matching the binary phi against a
mismatching cfg. We check that the first edge of the phi block comes
from b0, but not that the only edge of b1 comes from b0 too.
As a result, some code is recognized as .SAT_SUB when it is not,
which finally results in a verify_ssa failure.
Andrew Pinski [Sun, 22 Sep 2024 20:18:30 +0000 (13:18 -0700)]
gimple: Simplify gimple_seq_nondebug_singleton_p
The implementation of gimple_seq_nondebug_singleton_p
was convoluted in how it determined whether the sequence
was a singleton (which could contain debug statements).
This simplifies the function into two calls: one to get the start
after all of the debug statements, and one to check whether it
is at the one before the end (or there are only debug statements
afterwards).
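The two-step check can be sketched in C++ over a plain vector. `Stmt` and `nondebug_singleton_p` are hypothetical stand-ins for illustration and do not use the gimple iterator API.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Stmt { bool is_debug; };  // hypothetical stand-in for a statement

// True if the sequence holds exactly one non-debug statement.
inline bool nondebug_singleton_p(const std::vector<Stmt>& seq)
{
  auto is_real = [](const Stmt& s) { return !s.is_debug; };
  // Step 1: skip the leading debug statements.
  auto first = std::find_if(seq.begin(), seq.end(), is_real);
  if (first == seq.end())
    return false;
  // Step 2: only debug statements may follow the one found.
  return std::none_of(first + 1, seq.end(), is_real);
}

inline void usage_example()
{
  const Stmt debug{true}, real{false};
  assert(nondebug_singleton_p({debug, real, debug}));
  assert(!nondebug_singleton_p({real, debug, real}));
}
```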
Bootstrapped and tested on x86_64-linux-gnu (including ada).
gcc/ChangeLog:
* gimple-iterator.h (gimple_seq_nondebug_singleton_p):
Rewrite to simply use gsi_start_nondebug/gsi_one_nondebug_before_end_p.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
The tests below pass with this patch.
* The rv64gcv full regression test.
This is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48 hours.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macro.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-5.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-6.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-7.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-8.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-5.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-6.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-7.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-8.c: New test.
testsuite/gfortran.dg/unsigned_22.f90: Add missing close with delete, PR116701
Without this patch, gfortran.dg/unsigned_22.f90 fails for
non-effective-target fd_truncate targets, i.e. targets that
don't support chsize or ftruncate. See also
libgfortran/io/unix.c:raw_truncate. It passes on the first
run, but leaves behind a file "fort.10" which is then picked
up by subsequent runs, but since that file is to be
rewritten, the libgfortran machinery tries to truncate it,
which fails. The file is always left behind primarily
because the test-case lacks a deleting close-statement,
apparently accidentally.
Incidentally, this "fort.10" artefact is also picked up by
gfortran.dg/write_check3.f90 causing that test to fail too,
observable as a regression for non-fd_truncate targets since
the unsigned_22.f90 introduction. Also, when running
e.g. the whole of gfortran.dg/dg.exp, the "fort.10" is later
deleted by gfortran.dg/write_direct_eor.f90 (which passes
regardless), erasing the clue to the cause of the
write_check3 failure. Also, running just
dg.exp=write_check3.f90 or manually repeating the commands
in gfortran.log showed no error.
N.B.: this close-statement will not help if unsigned_22 for
some reason fails, executing one of the "stop" statements,
but that's also the case for many other tests.
PR testsuite/116701
* gfortran.dg/unsigned_22.f90: Add missing close with delete.
The tests below pass with this patch.
* The rv64gcv full regression test.
This is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48 hours.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_s_add-13.c: New test.
* gcc.target/riscv/sat_s_add-14.c: New test.
* gcc.target/riscv/sat_s_add-15.c: New test.
* gcc.target/riscv/sat_s_add-16.c: New test.
* gcc.target/riscv/sat_s_add-run-13.c: New test.
* gcc.target/riscv/sat_s_add-run-14.c: New test.
* gcc.target/riscv/sat_s_add-run-15.c: New test.
* gcc.target/riscv/sat_s_add-run-16.c: New test.
The tests below pass with this patch.
* The rv64gcv full regression test.
This is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48 hours.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_s_add-10.c: New test.
* gcc.target/riscv/sat_s_add-11.c: New test.
* gcc.target/riscv/sat_s_add-12.c: New test.
* gcc.target/riscv/sat_s_add-9.c: New test.
* gcc.target/riscv/sat_s_add-run-10.c: New test.
* gcc.target/riscv/sat_s_add-run-11.c: New test.
* gcc.target/riscv/sat_s_add-run-12.c: New test.
* gcc.target/riscv/sat_s_add-run-9.c: New test.
testsuite, coroutines: Add tests for non-suspension ramp returns.
Although it is most common for the ramp function to see a return when a coroutine
first suspends, there are other possibilities. For example all the awaits could
be ready - effectively the coroutine will then run to completion and deallocation.
Another case is where the first active suspension point causes the current routine
to be cancelled and thence destroyed.
These cases are tested here.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/torture/special-termination-00-sync-completion.C: New test.
* g++.dg/coroutines/torture/special-termination-01-self-destruct.C: New test.
libgcc, Darwin: From macOS 11, make that the earliest supported.
For libgcc, we have (so far) supported building a DSO that supports
earlier versions of the OS than the target. From macOS 11, there are
APIs that do not exist on earlier OS versions, so limit the libgcc
range to macOS11..current.
libgcc/ChangeLog:
* config.host: From macOS 11, limit earliest macOS support
to macOS 11.
* config/t-darwin-min-11: New file.
I noticed that char8_t was missing from the list of types that were
prevented from using the std::formatter partial specialization for
integer types. That partial specialization was also matching
cv-qualified integer types, because std::integral<const int> is true.
This change simplifies the constraints by introducing a new variable
template which is only true for cv-unqualified integer types, with
explicit specializations to exclude the character types. This should be
slightly more efficient than the previous constraints that checked
std::integral<T> and (!__is_one_of<T, char, wchar_t, ...>). It also
avoids the need for a separate std::formatter specialization for 128-bit
integers, as they can be handled by the new variable template too.
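The shape of such a variable template can be sketched as follows. The name `is_formattable_integer` is hypothetical and this is not the libstdc++ definition; it only shows a primary template that is true for cv-unqualified integer types, with explicit specializations excluding the character types. `bool` is not treated specially in this sketch.

```cpp
#include <type_traits>

// Hypothetical name; true only for cv-unqualified integer types.
template<typename T>
constexpr bool is_formattable_integer
  = std::is_integral<T>::value
    && std::is_same<T, typename std::remove_cv<T>::type>::value;

// Explicit specializations exclude the character types.
template<> constexpr bool is_formattable_integer<char> = false;
template<> constexpr bool is_formattable_integer<wchar_t> = false;
template<> constexpr bool is_formattable_integer<char16_t> = false;
template<> constexpr bool is_formattable_integer<char32_t> = false;
#ifdef __cpp_char8_t
template<> constexpr bool is_formattable_integer<char8_t> = false;
#endif
```

A single boolean constant of this kind can then serve as the constraint, instead of combining an integral check with a list of excluded types at each use.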
libstdc++-v3/ChangeLog:
* include/std/format (__format::__is_formattable_integer): New
variable template and specializations.
(template<integral, __char> struct formatter): Replace
constraints on first arg with __is_formattable_integer.
* testsuite/std/format/formatter/requirements.cc: Check that
std::formatter specializations for char8_t and const int are
disabled.
Jonathan Wakely [Wed, 18 Sep 2024 16:20:29 +0000 (17:20 +0100)]
libstdc++: Fix formatting of most negative chrono::duration [PR116755]
When formatting chrono::duration<signed-integer-type, P>::min() we were
causing undefined behaviour by trying to form the negative of the most
negative value. If we convert negative durations with integer rep to the
corresponding unsigned integer rep then we can safely represent all
values.
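The underlying arithmetic can be sketched in C++. `magnitude` is a hypothetical helper, not the code in chrono_io.h; it shows that converting to the unsigned type before negating avoids the overflow, since unsigned arithmetic wraps modulo 2^N.

```cpp
#include <limits>
#include <type_traits>

// Return the absolute value of a signed integer in the corresponding
// unsigned type, which can represent the magnitude of the most negative value.
template<typename Int>
constexpr typename std::make_unsigned<Int>::type magnitude(Int value)
{
  static_assert(std::is_integral<Int>::value && std::is_signed<Int>::value,
                "magnitude expects a signed integer type");
  using U = typename std::make_unsigned<Int>::type;
  // Convert first, then negate: -value would overflow for the minimum.
  return value < 0 ? static_cast<U>(U(0) - static_cast<U>(value))
                   : static_cast<U>(value);
}
```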
libstdc++-v3/ChangeLog:
PR libstdc++/116755
* include/bits/chrono_io.h (formatter<duration<R,P>>::format):
Cast negative integral durations to unsigned rep.
* testsuite/20_util/duration/io.cc: Test the most negative
integer durations.
Jonathan Wakely [Fri, 13 Sep 2024 09:18:46 +0000 (10:18 +0100)]
libstdc++: add default template parameters to algorithms
This implements P2248R8 + P3217R0, both approved for C++26.
The changes are mostly mechanical; the struggle is to keep readability
with the pre-P2248 signatures.
* For containers, "classic STL" algorithms and their parallel versions,
introduce a macro and amend their declarations/definitions with it.
The macro either expands to the defaulted parameter or to nothing
in pre-C++26 modes.
* For range algorithms, we need to reorder their template parameters.
I've done so unconditionally, because users cannot rely on template
parameters of algorithms (this is explicitly authorized by
[algorithms.requirements]/15). The defaults are then hidden behind
another macro.
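The macro approach for the "classic" algorithms can be sketched like this. `SKETCH_DEFAULT_VALUE_TYPE`, `SKETCH_NO_DEFAULTS` and `count_equal` are hypothetical names for illustration, not the libstdc++ macros or algorithms.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical macro: expands to the defaulted parameter, or to nothing when
// SKETCH_NO_DEFAULTS is defined (standing in for an older language mode).
#ifndef SKETCH_NO_DEFAULTS
# define SKETCH_DEFAULT_VALUE_TYPE(Iter) \
    = typename std::iterator_traits<Iter>::value_type
#else
# define SKETCH_DEFAULT_VALUE_TYPE(Iter)
#endif

// Hypothetical counting algorithm in the shape of a classic STL algorithm.
template<typename Iter, typename T SKETCH_DEFAULT_VALUE_TYPE(Iter)>
std::size_t count_equal(Iter first, Iter last, const T& value)
{
  std::size_t matches = 0;
  for (; first != last; ++first)
    if (*first == value)
      ++matches;
  return matches;
}

// A braced argument gives the compiler nothing to deduce T from, so this call
// only compiles because T defaults to the iterator's value type.
inline void usage_example()
{
  const std::vector<std::pair<int, int>> v{{1, 2}, {3, 4}, {1, 2}};
  const std::size_t matches = count_equal(v.begin(), v.end(), {1, 2});
  assert(matches == 2);
}
```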
libstdc++-v3/ChangeLog:
* include/bits/iterator_concepts.h: Add projected_value_t.
* include/bits/algorithmfwd.h: Add the default template
parameter to the relevant forward declarations.
* include/pstl/glue_algorithm_defs.h: Likewise.
* include/bits/ranges_algo.h: Add the default template
parameter to range-based algorithms.
* include/bits/ranges_algobase.h: Likewise.
* include/bits/ranges_util.h: Likewise.
* include/bits/ranges_base.h: Add helper macros.
* include/bits/stl_iterator_base_types.h: Add helper macro.
* include/bits/version.def: Add the new feature-testing macro.
* include/bits/version.h: Regenerate.
* include/std/algorithm: Pull the feature-testing macro.
* include/std/ranges: Likewise.
* include/std/deque: Pull the feature-testing macro, add
the default for std::erase.
* include/std/forward_list: Likewise.
* include/std/list: Likewise.
* include/std/string: Likewise.
* include/std/vector: Likewise.
* testsuite/23_containers/default_template_value.cc: New test.
* testsuite/25_algorithms/default_template_value.cc: New test.
Signed-off-by: Giuseppe D'Angelo <giuseppe.dangelo@kdab.com>
Co-authored-by: Jonathan Wakely <jwakely@redhat.com>
Tamar Christina [Sun, 22 Sep 2024 12:38:49 +0000 (13:38 +0100)]
middle-end: lower COND_EXPR into gimple form in vect_recog_bool_pattern
Currently the vectorizer cheats when lowering COND_EXPR during bool recog.
In the cases where the conditional is loop invariant or non-boolean it instead
converts the operation back into GENERIC and hides much of the operation from
the analysis part of the vectorizer.
i.e.
a ? b : c
is transformed into:
a != 0 ? b : c
however by doing so we can't perform any optimization on the masks as they aren't
explicit until quite late during codegen.
To fix this, this patch lowers booleans earlier and so ensures that we are always
in GIMPLE.
For when the value is a loop invariant boolean we have to generate an additional
conversion from bool to the integer mask form.
This is done by creating a loop invariant a ? -1 : 0 with the target mask
precision and then doing a normal != 0 comparison on that.
To support this the patch also adds the ability, during pattern matching, to
create a loop invariant pattern that won't be seen by the vectorizer and will
instead be materialized inside the loop preheader in the case of loops, or in
the first BB of the region in the case of BB vectorization.
* gcc.dg/vect/bb-slp-conditional_store_1.c: New test.
* gcc.dg/vect/vect-conditional_store_5.c: New test.
* gcc.dg/vect/vect-conditional_store_6.c: New test.
Tamar Christina [Sun, 22 Sep 2024 12:34:10 +0000 (13:34 +0100)]
aarch64: Take into account when VF is higher than known scalar iters
Consider low overhead loops like:
void
foo (char *restrict a, int *restrict b, int *restrict c, int n)
{
  for (int i = 0; i < 9; i++)
    {
      int res = c[i];
      int t = b[i];
      if (a[i] != 0)
        res = t;
      c[i] = res;
    }
}
For such loops we use latency-only costing since the loop bound is known and
small.
The current costing however does not consider the case where niters < VF.
So when comparing the scalar vs vector costs it doesn't keep in mind that the
scalar code can't perform VF iterations. This makes it overestimate the cost
for the scalar loop and we incorrectly vectorize.
This patch takes the minimum of the VF and niters in such cases.
Before the patch we generate:
note: Original vector body cost = 46
note: Vector loop iterates at most 1 times
note: Scalar issue estimate:
note: load operations = 2
note: store operations = 1
note: general operations = 1
note: reduction latency = 0
note: estimated min cycles per iteration = 1.000000
note: estimated cycles per vector iteration (for VF 32) = 32.000000
note: SVE issue estimate:
note: load operations = 5
note: store operations = 4
note: general operations = 11
note: predicate operations = 12
note: reduction latency = 0
note: estimated min cycles per iteration without predication = 5.500000
note: estimated min cycles per iteration for predication = 12.000000
note: estimated min cycles per iteration = 12.000000
note: Low iteration count, so using pure latency costs
note: Cost model analysis:
vs after:
note: Original vector body cost = 46
note: Known loop bounds, capping VF to 9 for analysis
note: Vector loop iterates at most 1 times
note: Scalar issue estimate:
note: load operations = 2
note: store operations = 1
note: general operations = 1
note: reduction latency = 0
note: estimated min cycles per iteration = 1.000000
note: estimated cycles per vector iteration (for VF 9) = 9.000000
note: SVE issue estimate:
note: load operations = 5
note: store operations = 4
note: general operations = 11
note: predicate operations = 12
note: reduction latency = 0
note: estimated min cycles per iteration without predication = 5.500000
note: estimated min cycles per iteration for predication = 12.000000
note: estimated min cycles per iteration = 12.000000
note: Increasing body cost to 1472 because the scalar code could issue within the limit imposed by predicate operations
note: Low iteration count, so using pure latency costs
note: Cost model analysis:
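The capping can be sketched with a little arithmetic in C++. The function names are hypothetical and this is not the code in adjust_body_cost; it only reproduces the "estimated cycles per vector iteration" figures from the two dumps above (32 before, 9 after).

```cpp
// Hypothetical helper: the VF used for costing is capped by a known, smaller
// iteration count, i.e. the minimum of the two.
constexpr unsigned costing_vf(unsigned vf, unsigned known_niters)
{
  return known_niters < vf ? known_niters : vf;
}

// Scalar cycles needed to do the work of one vector iteration.
constexpr double scalar_cycles_per_vector_iteration(
    double cycles_per_scalar_iteration, unsigned vf, unsigned known_niters)
{
  return cycles_per_scalar_iteration * costing_vf(vf, known_niters);
}
```

With 1.0 cycle per scalar iteration, VF 32 and 9 known iterations, the scalar estimate drops from 32 cycles to 9, so the scalar loop is no longer overestimated in the comparison.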
gcc/ChangeLog:
* config/aarch64/aarch64.cc (adjust_body_cost):
Cap VF for low iteration loops.
Mikael Morin [Sat, 21 Sep 2024 16:33:11 +0000 (18:33 +0200)]
fortran: Add -finline-intrinsics flag for MINLOC/MAXLOC [PR90608]
Introduce the -finline-intrinsics flag to control from the command line
whether to generate either inline code or calls to the functions from the
library, for the MINLOC and MAXLOC intrinsics.
The flag allows specifying inlining either independently for each intrinsic
(either MINLOC or MAXLOC), or all together. For each intrinsic, a default
value is set if none was set. The default value depends on the optimization
setting: inlining is avoided if not optimizing or if optimizing for size;
otherwise inlining is preferred.
There is no direct support for this behaviour provided by the .opt options
framework. It is obtained by defining three different variants of the flag
(finline-intrinsics, fno-inline-intrinsics, finline-intrinsics=) all using
the same underlying option variable. Each enum value (corresponding to an
intrinsic function) uses two identical bits, and the variable is initialized
with alternated bits, so that we can tell whether the value was set or not
by checking whether the two bits have different values.
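The two-bit trick can be sketched in C++ as follows. The constants, the `Setting` enum and the helper names are hypothetical and are not the values in flag-types.h; the sketch only shows how an alternated initial pattern makes "never set" distinguishable from "on" and "off".

```cpp
// Hypothetical encoding: each intrinsic owns a pair of adjacent bits.
constexpr unsigned kMaxlocPair = 0x3u;    // bits 0 and 1
constexpr unsigned kMinlocPair = 0xCu;    // bits 2 and 3
constexpr unsigned kUnsetPattern = 0x5u;  // 0101: each pair holds differing bits

enum class Setting { Unset, Off, On };

// Both bits set means enabled, both clear means disabled, and differing bits
// mean the option was never written.
constexpr Setting query(unsigned flags, unsigned pair)
{
  return (flags & pair) == pair ? Setting::On
         : (flags & pair) == 0u ? Setting::Off
                                : Setting::Unset;
}

constexpr unsigned enable(unsigned flags, unsigned pair) { return flags | pair; }
constexpr unsigned disable(unsigned flags, unsigned pair) { return flags & ~pair; }
```

Enabling or disabling always writes both bits of a pair together, so only the initial pattern can produce a pair with differing bits.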
PR fortran/90608
gcc/ChangeLog:
* flag-types.h (enum gfc_inlineable_intrinsics): New type.
gcc/fortran/ChangeLog:
* invoke.texi (finline-intrinsics): Document new flag.
* lang.opt (finline-intrinsics, finline-intrinsics=,
fno-inline-intrinsics): New flags.
* options.cc (gfc_post_options): If the option variable controlling
the inlining of MAXLOC (respectively MINLOC) has not been set, set
it or clear it depending on the optimization option variables.
* trans-intrinsic.cc (gfc_inline_intrinsic_function_p): Return false
if inlining for the intrinsic is disabled according to the option
variable.
gcc/testsuite/ChangeLog:
* gfortran.dg/minmaxloc_18.f90: New test.
* gfortran.dg/minmaxloc_18a.f90: New test.
* gfortran.dg/minmaxloc_18b.f90: New test.
* gfortran.dg/minmaxloc_18c.f90: New test.
* gfortran.dg/minmaxloc_18d.f90: New test.
Mikael Morin [Sat, 21 Sep 2024 16:33:04 +0000 (18:33 +0200)]
fortran: Continue MINLOC/MAXLOC second loop where the first stopped [PR90608]
Continue the second set of loops where the first one stopped in the
generated inline MINLOC/MAXLOC code in the cases where the generated code
contains two sets of loops. This fixes a regression that was introduced
when enabling the generation of inline MINLOC/MAXLOC code with ARRAY of rank
greater than 1, no DIM argument, and either non-scalar MASK or floating-
point ARRAY.
In the cases where two sets of loops are generated as inline MINLOC/MAXLOC
code, we previously generated code such as (for rank 2 ARRAY, so with two
levels of nesting):
    for (idx11 in lower1..upper1)
      {
        for (idx12 in lower2..upper2)
          {
            ...
            if (...)
              {
                ...
                goto second_loop;
              }
          }
      }
    second_loop:
    for (idx21 in lower1..upper1)
      {
        for (idx22 in lower2..upper2)
          {
            ...
          }
      }
which means we process the first elements twice, once in the first set
of loops and once in the second one. This change avoids this duplicate
processing by using a conditional as lower bound for the second set of
loops, generating code like:
    second_loop_entry = false;
    for (idx11 in lower1..upper1)
      {
        for (idx12 in lower2..upper2)
          {
            ...
            if (...)
              {
                ...
                second_loop_entry = true;
                goto second_loop;
              }
          }
      }
    second_loop:
    for (idx21 in (second_loop_entry ? idx11 : lower1)..upper1)
      {
        for (idx22 in (second_loop_entry ? idx12 : lower2)..upper2)
          {
            ...
            second_loop_entry = false;
          }
      }
It was expected that the compiler optimizations would be able to remove the
state variable second_loop_entry. This is the case if ARRAY has rank 1 (so
without loop nesting): the variable is removed and the loop bounds become
unconditional, which restores the previously generated code, fully fixing the
regression. For larger rank, unfortunately, the state variable and
conditional loop bounds remain, but those cases were previously using
library calls, so it's not a regression.
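As a rough illustration of the conditional lower bounds, here is a hand-written C++ analogue of the rank 2 case for a floating-point ARRAY, where the first set of loops skips leading NaNs and the second set searches for the minimum. The function name minloc_rank2, the fixed extents N1 and N2, the zero-based result and the visit counter are all assumptions of this sketch; the real code is generated as trees by gfc_conv_intrinsic_minmaxloc.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

constexpr std::size_t N1 = 3, N2 = 4;  // illustrative extents
using grid = std::array<std::array<double, N2>, N1>;

// Returns the zero-based {row, column} of the first minimum, ignoring NaNs
// ({0, 0} if every element is NaN).  If SECOND_PASS_VISITS is non-null, it
// receives the number of elements examined by the second set of loops.
std::array<std::size_t, 2>
minloc_rank2 (const grid &a, std::size_t *second_pass_visits = nullptr)
{
  std::size_t pos1 = 0, pos2 = 0;
  std::size_t idx11 = 0, idx12 = 0;
  std::size_t visits = 0;
  double limit = 0.0;
  bool second_loop_entry = false;

  // First set of loops: stop at the first element that is not a NaN.
  for (idx11 = 0; idx11 < N1; ++idx11)
    for (idx12 = 0; idx12 < N2; ++idx12)
      if (!std::isnan (a[idx11][idx12]))
        {
          limit = a[idx11][idx12];
          pos1 = idx11;
          pos2 = idx12;
          second_loop_entry = true;
          goto second_loop;
        }

second_loop:
  // Second set of loops: resume where the first set stopped.  The inner
  // lower bound is conditional only until the first iteration clears the flag.
  for (std::size_t idx21 = second_loop_entry ? idx11 : 0; idx21 < N1; ++idx21)
    for (std::size_t idx22 = second_loop_entry ? idx12 : 0; idx22 < N2; ++idx22)
      {
        second_loop_entry = false;
        ++visits;
        if (a[idx21][idx22] < limit)
          {
            limit = a[idx21][idx22];
            pos1 = idx21;
            pos2 = idx22;
          }
      }

  if (second_pass_visits)
    *second_pass_visits = visits;
  return {pos1, pos2};
}
```

The flag has to be cleared inside the innermost body because the inner loop must restart from its regular lower bound on every later iteration of the outer loop.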
PR fortran/90608
gcc/fortran/ChangeLog:
* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Generate a set
of index variables. Set them using the loop indexes before leaving
the first set of loops. Generate a new loop entry predicate.
Initialize it. Set it before leaving the first set of loops. Clear
it in the body of the second set of loops. For the second set of
loops, update each loop lower bound to use the corresponding index
variable if the predicate variable is set.
Mikael Morin [Sat, 21 Sep 2024 16:32:59 +0000 (18:32 +0200)]
fortran: Inline non-character MINLOC/MAXLOC with no DIM [PR90608]
Enable generation of inline MINLOC/MAXLOC code in the case where DIM
is not present, and either ARRAY is of floating point type or MASK is an
array. Those cases are the remaining bits to fully support inlining of
non-CHARACTER MINLOC/MAXLOC without DIM. They are treated together because
they generate similar code, the NANs for REAL types being handled a bit like
a second level of masking. These are the cases for which we generate two
sets of loops.
This change affects the code generating the second loop, which was previously
accessible only when ARRAY has rank 1. The single variable
initialization and update are changed to apply to multiple variables, one
per dimension.
The code generated is as follows (if ARRAY has rank 2):
    for (idx11 in lower1..upper1)
      {
        for (idx12 in lower2..upper2)
          {
            ...
            if (...)
              {
                ...
                goto second_loop;
              }
          }
      }
    second_loop:
    for (idx21 in lower1..upper1)
      {
        for (idx22 in lower2..upper2)
          {
            ...
          }
      }
This code leads to processing the first elements redundantly, both in the
first set of loops and in the second one. The loop over idx22 could
start from idx12 the first time it is run, but as it has to start from
lower2 for the rest of the runs, this change uses the same bounds for both
sets of loops for simplicity. In the rank 1 case, this makes the generated
code worse compared to the inline code that was generated before. A later
change will introduce conditionals to avoid the duplicate processing and
restore the generated code in that case.
PR fortran/90608
gcc/fortran/ChangeLog:
* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Initialize
and update all the variables. Put the label and goto in the
outermost scalarizer loop. Don't start the second loop where the
first stopped.
(gfc_inline_intrinsic_function_p): Also return TRUE for array MASK
or for any REAL type.
gcc/testsuite/ChangeLog:
* gfortran.dg/maxloc_bounds_5.f90: Additionally accept error
messages reported by the scalarizer.
* gfortran.dg/maxloc_bounds_6.f90: Ditto.
Mikael Morin [Sat, 21 Sep 2024 16:32:51 +0000 (18:32 +0200)]
fortran: Inline integral MINLOC/MAXLOC with no DIM and scalar MASK [PR90608]
Enable the generation of inline code for MINLOC/MAXLOC when argument ARRAY
is of integral type, DIM is not present, and MASK is present and is scalar
(only absent MASK or rank 1 ARRAY were inlined before).
Scalar masks are implemented with a wrapping condition around the code one
would generate if MASK wasn't present, so they are easy to support once
inline code without MASK is working.
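The wrapping condition can be pictured with a small rank 1 analogue. This is a sketch under assumptions: the function names are invented here, positions are 1-based as in Fortran, and a result of 0 stands for the all-zero result returned when the mask is false or the array is empty.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unmasked search: 1-based position of the first maximum, 0 for an empty array.
std::size_t
maxloc_unmasked (const std::vector<int> &array)
{
  std::size_t pos = 0;
  for (std::size_t i = 0; i < array.size (); ++i)
    if (pos == 0 || array[i] > array[pos - 1])
      pos = i + 1;
  return pos;
}

// Scalar MASK: the unmasked code is wrapped in a single condition, and the
// else branch only initializes the result.
std::size_t
maxloc_scalar_mask (const std::vector<int> &array, bool mask)
{
  if (mask)
    return maxloc_unmasked (array);
  else
    return 0;
}
```

Because the mask is tested once outside the loops, the loop bodies are identical to the unmasked case.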
PR fortran/90608
gcc/fortran/ChangeLog:
* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Generate
variable initialization for each dimension in the else branch of
the toplevel condition.
(gfc_inline_intrinsic_function_p): Return TRUE for scalar MASK.
gcc/testsuite/ChangeLog:
* gfortran.dg/maxloc_bounds_7.f90: Additionally accept the error message
reported by the scalarizer.
Mikael Morin [Sat, 21 Sep 2024 16:32:44 +0000 (18:32 +0200)]
fortran: Inline integral MINLOC/MAXLOC with no DIM and no MASK [PR90608]
Enable generation of inline code for the MINLOC and MAXLOC intrinsic,
if the ARRAY argument is of integral type and of any rank (only the rank 1
case was previously inlined), and neither DIM nor MASK arguments are
present.
This needs a few adjustments in gfc_conv_intrinsic_minmaxloc,
mainly to replace the single variables POS and OFFSET, with collections
of variables, one variable per dimension each.
The restriction to integral ARRAY and absent MASK limits the scope of
the change to the cases where we generate single loop inline code. The
code generation for the second loop is only accessible with ARRAY of rank
1, so it can continue using a single variable. A later change will extend
inlining to the double loop cases.
There is some bounds checking code that was previously handled by the
library, and that needed some changes in the scalarizer to avoid regressing.
The bounds check code generation was already supported by the scalarizer,
but it was only applying to array reference sections, checking both
for array bound violation and for shape conformability between all the
involved arrays. With this change, for MINLOC or MAXLOC, enable the
conformability check between all the scalarized arrays, and disable the
array bound violation check.
PR fortran/90608
gcc/fortran/ChangeLog:
* trans-array.cc (gfc_conv_ss_startstride): Set the MINLOC/MAXLOC
result upper bound using the rank of the ARRAY argument. Adjust
the error message for intrinsic result arrays. Only check array
bounds for array references. Move bound check decision code...
(bounds_check_needed): ... here as a new predicate. Allow bound
check for MINLOC/MAXLOC intrinsic results.
* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Change the
result array upper bound to the rank of ARRAY. Update the NONEMPTY
variable to depend on the non-empty extent of every dimension. Use
one variable per dimension instead of a single variable for the
position and the offset. Update their declaration, initialization,
and update to affect the variable of each dimension. Use the first
variable only in areas only accessed with rank 1 ARRAY argument.
Set every element of the result using its corresponding variable.
(gfc_inline_intrinsic_function_p): Return true for integral ARRAY
and absent DIM and MASK.
gcc/testsuite/ChangeLog:
* gfortran.dg/maxloc_bounds_4.f90: Additionally accept the error
message emitted by the scalarizer.
Remove the frontend pass rewriting calls of MINLOC/MAXLOC without DIM to
calls with one-valued DIM enclosed in an array constructor. This
transformation was circumventing the limitation of inline MINLOC/MAXLOC code
generation to scalar cases only, allowing inline code to be generated if
ARRAY had rank 1 and DIM was absent. As MINLOC/MAXLOC has gained support of
inline code generation in that case, the limitation is no longer effective,
and the transformation no longer necessary.
gcc/fortran/ChangeLog:
* frontend-passes.cc (optimize_minmaxloc): Remove.
(optimize_expr): Remove dispatch to optimize_minmaxloc.
Mikael Morin [Sat, 21 Sep 2024 16:32:25 +0000 (18:32 +0200)]
fortran: Inline MINLOC/MAXLOC with no DIM and ARRAY of rank 1 [PR90608]
Enable inline code generation for the MINLOC and MAXLOC intrinsic, if the
DIM argument is not present and ARRAY has rank 1. This case is similar to
the case where the result is scalar (DIM present and rank 1 ARRAY), which
already supports inline expansion of the intrinsic. Both cases return
the same value, with the difference that the result is an array of size 1 if
DIM is absent, whereas it's a scalar if DIM is present. So all there is
to do for the new case to work is hook the inline expansion with the
scalarizer.
PR fortran/90608
gcc/fortran/ChangeLog:
* trans-array.cc (gfc_conv_ss_startstride): Set the scalarization
rank based on the MINLOC/MAXLOC rank if needed. Call the inline
code generation and set up the scalarizer array descriptor info
in the MINLOC and MAXLOC cases.
* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Return the
result array element if the scalarizer is set up and we are inside
the loops. Restrict library function call dispatch to the case
where inline expansion is not supported. Declare an array result
if the expression isn't scalar. Initialize the array result single
element and return the result variable if the expression isn't
scalar.
(walk_inline_intrinsic_minmaxloc): New function.
(walk_inline_intrinsic_function): Add MINLOC and MAXLOC cases,
dispatching to walk_inline_intrinsic_minmaxloc.
(gfc_add_intrinsic_ss_code): Add MINLOC and MAXLOC cases.
(gfc_inline_intrinsic_function_p): Return true if ARRAY has rank 1,
regardless of DIM.
Mikael Morin [Sat, 21 Sep 2024 16:32:19 +0000 (18:32 +0200)]
fortran: Disable frontend passes for inlinable MINLOC/MAXLOC [PR90608]
Disable rewriting of MINLOC/MAXLOC expressions for which inline code
generation is supported. Update the gfc_inline_intrinsic_function_p
predicate (already existing) for that, with the current state of
MINLOC/MAXLOC inlining support, that is only the cases of a scalar
result and non-CHARACTER argument for now.
This change has no effect currently, as the MINLOC/MAXLOC front-end passes
only change expressions of rank 1, but the inlining control predicate
gfc_inline_intrinsic_function_p returns false for those. However, later
changes will extend MINLOC/MAXLOC inline expansion support to array
expressions and update the inlining control predicate, and this will become
effective.
PR fortran/90608
gcc/fortran/ChangeLog:
* frontend-passes.cc (optimize_minmaxloc): Skip if we can generate
inline code for the unmodified expression.
* trans-intrinsic.cc (gfc_inline_intrinsic_function_p): Add
MINLOC and MAXLOC cases.
Mikael Morin [Sat, 21 Sep 2024 16:32:10 +0000 (18:32 +0200)]
fortran: Add tests covering inline MINLOC/MAXLOC without DIM [PR90608]
Add the tests covering the various cases for which we are about to implement
inline expansion of MINLOC and MAXLOC. Those are cases where the DIM
argument is not present.
PR fortran/90608
gcc/testsuite/ChangeLog:
* gfortran.dg/ieee/maxloc_nan_1.f90: New test.
* gfortran.dg/ieee/minloc_nan_1.f90: New test.
* gfortran.dg/maxloc_7.f90: New test.
* gfortran.dg/maxloc_with_mask_1.f90: New test.
* gfortran.dg/minloc_8.f90: New test.
* gfortran.dg/minloc_with_mask_1.f90: New test.
Jason Merrill [Mon, 9 Sep 2024 16:35:37 +0000 (12:35 -0400)]
libstdc++: fix C header include guards
Ever since the c_global and c_compatibility directories were added in
r122533, the include guards have been oddly late in the files, with no
comment about why that might be either in the commit message or the files
themselves. I don't see any justification for this; it seems like a
scripting error in creating these files based on the ones in include/c.
David Malcolm [Fri, 20 Sep 2024 22:51:56 +0000 (18:51 -0400)]
diagnostics: add HTML output format as a plugin [PR116792]
This patch adds an experimental diagnostics output format that
writes HTML. It isn't ready yet for end-users, but seems worth
keeping in the tree as I refactor the diagnostics subsystem, to
ensure that this code still builds, and to verify that it's possible to
implement new diagnostic output formats via GCC plugins. Hence
this patch merely adds it to the testsuite as an example of a GCC
plugin, rather than exposing it as a feature for end-users.
gcc/testsuite/ChangeLog:
PR other/116792
* gcc.dg/plugin/diagnostic-test-xhtml-1.c: New test.
* gcc.dg/plugin/diagnostic_plugin_xhtml_format.c: New test plugin.
* gcc.dg/plugin/plugin.exp (plugin_test_list): Add the above.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Fri, 20 Sep 2024 22:51:55 +0000 (18:51 -0400)]
analyzer: simplify dumps using tree_dump_pretty_printer [PR116613]
There are numerous "dump" member functions in the analyzer with
copied-and-pasted logic. Simplify them by moving the shared code
to a new class tree_dump_pretty_printer.
As well as reducing code duplication, this eliminates numerous
uses of pp_show_color (global_dc->m_printer), which should
ultimately help with supporting multiple diagnostic sinks.
Add an m_printer to sarif_builder and use throughout, rather than
using the context's printer. For now this is the same printer, but
eventually this should help with transitioning to multiple output sinks.
No functional change intended.
gcc/ChangeLog:
PR other/116613
* diagnostic-format-sarif.cc (sarif_builder::m_printer): New
field.
(sarif_invocation::add_notification_for_ice): Drop context param.
(sarif_invocation::prepare_to_flush): Convert param from context
to builder.
(sarif_result::on_nested_diagnostic): Drop context param. Use
builder's printer.
(sarif_result::on_diagram): Drop context param.
(sarif_ice_notification::sarif_ice_notification): Drop context
param. Use builder's printer.
(sarif_builder::sarif_builder): Initialize m_printer.
(sarif_builder::on_report_diagnostic): Drop context param. Use
builder's printer.
(sarif_builder::emit_diagram): Drop context param.
(sarif_builder::flush_to_object): Use this rather than context
for call to prepare_to_flush.
(sarif_builder::make_result_object): Drop context param. Use
builder's printer.
(sarif_builder::make_reporting_descriptor_object_for_warning):
Drop context param.
(sarif_builder::make_message_object_for_diagram): Likewise.
Use builder's printer.
(sarif_output_format::on_report_diagnostic): Drop context param
from call to sarif_builder::on_report_diagnostic.
(sarif_output_format::on_diagram): Drop context param from call to
sarif_builder::emit_diagram.
* diagnostic.h (diagnostic_context::get_client_data_hooks): Make const.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Fri, 20 Sep 2024 22:51:55 +0000 (18:51 -0400)]
diagnostics: convert text hooks to use diagnostic_text_output_format [PR116613]
The diagnostic_starter and diagnostic_finalizer callbacks and most of
their support subroutines are only used by the "text" output format.
Emphasize this and reduce the binding with diagnostic_context
by renaming the callbacks to add "_text" in their names, and converting
the first param from diagnostic_context * to
diagnostic_text_output_format &. Update the various subroutines used
by diagnostic starter/finalizer callbacks to also take a
diagnostic_text_output_format & rather than a diagnostic_context *.
Move m_includes_seen and m_last_module from the context to the text output.
Use the text_output's get_printer () rather than the context's
m_printer, which should ease the transition to multiple output sinks.
No functional change intended.
gcc/c-family/ChangeLog:
PR other/116613
* c-opts.cc: Include "diagnostic-format-text.h".
(c_diagnostic_finalizer): Rename to...
(c_diagnostic_text_finalizer): ...this. Convert first param
from diagnostic_context * to diagnostic_text_output_format & and
update accordingly.
(c_common_diagnostics_set_defaults): Update for renamings.
gcc/ChangeLog:
PR other/116613
* coretypes.h (class diagnostic_text_output_format): Add forward
decl.
* diagnostic-format-json.cc
(json_output_format::after_diagnostic): New.
* diagnostic-format-sarif.cc
(sarif_output_format::after_diagnostic): New.
* diagnostic-format-text.cc: Use pragmas to ignore -Wformat-diag.
(diagnostic_text_output_format::~diagnostic_text_output_format):
Use get_printer. Clean up m_includes_seen here, rather than
in ~diagnostic_context.
(diagnostic_text_output_format::on_report_diagnostic): Use
get_printer. Update for callback renamings and pass *this
to them, rather than &m_context.
(diagnostic_text_output_format::after_diagnostic): New.
(diagnostic_text_output_format::includes_seen_p): Move here
from diagnostic_context/diagnostic.cc.
(diagnostic_text_output_format::get_location_text): New.
(maybe_line_and_column): Move here from diagnostic.cc and make
non-static.
(diagnostic_text_output_format::report_current_module): Move
here from diagnostic_context/diagnostic.cc.
(default_diagnostic_text_starter): Move here from diagnostic.cc,
renaming from default_diagnostic_starter.
(default_diagnostic_text_finalizer): Likewise, renaming from
default_diagnostic_finalizer.
* diagnostic-format-text.h
(diagnostic_text_output_format::diagnostic_text_output_format):
Initialize m_last_module and m_includes_seen.
(diagnostic_text_output_format::after_diagnostic): New decl.
(diagnostic_text_output_format::build_prefix): New decl.
(diagnostic_text_output_format::report_current_module): New decl.
(diagnostic_text_output_format::append_note): New decl.
(diagnostic_text_output_format::file_name_as_prefix): New decl.
(diagnostic_text_output_format::print_path): New decl.
(diagnostic_text_output_format::show_column_p): New decl.
(diagnostic_text_output_format::get_location_text): New decl.
(diagnostic_text_output_format::includes_seen_p): New decl.
(diagnostic_text_output_format::show_any_path): New decl.
(diagnostic_text_output_format::m_last_module): New field.
(diagnostic_text_output_format::m_includes_seen): New field.
* diagnostic-format.h
(diagnostic_output_format::after_diagnostic): New vfunc.
(diagnostic_output_format::get_context): New.
(diagnostic_output_format::get_diagram_theme): New.
* diagnostic-macro-unwinding.cc: Include
"diagnostic-format-text.h".
(maybe_unwind_expanded_macro_loc): Convert first param from
diagnostic_context * to diagnostic_text_output_format & and update
accordingly.
(virt_loc_aware_diagnostic_finalizer): Likewise.
* diagnostic-macro-unwinding.h
(virt_loc_aware_diagnostic_finalizer): Likewise.
(maybe_unwind_expanded_macro_loc): Likewise.
* diagnostic-path.cc: Include "diagnostic-format-text.h".
(path_label::path_label): Drop "ctxt" param and add "colorize"
and "allow_emojis" params. Update initializations.
(path_label::get_text): Use m_colorize rather than querying
m_ctxt.m_printer. Use m_allow_emojis rather than querying
m_ctxt's diagram theme.
(path_label::m_ctxt): Drop field.
(path_label::m_colorize): New field.
(path_label::m_allow_emojis): New field.
(event_range::event_range): Drop param "ctxt". Add params
"colorize_labels" and "allow_emojis".
(event_range::print): Convert first param from
diagnostic_context & to diagnostic_text_output_format & and update
accordingly.
(path_summary::path_summary): Likewise.
(path_summary::print_swimlane_for_event_range): Likewise.
(print_path_summary_as_text): Likewise for 3rd param.
(diagnostic_context::print_path): Convert to...
(diagnostic_text_output_format::print_path): ...this.
(selftest::test_empty_path): Update to use a
diagnostic_text_output_format.
(selftest::test_intraprocedural_path): Likewise.
(selftest::test_interprocedural_path_1): Likewise.
(selftest::test_interprocedural_path_2): Likewise.
(selftest::test_recursion): Likewise.
(selftest::test_control_flow_1): Likewise.
(selftest::test_control_flow_2): Likewise.
(selftest::test_control_flow_3): Likewise.
(selftest::assert_cfg_edge_path_streq): Likewise.
(selftest::test_control_flow_5): Likewise.
(selftest::test_control_flow_6): Likewise.
* diagnostic.cc (file_name_as_prefix): Convert to...
(diagnostic_text_output_format::file_name_as_prefix): ...this.
(diagnostic_context::initialize): Update for renamings.
Move m_last_module and m_includes_seen into text output.
(diagnostic_context::finish): Likewise.
(diagnostic_context::get_location_text): Add "colorize" param.
(diagnostic_build_prefix): Convert to...
(diagnostic_text_output_format::build_prefix): ...this.
(diagnostic_context::includes_seen_p): Move from here to
diagnostic_text_output_format/diagnostic-format-text.cc.
(diagnostic_context::report_current_module): Likewise.
(diagnostic_context::show_any_path): Convert to...
(diagnostic_text_output_format::show_any_path): ...this.
(default_diagnostic_starter): Rename and move to
diagnostic-format-text.cc.
(default_diagnostic_start_span_fn): Pass colorize bool
to get_location_text.
(default_diagnostic_finalizer): Rename and move to
diagnostic-format-text.cc.
(diagnostic_context::report_diagnostic): Replace call to
show_any_path with call to new output format "after_diagnostic"
vfunc, moving show_any_path call to the text output format.
(diagnostic_append_note): Convert to...
(diagnostic_text_output_format::append_note): ...this.
(selftest::assert_location_text): Pass in false for colorization.
* diagnostic.h (diagnostic_starter_fn): Rename to...
(diagnostic_text_starter_fn): ...this. Convert first param from
diagnostic_context * to diagnostic_text_output_format &.
(diagnostic_finalizer_fn, diagnostic_text_finalizer_fn): Likewise.
(diagnostic_context): Update friends for renamings.
(diagnostic_context::report_current_module): Move to text output
format.
(diagnostic_context::get_location_text): Add "colorize" bool.
(diagnostic_context::includes_seen_p): Move to text output format.
(diagnostic_context::show_any_path): Likewise.
(diagnostic_context::print_path): Likewise.
(diagnostic_context::m_text_callbacks): Update for renamings.
(diagnostic_context::m_last_module): Move to text output format.
(diagnostic_context::m_includes_seen): Likewise.
(diagnostic_starter): Rename to...
(diagnostic_text_starter): ...this and update return type.
(diagnostic_finalizer): Rename to...
(diagnostic_text_finalizer): ...this and update return type.
(diagnostic_report_current_module): Drop decl in favor of a member
function of diagnostic_text_output_format.
(diagnostic_append_note): Likewise.
(default_diagnostic_starter): Rename to...
(default_diagnostic_text_starter): ...this, updating type.
(default_diagnostic_finalizer): Rename to...
(default_diagnostic_text_finalizer): ...this, updating type.
(file_name_as_prefix): Drop decl.
* langhooks-def.h (lhd_print_error_function): Convert first param
from diagnostic_context * to diagnostic_text_output_format &.
* langhooks.cc: Include "diagnostic-format-text.h".
(lhd_print_error_function): Likewise. Update accordingly.
* langhooks.h (lang_hooks::print_error_function): Convert first
param from diagnostic_context * to
diagnostic_text_output_format &.
* tree-diagnostic.cc: Include "diagnostic-format-text.h".
(diagnostic_report_current_function): Convert first param from
diagnostic_context * to diagnostic_text_output_format & and update
accordingly.
(default_tree_diagnostic_starter): Rename to...
(default_tree_diagnostic_text_starter): ...this. Convert first
param from diagnostic_context * to diagnostic_text_output_format &
and update accordingly.
(tree_diagnostics_defaults): Update for renamings.
gcc/cp/ChangeLog:
PR other/116613
* cp-tree.h (cxx_print_error_function): Convert first param
from diagnostic_context * to diagnostic_text_output_format &.
* error.cc: Include "diagnostic-format-text.h".
(cxx_initialize_diagnostics): Update for renamings.
(cxx_print_error_function): Convert first param from
diagnostic_context * to diagnostic_text_output_format & and update
accordingly.
(cp_diagnostic_starter): Rename to...
(cp_diagnostic_text_starter): ...this. Convert first
param from diagnostic_context * to diagnostic_text_output_format &
and update accordingly.
(cp_print_error_function): Likewise.
(print_instantiation_full_context): Likewise.
(print_instantiation_partial_context_line): Likewise.
(print_instantiation_partial_context): Likewise.
(maybe_print_instantiation_context): Likewise.
(maybe_print_constexpr_context): Likewise.
(print_location): Likewise.
(print_constrained_decl_info): Likewise.
(print_concept_check_info): Likewise.
(print_constraint_context_head): Likewise.
(print_requires_expression_info): Likewise.
(maybe_print_single_constraint_context): Likewise.
gcc/fortran/ChangeLog:
PR other/116613
* error.cc: Include "diagnostic-format-text.h".
(gfc_diagnostic_starter): Rename to...
(gfc_diagnostic_text_starter): ...this. Convert first
param from diagnostic_context * to diagnostic_text_output_format &
and update accordingly.
(gfc_diagnostic_finalizer, gfc_diagnostic_text_finalizer):
Likewise.
(gfc_diagnostics_init): Update for renamings.
(gfc_diagnostics_finish): Likewise.
gcc/jit/ChangeLog:
PR other/116613
* dummy-frontend.cc: Include "diagnostic-format-text.h".
(jit_begin_diagnostic): Convert first param from
diagnostic_context * to diagnostic_text_output_format &.
(jit_end_diagnostic): Likewise. Update accordingly.
(jit_langhook_init): Update for renamings.
gcc/rust/ChangeLog:
PR other/116613
* resolve/rust-ast-resolve-expr.cc
(funny_ice_finalizer): Convert first param from
diagnostic_context * to diagnostic_text_output_format &.
(ResolveExpr::visit): Update for renaming.
gcc/testsuite/ChangeLog:
PR other/116613
* g++.dg/plugin/show_template_tree_color_plugin.c
(noop_starter_fn): Rename to...
(noop_text_starter_fn): ...this. Update first param from dc to
text_output.
(plugin_init): Update for renamings.
* gcc.dg/plugin/diagnostic_group_plugin.c
(test_diagnostic_starter): Rename to...
(test_diagnostic_text_starter): ...this. Update first param from
dc to text_output.
(plugin_init): Update for renaming.
* gcc.dg/plugin/diagnostic_plugin_test_show_locus.c: Include
"diagnostic-format-text.h".
(custom_diagnostic_finalizer): Rename to...
(custom_diagnostic_text_finalizer): ...this. Update first param
from dc to text_output.
(test_show_locus): Update for renamings.
* gcc.dg/plugin/location_overflow_plugin.c: Include
"diagnostic-format-text.h".
(original_finalizer): Rename to...
(original_text_finalizer): ...this and update type.
(verify_unpacked_ranges): Update first param from dc to
text_output. Update for this and for renamings.
(verify_no_columns): Likewise.
(plugin_init): Update for renamings.
libcc1/ChangeLog:
PR other/116613
* context.cc: Include "diagnostic-format-text.h".
(plugin_print_error_function): Update first param from
diagnostic_context * to diagnostic_text_output_format &.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Jonathan Wakely [Thu, 29 Aug 2024 12:47:15 +0000 (13:47 +0100)]
libstdc++: Avoid forming T* in unique_ptr(auto_ptr<U>&&) constraints [PR116529]
PR 116529 shows that std::unique_ptr<X&, D> is currently unusable
because the constructor taking std::auto_ptr (which is a non-standard
extension since C++17) tries to form the invalid type X&* during
overload resolution. We can use the `pointer` type in the constructor
constraints, instead of trying to form an invalid type. The
std::auto_ptr constructor can never actually match for the case where
element_type is a reference, so we just need it to produce a
substitution failure instead of being ill-formed.
LWG 4144 might make std::unique_ptr<X&, D> ill-formed, which would
invalidate this new test. We would have to remove this test in that
case. Using `pointer` in the constructor from std::auto_ptr would not be
needed to support the std::unique_ptr<X&, D> case, but would not cause
any harm either.
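The difference between constraining on T* and constraining on the `pointer` member type can be sketched with a reduced model. Everything here is hypothetical: legacy_ptr stands in for std::auto_ptr, holder for std::unique_ptr, and plain_deleter and int_ref_deleter for std::default_delete and a user deleter. The sketch only shows that a constraint written in terms of `pointer` yields a substitution failure, never an invalid type, when T is a reference.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for std::auto_ptr<U>.
template<typename U>
struct legacy_ptr { U *raw = nullptr; };

// Hypothetical default deleter.  Its `pointer` member forms T*, so it is
// only usable (and only instantiated) when T is not a reference.
template<typename T>
struct plain_deleter { using pointer = T *; };

// Hypothetical custom deleter that supplies its own pointer type.
struct int_ref_deleter { using pointer = int *; };

template<typename T, typename D = plain_deleter<T>>
struct holder
{
  using pointer = typename D::pointer;

  holder () = default;

  // The constraint names `pointer`, never T*, so instantiating
  // holder<int &, int_ref_deleter> does not try to form int &*.
  template<typename U,
           typename = std::enable_if_t<
             std::is_convertible_v<U *, pointer>
             && std::is_same_v<D, plain_deleter<T>>>>
  holder (legacy_ptr<U> &&) {}
};
```

In this model the converting constructor simply drops out of overload resolution for the reference case, because the deleter is not the default one.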
libstdc++-v3/ChangeLog:
PR libstdc++/116529
* include/bits/unique_ptr.h (unique_ptr(auto_ptr<U>&&)):
Use pointer instead of T*.
* testsuite/20_util/unique_ptr/creation/116529.cc: New test.
Jonathan Wakely [Fri, 20 Sep 2024 16:35:48 +0000 (17:35 +0100)]
libstdc++: Document missing features for old std::string ABI [PR116777]
There are several features that are not supported when using the old
std::string ABI. It's possible that PR 81967 will get fixed, but the
missing C++20 features almost certainly won't be. Document this in the
manual.
libstdc++-v3/ChangeLog:
PR libstdc++/116777
* doc/xml/manual/using.xml: Document features that are not
supported for the gcc4-compatible ABI.
* doc/html/manual/using_dual_abi.html: Regenerate.
Martin Uecker [Tue, 17 Sep 2024 09:37:29 +0000 (11:37 +0200)]
c: fix crash when checking for compatibility of structures [PR116726]
When checking for compatibility of structure or union types in
tagged_types_tu_compatible_p, restore the old value of the pointer to
the top of the temporary cache after recursively calling comptypes_internal
when looping over the members of a structure or union. While the next
iteration of the loop overwrites the pointer, I missed the fact that it can
be accessed again when types of function arguments are compared as part
of recursive type checking and the function is entered again.
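The general pattern, keeping a cache-top pointer valid across recursion when the cache entries live on the stack, can be sketched as below. This is a generic illustration and not the code in c-typeck.cc: type_node, seen_entry, compare_state and compatible are names invented for this sketch, and the comparison rules are deliberately simplistic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type representation: a kind plus member types.
struct type_node
{
  int kind;
  std::vector<const type_node *> members;
};

// Hypothetical cache entry, allocated on the stack of each comparison.
struct seen_entry
{
  const seen_entry *next;
  const type_node *t1, *t2;
};

// Shared state: pointer to the top of the temporary cache.
struct compare_state
{
  const seen_entry *cache = nullptr;
};

bool
compatible (const type_node *t1, const type_node *t2, compare_state &st)
{
  if (t1 == t2)
    return true;
  if (t1->kind != t2->kind || t1->members.size () != t2->members.size ())
    return false;

  // A pair already being compared is assumed compatible (recursive types).
  for (const seen_entry *e = st.cache; e; e = e->next)
    if (e->t1 == t1 && e->t2 == t2)
      return true;

  const seen_entry *saved = st.cache;
  const seen_entry entry = { saved, t1, t2 };
  bool ok = true;
  for (std::size_t i = 0; ok && i < t1->members.size (); ++i)
    {
      st.cache = &entry;  // our entry is the top while we recurse
      ok = compatible (t1->members[i], t2->members[i], st);
    }
  // Restore the previous top so that nothing refers to ENTRY once this
  // frame is gone, even if the state is consulted again later.
  st.cache = saved;
  return ok;
}
```

The point of the restore is that the shared pointer must never outlive the stack frame that owns the entry it designates.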
PR c/116726
gcc/c/ChangeLog:
* c-typeck.cc (tagged_types_tu_compatible_p): Restore value
of the cache after recursing into comptypes_internal.
Patrick Palka [Fri, 20 Sep 2024 19:41:42 +0000 (15:41 -0400)]
c++: CWG 2789 and reversed operator candidates
As a follow-up to r15-3741-gee3efe06c9c49c, which was specifically
concerned with usings, it seems the CWG 2789 refinement should also
compare contexts of a reversed vs non-reversed (member) candidate
during operator overload resolution.
DR 2789
gcc/cp/ChangeLog:
* call.cc (cand_parms_match): Check for matching class contexts
even in the reversed case.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-memfun4.C: Adjust expected result
involving reversed candidate.
Gaius Mulley [Fri, 20 Sep 2024 18:05:16 +0000 (19:05 +0100)]
modula2: Remove unused parameter warnings seen in build
This patch removes unused parameters in gm2-compiler/M2Check.mod.
It also removes a --fixme-- and completes the missing code
which type checks unbounded arrays. The patch also fixes a
build error seen when building m2/stage2/cc1gm2.
gcc/m2/ChangeLog:
* gm2-compiler/M2Check.mod (checkUnboundedArray): New
procedure function.
(checkUnboundedUnbounded): Ditto.
(checkUnbounded): Rewrite to check the unbounded data
type.
(checkPair): Add comment.
(doCheckPair): Add comment.
Remove tinfo parameter from the call to checkTypeKindViolation.
(checkTypeKindViolation): Remove unused parameter tinfo.
* gm2-libs-ch/UnixArgs.cc (GM2RTS.h): Remove include.
* gm2-libs-ch/m2rts.h (M2RTS_INIT): New define.
(M2RTS_DEP): Ditto.
(M2RTS_RegisterModule): New prototype.
(GM2RTS.h): Add include to the MC_M2 block.
gcc/testsuite/ChangeLog:
* gm2/iso/fail/testarrayunbounded2.mod: New test.
* gm2/iso/fail/testarrayunbounded3.mod: New test.
* gm2/iso/fail/testarrayunbounded4.mod: New test.
* gm2/iso/fail/testarrayunbounded5.mod: New test.
* gm2/iso/fail/testarrayunbounded6.mod: New test.
* gm2/iso/pass/testarrayunbounded.mod: New test.
Patrick Palka [Fri, 20 Sep 2024 16:33:13 +0000 (12:33 -0400)]
c++: CWG 2789 and usings [PR116492]
After CWG 2789, the "more constrained" tiebreaker for non-template
functions should exclude member functions that are defined in
different classes. This patch implements this missing refinement.
In turn we can get rid of the four-parameter version of object_parms_correspond
and call the main overload directly since now correspondence is only
checked for members from the same class.
PR c++/116492
DR 2789
gcc/cp/ChangeLog:
* call.cc (object_parms_correspond): Remove.
(cand_parms_match): Return false for member functions that come
from different classes. Adjust call to object_parms_correspond.
(joust): Update comment for the non-template "more constrained"
case.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-memfun4.C: Also compile in C++20 mode.
Expect ambiguity when candidates come from different classes.
* g++.dg/cpp2a/concepts-inherit-ctor12.C: New test.
Patrick Palka [Fri, 20 Sep 2024 16:31:40 +0000 (12:31 -0400)]
c++: CWG 2273 and non-constructors
Our implementation of the CWG 2273 inheritedness tiebreaker seems to be
incorrectly considering all member functions introduced via using, not
just constructors. This patch restricts the tiebreaker accordingly.
DR 2273
gcc/cp/ChangeLog:
* call.cc (joust): Restrict inheritedness tiebreaker to
constructors.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1z/using1.C: Expect ambiguity for non-constructor call.
* g++.dg/overload/using5.C: Likewise.
Darwin: Allow for as versions that need '-' for std in.
Recent versions of Xcode as require a dash to read from standard
input. We can use this on all supported OS versions so make it
unconditional. Patch from Mark Mentovai.
Richard Biener [Fri, 20 Sep 2024 10:17:22 +0000 (12:17 +0200)]
Fall back to elementwise access for too spaced SLP single element interleaving
gcc.dg/vect/vect-pr111779.c is a case where non-SLP manages to vectorize
using VMAT_ELEMENTWISE but SLP currently refuses because doing a regular
access with permutes would cause excess vector loads with at most one
element used. The following makes us fall back to elementwise accesses
for that, too.
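A minimal sketch of the access shape described above, assuming a
hypothetical record type. It is not the gcc.dg/vect/vect-pr111779.c
testcase. Only one int out of every 16 is read, so with a vector of four
ints each contiguous vector load would contribute at most one useful
element.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical record: only `key` is read by the loop; the other 15 ints
// form the gap of the single-element interleaving group.
struct Record {
  int key;
  int unused[15];
};

// Copies one element per record.  Loading each key individually
// (elementwise access) avoids fetching whole vectors of unused data.
inline void copy_keys(const Record *records, int *out, std::size_t n) {
  assert(n == 0 || (records != nullptr && out != nullptr));
  for (std::size_t i = 0; i < n; ++i)
    out[i] = records[i].key;
}
```

Whether a given compiler vectorizes this loop, and with which access
kind, depends on the target and options.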
* tree-vect-stmts.cc (get_group_load_store_type): Fall back
to VMAT_ELEMENTWISE when single element interleaving of
a too large group.
(vectorizable_load): Do not try to verify load permutations
when using VMAT_ELEMENTWISE for single-lane SLP and fix code
generation for this case.
* gfortran.dg/vect/vect-8.f90: Allow one more vectorized loop.
Those TR13/OpenMP 6.0 routines permit a reproducible offloading to
a specific device by mapping an OpenMP device number to a
unique ID (UID). The GPU device UIDs should be universally unique,
the one for the host is not.
gcc/ChangeLog:
* omp-general.cc (omp_runtime_api_procname): Add
get_device_from_uid and omp_get_uid_from_device routines.
Jakub Jelinek [Fri, 20 Sep 2024 07:14:29 +0000 (09:14 +0200)]
i386: Fix up _mm_min_ss etc. handling of zeros and NaNs [PR116738]
The min/max patterns for intrinsics which on x86 return the second
input operand when both operands are zeros or when one or both of them
are a NaN shouldn't use SMIN/SMAX RTL because, as with
MIN_EXPR/MAX_EXPR, the result of SMIN/SMAX is undefined in those cases.
The following patch adds an expander which uses either a new pattern with
UNSPEC_IEEE_M{AX,IN} or the S{MIN,MAX} representation of the same.
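The operand-order rule above can be modelled in portable scalar code. This
is only a sketch of the semantics, not the intrinsic implementation; the
function names are hypothetical. The comparison is false for NaNs and for
+0.0 versus -0.0, so the second operand is returned in exactly those cases.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Scalar model of the x86 rule: return the second operand unless the
// first one compares strictly smaller (min) or strictly greater (max).
inline float x86_style_min(float a, float b) { return a < b ? a : b; }
inline float x86_style_max(float a, float b) { return a > b ? a : b; }

// Small self-check showing the cases a generic min/max leaves undefined.
inline void check_x86_style_min_max() {
  const float nan = std::numeric_limits<float>::quiet_NaN();
  assert(std::signbit(x86_style_min(0.0f, -0.0f)));   // second operand, -0.0
  assert(!std::signbit(x86_style_min(-0.0f, 0.0f)));  // second operand, +0.0
  assert(x86_style_min(nan, 1.0f) == 1.0f);           // second operand
  assert(std::isnan(x86_style_max(1.0f, nan)));       // second operand, NaN
}
```

A representation that is free to pick either operand in these cases, such
as SMIN/SMAX, cannot guarantee this ordering. The sketch assumes IEEE
comparisons and no -ffast-math.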
2024-09-20 Uros Bizjak <ubizjak@gmail.com>
Jakub Jelinek <jakub@redhat.com>
PR target/116738
* config/i386/subst.md (mask_scalar_operand_arg34,
mask_scalar_expand_op3, round_saeonly_scalar_mask_arg3): New
subst attributes.
* config/i386/sse.md
(<sse>_vm<code><mode>3<mask_scalar_name><round_saeonly_scalar_name>):
Change from define_insn to define_expand, rename the old define_insn
to ...
(*<sse>_vm<code><mode>3<mask_scalar_name><round_saeonly_scalar_name>):
... this.
(<sse>_ieee_vm<ieee_maxmin><mode>3<mask_scalar_name><round_saeonly_scalar_name>):
New define_insn.
Richard Biener [Fri, 20 Sep 2024 06:53:53 +0000 (08:53 +0200)]
testsuite/116784 - match up SLP scan and vectorized scan
The test used vect_perm_short for the vectorized scanning but
vect_perm3_short for whether that's done with SLP. We're now
generally expecting SLP to be used - even as fallback, so the
following adjusts both to match up, fixing the powerpc64 reported
testsuite issue.
PR testsuite/116784
* gcc.dg/vect/slp-perm-9.c: Use vect_perm_short also for
the SLP check.
Andrew Pinski [Thu, 19 Sep 2024 23:32:50 +0000 (16:32 -0700)]
Remove PHI_RESULT_PTR and change some PHI_RESULT to be gimple_phi_result [PR116643]
There were only a few uses of PHI_RESULT_PTR, so let's remove it and use gimple_phi_result_ptr
or gimple_phi_result directly instead.
Since I was modifying ssa-iterators.h for the use of PHI_RESULT_PTR, change the use
of PHI_RESULT there to be gimple_phi_result instead.
This also removes one extra indirection that was done for PHI_RESULT so stage2 building
should be slightly faster.
Bootstrapped and tested on x86_64-linux-gnu.
PR middle-end/116643
gcc/ChangeLog:
* ssa-iterators.h (single_phi_def): Use gimple_phi_result
instead of PHI_RESULT.
(op_iter_init_phidef): Use gimple_phi_result/gimple_phi_result_ptr
instead of PHI_RESULT/PHI_RESULT_PTR.
* tree-ssa-operands.h (PHI_RESULT_PTR): Remove.
(PHI_RESULT): Use gimple_phi_result directly.
(SET_PHI_RESULT): Use gimple_phi_result_ptr directly.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
This PR points out that we're not implementing [dcl.fct.def.default]
properly. Consider e.g.
struct C {
  C(const C&&) = default;
};
where we wrongly emit an error, but the move ctor should be just =deleted.
According to [dcl.fct.def.default], if the type of the special member
function differs from the type of the corresponding special member function
that would have been implicitly declared in a way other than as allowed
by 2.1-4, the function is defined as deleted. There's an exception for
assignment operators in which case the program is ill-formed.
clang++ has a warning for when we delete an explicitly-defaulted function
so this patch adds it too.
When the code is ill-formed, we emit an error in all modes. Otherwise,
we emit a pedwarn in C++17 and a warning in C++20.
PR c++/116162
gcc/c-family/ChangeLog:
* c.opt (Wdefaulted-function-deleted): New.
gcc/cp/ChangeLog:
* class.cc (check_bases_and_members): Don't set DECL_DELETED_FN here,
leave it to defaulted_late_check.
* cp-tree.h (maybe_delete_defaulted_fn): Declare.
(defaulted_late_check): Add a tristate parameter.
* method.cc (maybe_delete_defaulted_fn): New.
(defaulted_late_check): Add a tristate parameter. Call
maybe_delete_defaulted_fn instead of giving an error.
Jakub Jelinek [Thu, 19 Sep 2024 15:53:27 +0000 (17:53 +0200)]
dwarf2asm: Use constexpr for eh_data_format_name initialization for C++14
Similarly to the previous patch, dwarf2asm.cc had
HAVE_DESIGNATED_INITIALIZERS support, and as fallback a huge switch.
The switch from what I can see is expanded as a jump table with 256
label pointers and code at those labels then loads addresses of
string literals.
The following patch instead uses a table with 256 const char * pointers,
NULL for ICE, non-NULL for returning something, similarly to the
HAVE_DESIGNATED_INITIALIZERS case.
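A minimal sketch of the technique, assuming C++14 relaxed constexpr. The
type, variable and function names are hypothetical, only a handful of
DW_EH_PE_* encodings are filled in, and the strings are illustrative; the
real table in dwarf2asm.cc is more complete.

```cpp
#include <cassert>

// 256 const char * slots built at compile time: nullptr means "unknown
// format", non-null is the name to return.
struct FormatNameTable {
  const char *names[256];
  constexpr FormatNameTable() : names{} {  // every slot starts as nullptr
    names[0x00] = "absolute";
    names[0x01] = "uleb128";
    names[0x03] = "udata4";
    names[0x0b] = "sdata4";
    names[0x1b] = "pcrel sdata4";
  }
};

constexpr FormatNameTable format_name_table{};

// Compile-time string comparison, used only to check the table.
constexpr bool same_text(const char *a, const char *b) {
  while (*a != '\0' && *a == *b) {
    ++a;
    ++b;
  }
  return *a == *b;
}

static_assert(same_text(format_name_table.names[0x1b], "pcrel sdata4"),
              "table is initialized at compile time");
static_assert(format_name_table.names[0x05] == nullptr,
              "unlisted formats stay null");

// Lookup is a single indexed load; the assert stands in for the ICE.
inline const char *format_name(unsigned char format) {
  const char *name = format_name_table.names[format];
  assert(name != nullptr);
  return name;
}
```

Compared with a switch, the lookup needs no jump table and no per-label
code, only the read-only array.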
2024-09-19 Jakub Jelinek <jakub@redhat.com>
* dwarf2asm.cc (eh_data_format_name): Use constexpr initialization
of format_names table for C++14 instead of a large switch.
The tests below passed for this patch.
* The rv64gcv full regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_s_add-5.c: New test.
* gcc.target/riscv/sat_s_add-6.c: New test.
* gcc.target/riscv/sat_s_add-7.c: New test.
* gcc.target/riscv/sat_s_add-8.c: New test.
* gcc.target/riscv/sat_s_add-run-5.c: New test.
* gcc.target/riscv/sat_s_add-run-6.c: New test.
* gcc.target/riscv/sat_s_add-run-7.c: New test.
* gcc.target/riscv/sat_s_add-run-8.c: New test.
The following reverts a bogus fix done for PR101009 and instead makes
sure we get into the same_access_functions () case when computing
the distance vector for g[1] and g[1] where the constants ended up
having different types. The generic code doesn't seem to handle
loop invariant dependences. The special case gets us both
( 0 ) and ( 1 ) as distance vectors while formerly we got ( 1 ),
which the PR101009 fix changed to ( 0 ) with bad effects on other
cases as shown in this PR.
PR tree-optimization/116768
* tree-data-ref.cc (build_classic_dist_vector_1): Revert
PR101009 change.
* tree-chrec.cc (eq_evolutions_p): Make sure (sizetype)1
and (int)1 compare equal.
Richard Biener [Wed, 18 Sep 2024 10:41:25 +0000 (12:41 +0200)]
Fall back to single-lane SLP before falling back to no SLP
The following changes the fallback to disable SLP when any of the
discovered SLP instances failed to pass vectorization checking into
a fallback that emulates what no SLP would do with SLP - force
single-lane discovery for all instances.
The patch does not remove the final fallback to disable SLP but it
reduces the fallout from failing vectorization when any non-SLP
stmt survives analysis.
* tree-vectorizer.h (vect_analyze_slp): Add force_single_lane
parameter.
* tree-vect-slp.cc (vect_analyze_slp_instance): Remove
defaulting of force_single_lane.
(vect_build_slp_instance): Likewise. Pass down appropriate
force_single_lane.
(vect_analyze_slp): Add force_single_lane parameter and pass
it down appropriately.
(vect_slp_analyze_bb_1): Always do multi-lane SLP.
* tree-vect-loop.cc (vect_analyze_loop_2): Track two SLP
modes and adjust accordingly.
(vect_analyze_loop_1): Save the SLP mode when unrolling.
Jason Merrill [Fri, 22 Dec 2023 18:20:35 +0000 (13:20 -0500)]
libstdc++: add #pragma diagnostic
The use of #pragma GCC system_header in libstdc++ has led to bugs going
undetected for a while due to the silencing of compiler warnings that would
have revealed them promptly, and also interferes with warnings about
problematic template instantiations induced by user code.
But removing it, or even compiling with -Wsystem-headers, is also problematic
due to warnings about deliberate uses of extensions.
So this patch adds #pragma GCC diagnostic as needed to suppress these
warnings.
The change to acinclude.m4 changes -Wabi to warn only in comparison to ABI
19, to avoid lots of warnings that we now mangle concept requirements, which
are in any case still experimental. I checked for any other changes against
ABI v15, and found only the <format> lambda mangling, which we can ignore.
This also enables -Wsystem-headers while building the library, so we see any
warnings not silenced by these #pragmas.
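A minimal sketch of the pattern, not code taken from libstdc++. The
function name is hypothetical, and a GNU statement expression stands in
for a deliberate use of an extension. The push/ignored/pop triple limits
the suppression to the marked region, so warnings elsewhere in the header
stay visible.

```cpp
#include <cassert>

#pragma GCC diagnostic push
// Statement expressions are a GNU extension that -Wpedantic diagnoses.
#pragma GCC diagnostic ignored "-Wpedantic"
inline int clamp_between(int value, int lo, int hi) {
  assert(lo <= hi);
  return ({
    int t = value < lo ? lo : value;
    t > hi ? hi : t;
  });
}
#pragma GCC diagnostic pop
```

Unlike #pragma GCC system_header, which silences warnings for the rest of
the file, this keeps diagnostics for everything outside the push/pop pair.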
Richard Biener [Thu, 19 Sep 2024 10:37:13 +0000 (12:37 +0200)]
Always dump generated distance vectors
There's special-casing for equal access functions which bypasses
printing the distance vectors. The following makes sure we print
them always which helps debugging.
* tree-data-ref.cc (build_classic_dist_vector): Move
distance vector dumping to single caller ...
(subscript_dependence_tester): ... here, dumping always
when we succeed computing it.
Richard Biener [Tue, 17 Sep 2024 09:20:10 +0000 (11:20 +0200)]
tree-optimization/116573 - .SELECT_VL for SLP
The following restores the use of .SELECT_VL for testcases where it
is safe to use even when using SLP. For now I've restricted it
to single-lane SLP, optimistically allowing store-lane nodes
and assuming single-lane roots are not widened, or at most to
load-lane, which should be fine.
PR tree-optimization/116573
* tree-vect-loop.cc (vect_analyze_loop_2): Allow .SELECT_VL
for SLP but disable it when there are multi-lane instances.
* tree-vect-stmts.cc (vectorizable_store): Only compute the
ptr increment when generating code.
(vectorizable_load): Likewise.
Pan Li [Wed, 11 Sep 2024 01:34:21 +0000 (09:34 +0800)]
Genmatch: Refine the gen_phi_on_cond by match_cond_with_binary_phi
This patch would like to leverage the match_cond_with_binary_phi to
match the phi on cond, and get the true/false arg if matched. This
helps a lot to simplify the implementation of gen_phi_on_cond.
Fix deep copy allocatable components in coarrays. [PR85002]
Fix the code for the deep copy of allocatable components in nested
derived-type structures, which was generated but not inserted when the
copy had to be done in a coarray. Additionally fix a comment.
gcc/fortran/ChangeLog:
PR fortran/85002
* trans-array.cc (duplicate_allocatable_coarray): Allow adding
of deep copy code in the when-allocated case. Add bounds
computation before condition, because coarrays need the bounds
also when not allocated.
(structure_alloc_comps): Duplication in the coarray case is done
already, omit it. Add the deep-copy code when duplicating a coarray.
* trans-expr.cc (gfc_trans_structure_assign): Fix comment.
Jennifer Schmitz [Tue, 17 Sep 2024 07:15:38 +0000 (00:15 -0700)]
SVE intrinsics: Fold svmul with all-zero operands to zero vector
As recently implemented for svdiv, this patch folds svmul to a zero
vector if one of the operands is a zero vector. This transformation is
applied if at least one of the following conditions is met:
- the first operand is all zeros or
- the second operand is all zeros, and the predicate is ptrue or the
predication is _x or _z.
In contrast to constant folding, which was implemented in a previous
patch, this transformation is applied as soon as one of the operands is
a zero vector, while the other operand can be a variable.
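The conditions above can be checked against a small lane-wise model. This
is a portable sketch, not SVE code: the fixed lane count, the unsigned
integer element type and all names are assumptions made for illustration.
Inactive lanes keep the first operand under _m, become zero under _z, and
are unspecified under _x (modelled here as zero).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 4;  // hypothetical fixed vector length
using Vec = std::array<std::uint32_t, kLanes>;
using Pred = std::array<bool, kLanes>;

enum class Predication { M, X, Z };

// Lane-wise model of a predicated multiply.
inline Vec model_svmul(const Pred &pg, const Vec &a, const Vec &b,
                       Predication p) {
  Vec r{};
  for (std::size_t i = 0; i < kLanes; ++i) {
    if (pg[i])
      r[i] = a[i] * b[i];  // active lane
    else if (p == Predication::M)
      r[i] = a[i];         // _m: inactive lanes take the first operand
    else
      r[i] = 0;            // _z: zero; _x: unspecified, modelled as zero
  }
  return r;
}

inline bool is_all_zero(const Vec &v) {
  for (std::uint32_t x : v)
    if (x != 0)
      return false;
  return true;
}
```

With an all-zero first operand every lane is zero under any predication.
With an all-zero second operand under _m, inactive lanes still carry the
first operand, which is why that case needs a ptrue predicate.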
The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
* config/aarch64/aarch64-sve-builtins-base.cc (svmul_impl::fold):
Add folding of all-zero operands to zero vector.
gcc/testsuite/
* gcc.target/aarch64/sve/const_fold_mul_1.c: Adjust expected
outcome.
* gcc.target/aarch64/sve/fold_mul_zero.c: New test.
aarch64: Define l1_cache_line_size for -mcpu=neoverse-v2
This is a small patch that sets the L1 cache line size for Neoverse V2.
Unlike the other cache-related constants in there this value is not used just
for SW prefetch generation (which we want to avoid for Neoverse V2 presently).
It's also used to set std::hardware_destructive_interference_size.
See the links and recent discussions in PR116662 for reference.
Some CPU tunings in aarch64 set this value to something useful, but for
generic tuning we use the conservative 256, which forces 256-byte alignment
in such atomic structures. Using a smaller value can decrease the size of such
structs during layout and should not present an ABI problem as
std::hardware_destructive_interference_size is not intended to be used for structs
in an external interface, and GCC warns about such uses.
Another place where the L1 cache line size is used is in phiopt for
-fhoist-adjacent-loads where conditional accesses to adjacent struct members
can be speculatively loaded as long as they are within the same L1 cache line.
e.g.
struct S { int i; int j; };
int
bar (struct S *x, int y)
{
  int r;
  if (y)
    r = x->i;
  else
    r = x->j;
  return r;
}
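A simplified sketch of the kind of check this implies. It is not the
phiopt implementation, the function name is hypothetical, and it ignores
the alignment of the first field: it only asks whether the span from the
start of the first field to the end of the second fits in one line size.

```cpp
#include <cassert>
#include <cstddef>

struct S { int i; int j; };

// True when the bytes from the start of the first field to the end of the
// second field fit within line_size bytes.
inline bool fields_fit_in_line(std::size_t first_offset,
                               std::size_t second_offset,
                               std::size_t second_size,
                               std::size_t line_size) {
  assert(first_offset <= second_offset);
  return second_offset + second_size - first_offset <= line_size;
}
```

For struct S the span is two ints, so the check passes for a 64-byte line
as well as for the conservative 256. The difference shows up for fields
that are further apart, where a smaller line size makes the check stricter.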
The Neoverse V2 L1 cache line is 64 bytes according to the TRM, so set it to
that. The rest of the prefetch parameters inherit from the generic tuning so
we don't do anything extra for software prefetches.
Bootstrapped and tested on aarch64-none-linux-gnu.
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
* config/aarch64/tuning_models/neoversev2.h (neoversev2_prefetch_tune):
Define.
(neoversev2_tunings): Use it.