Kaz Kojima [Tue, 24 Sep 2024 09:26:42 +0000 (18:26 +0900)]
SH: pin input args to hard-regs via predicates for sfuncs
Some sfuncs use a hard reg as input and clobber its raw reg pattern. It
seems that LRA doesn't process this clobber pattern. Rewrite these
patterns so that they work with LRA.
gcc/ChangeLog:
* config/sh/predicates.md (hard_reg_r4, hard_reg_r5,
hard_reg_r6): New predicates.
* config/sh/sh.md (udivsi3_i4, udivsi3_i4_single,
udivsi3_i1): Rewrite with match_operand and match_dup.
(block_lump_real, block_lump_real_i4): Ditto.
(udivsi3): Adjust for it.
* config/sh/sh-mem.cc (expand_block_move): Ditto.
Kaz Kojima [Fri, 20 Sep 2024 09:17:31 +0000 (18:17 +0900)]
SH: try to workaround fp-reg related move insns
LRA will try to satisfy the constraints in match_scratch for the memory
displacements, which causes problems on this target. To mitigate the
issue, split movsf_ie_ra into several new patterns to remove
match_scratch. Also define a new sub-pattern of movdf for constant
loads.
gcc/ChangeLog:
* gcc/config/sh/predicates.md (pc_relative_load_operand):
New predicate.
* gcc/config/sh/sh-protos.h (sh_movsf_ie_ra_split_p): Remove.
(sh_movsf_ie_y_split_p): New proto.
* gcc/config/sh/sh.cc (sh_movsf_ie_ra_split_p): Remove.
(sh_movsf_ie_y_split_p): New function.
(broken_move): Take movsf_ie_ra into account for fldi cases.
* gcc/config/sh/sh.md (movdf_i4_F_z): New insn pattern.
(movdf): Use it.
(movsf_ie_ra): Use define_insn instead of define_insn_and_split.
(movsf_ie_F_z, movsf_ie_Q_z, movsf_ie_y): New insn pattern.
(movsf): Use new patterns.
(movsf-1): Don't split when operands[0] or operands[1]
is fpul.
(movdf_i4_F_z+7): New splitter.
Kaz Kojima [Fri, 20 Sep 2024 09:15:30 +0000 (18:15 +0900)]
SH: Try to reduce R0 live ranges
Some move or extend patterns create long R0 live ranges, which could
confuse LRA.
gcc/ChangeLog:
* config/sh/sh-protos.h
(sh_satisfies_constraint_Sid_subreg_index): Declare.
* config/sh/sh.cc (sh_satisfies_constraint_Sid_subreg_index):
New function.
* config/sh/sh.md (extend<mode>si2_short_mem_disp_z,
*mov<mode>_store_mem_index, mov<mode>_store_mem_index):
New insn and insn_and_split patterns.
(extend<mode>si2, mov<mode>): Use them for LRA.
Thomas Koenig [Tue, 24 Sep 2024 20:57:42 +0000 (22:57 +0200)]
Add random numbers and fix some bugs.
This patch adds random number support for UNSIGNED, plus fixes
two bugs, with array I/O where the type used to be set to BT_INTEGER,
and for division with the divisor being a constant.
gcc/fortran/ChangeLog:
* check.cc (gfc_check_random_number): Adjust for unsigned.
* iresolve.cc (gfc_resolve_random_number): Handle unsigned.
* trans-expr.cc (gfc_conv_expr_op): Handle BT_UNSIGNED for divide.
* trans-types.cc (gfc_get_dtype_rank_type): Handle BT_UNSIGNED.
* gfortran.texi: Add RANDOM_NUMBER for UNSIGNED.
libgfortran/ChangeLog:
* gfortran.map: Add _gfortran_random_m1, _gfortran_random_m2,
_gfortran_random_m4, _gfortran_random_m8 and _gfortran_random_m16.
* intrinsics/random.c (random_m1): New function.
(random_m2): New function.
(random_m4): New function.
(random_m8): New function.
(random_m16): New function.
(arandom_m1): New function.
(arandom_m2): New function.
(arandom_m4): New function.
(arandom_m8): New function.
(arandom_m16): New function.
Thomas Koenig [Tue, 24 Sep 2024 20:53:59 +0000 (22:53 +0200)]
Implement IANY, IALL and IPARITY for unsigned.
gcc/fortran/ChangeLog:
* check.cc (gfc_check_transf_bit_intrins): Handle unsigned.
* gfortran.texi: Document IANY, IALL and IPARITY for unsigned.
* iresolve.cc (gfc_resolve_iall): Set flag to use integer
if type is BT_UNSIGNED.
(gfc_resolve_iany): Likewise.
(gfc_resolve_iparity): Likewise.
* simplify.cc (do_bit_and): Adjust asserts for BT_UNSIGNED.
(do_bit_ior): Likewise.
(do_bit_xor): Likewise.
Thomas Koenig [Tue, 24 Sep 2024 19:59:10 +0000 (21:59 +0200)]
Implement SUM and PRODUCT for unsigned.
gcc/fortran/ChangeLog:
* gfortran.texi: Document SUM and PRODUCT.
* iresolve.cc (resolve_transformational): New argument,
use_integer, to translate calls to unsigned to calls to
integer.
(gfc_resolve_product): Use it.
(gfc_resolve_sum): Use it.
* simplify.cc (init_result_expr): Handle BT_UNSIGNED.
Thomas Koenig [Tue, 24 Sep 2024 19:51:42 +0000 (21:51 +0200)]
Implement MATMUL and DOT_PRODUCT for unsigned.
gcc/fortran/ChangeLog:
* arith.cc (gfc_arith_uminus): Fix warning.
(gfc_arith_minus): Correctly truncate unsigneds.
* check.cc (gfc_check_dot_product): Handle unsigned arguments.
(gfc_check_matmul): Likewise.
* expr.cc (gfc_get_unsigned_expr): New function.
* gfortran.h (gfc_get_unsigned_expr): Add prototype.
* iresolve.cc (gfc_resolve_matmul): If using UNSIGNED, use the
signed integer version.
* gfortran.texi: Document MATMUL and DOT_PRODUCT for unsigned.
* simplify.cc (compute_dot_product): Handle unsigneds.
libgfortran/ChangeLog:
* m4/iparm.m4: Add UNSIGED if type is m.
* m4/matmul.m4: If type is GFC_INTEGER, use GFC_UINTEGER instead.
Whitespace fixes.
* m4/matmul_internal.m4: Whitespace fixes.
Jakub Jelinek [Tue, 24 Sep 2024 18:19:50 +0000 (20:19 +0200)]
c++: Implement C++23 P2718R0 - Wording for P2644R1 Fix for Range-based for Loop [PR107637]
The following patch implements the C++23 P2718R0 paper
- Wording for P2644R1 Fix for Range-based for Loop.
The patch introduces a new option, -f{,no-}range-for-ext-temps, so that
the user can control the behavior even in older C++ versions.
The option is on by default in C++23 and later (-fno-range-for-ext-temps
is an error in that case) and in the -std=gnu++11 ... -std=gnu++20 modes
(one can use -fno-range-for-ext-temps to request previous behavior in that
case), and is not enabled by default in -std=c++11 ... -std=c++20 modes
but one can explicitly enable it with -frange-for-ext-temps.
As all the temporaries from __for_range initialization should have life
extended until the end of __for_range scope, this patch disables (for
-frange-for-ext-temps and if !processing_template_decl) CLEANUP_POINT_EXPR wrapping
of the __for_range declaration, also disables -Wdangling-reference warning
as well as the rest of extend_ref_init_temps (we know the __for_range temporary
is not TREE_STATIC and as all the temporaries from the initializer will be life
extended, we shouldn't try to handle temporaries referenced by references any
differently) and adds an extra push_stmt_list/pop_stmt_list before
cp_finish_decl of __for_range and after end of the for body and wraps all
that into CLEANUP_POINT_EXPR.
I had to repeat that also for OpenMP range loops because those are handled
differently.
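As a rough illustration of the behavior change (not taken from the patch or
its tests), the sketch below counts how many temporaries created in the
range initializer are still alive inside the loop body. Holder,
live_holders and holders_alive_in_body are made-up names, and items()
deliberately returns a reference to static data so that the loop is
well-defined in both modes.

```cpp
#include <cassert>
#include <vector>

// Number of Holder objects currently alive (hypothetical helper state).
static int live_holders = 0;

struct Holder {
  Holder() { ++live_holders; }
  Holder(const Holder &) { ++live_holders; }
  ~Holder() { --live_holders; }

  // Returns a reference to static data, so iterating stays valid even if
  // the Holder temporary is destroyed before the loop body runs.
  const std::vector<int> &items() const {
    static const std::vector<int> data{1, 2, 3};
    return data;
  }
};

// Returns the number of Holder temporaries alive during the first iteration.
int holders_alive_in_body() {
  int seen = -1;
  for (int x : Holder().items()) {
    (void)x;
    if (seen < 0)
      seen = live_holders;
  }
  assert(live_holders == 0);  // the temporary is gone after the loop either way
  return seen;
}
```

With -frange-for-ext-temps the Holder temporary should live until the end of
the loop, so the function is expected to return 1; with
-fno-range-for-ext-temps the temporary is destroyed at the end of the
initializer's full-expression and the function is expected to return 0.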
2024-09-24 Jakub Jelinek <jakub@redhat.com>
PR c++/107637
gcc/
* omp-general.cc (find_combined_omp_for, find_nested_loop_xform):
Handle CLEANUP_POINT_EXPR like TRY_FINALLY_EXPR.
* doc/invoke.texi (frange-for-ext-temps): Document. Add
-fconcepts to the C++ option list.
gcc/c-family/
* c.opt (frange-for-ext-temps): New option.
* c-opts.cc (c_common_post_options): Set flag_range_for_ext_temps
for C++23 or later or for C++11 or later in !flag_iso mode if
the option wasn't set by user.
* c-cppbuiltin.cc (c_cpp_builtins): Change __cpp_range_based_for
value for flag_range_for_ext_temps from 201603L to 202212L in C++17
or later.
* c-omp.cc (c_find_nested_loop_xform_r): Handle CLEANUP_POINT_EXPR
like TRY_FINALLY_EXPR.
gcc/cp/
* cp-tree.h: Implement C++23 P2718R0 - Wording for P2644R1 Fix for
Range-based for Loop.
(cp_convert_omp_range_for): Add bool tmpl_p argument.
(find_range_for_decls): Declare.
* parser.cc (cp_convert_range_for): For flag_range_for_ext_temps call
push_stmt_list () before cp_finish_decl for range_temp and save it
temporarily to FOR_INIT_STMT.
(cp_convert_omp_range_for): Add tmpl_p argument. If set, remember
DECL_NAME of range_temp and for cp_finish_decl call restore it before
clearing it again, if unset, don't adjust DECL_NAME of range_temp at
all.
(cp_parser_omp_loop_nest): For flag_range_for_ext_temps range for add
CLEANUP_POINT_EXPR around sl. Call find_range_for_decls and adjust
DECL_NAMEs for range fors if not processing_template_decl. Adjust
cp_convert_omp_range_for caller. Remove superfluous backslash at the
end of line.
* decl.cc (initialize_local_var): For flag_range_for_ext_temps
temporarily clear stmts_are_full_exprs_p rather than set for
for_range__identifier decls.
* call.cc (extend_ref_init_temps): For flag_range_for_ext_temps return
init early for for_range__identifier decls.
* semantics.cc (find_range_for_decls): New function.
(finish_for_stmt): Use it. For flag_range_for_ext_temps if
cp_convert_range_for set FOR_INIT_STMT, pop_stmt_list it and wrap
into CLEANUP_POINT_EXPR.
* pt.cc (tsubst_omp_for_iterator): Adjust tsubst_omp_for_iterator
caller.
(tsubst_stmt) <case OMP_FOR>: For flag_range_for_ext_temps if there
are any range fors in the loop nest, add push_stmt_list starting
before the initializations, pop_stmt_list it after the body and wrap
into CLEANUP_POINT_EXPR. Change DECL_NAME of range for temps from
NULL to for_range_identifier.
gcc/testsuite/
* g++.dg/cpp23/range-for1.C: New test.
* g++.dg/cpp23/range-for2.C: New test.
* g++.dg/cpp23/range-for3.C: New test.
* g++.dg/cpp23/range-for4.C: New test.
* g++.dg/cpp23/range-for5.C: New test.
* g++.dg/cpp23/range-for6.C: New test.
* g++.dg/cpp23/range-for7.C: New test.
* g++.dg/cpp23/range-for8.C: New test.
* g++.dg/cpp23/feat-cxx2b.C (__cpp_range_based_for): Check for
202212L rather than 201603L.
* g++.dg/cpp26/feat-cxx26.C (__cpp_range_based_for): Likewise.
* g++.dg/warn/Wdangling-reference4.C: Don't expect warning for C++23
or newer. Use dg-additional-options rather than dg-options.
libgomp/
* testsuite/libgomp.c++/range-for-1.C: New test.
* testsuite/libgomp.c++/range-for-2.C: New test.
* testsuite/libgomp.c++/range-for-3.C: New test.
* testsuite/libgomp.c++/range-for-4.C: New test.
* testsuite/libgomp.c++/range-for-5.C: New test.
libgcc, Darwin: Drop the legacy library build for macOS >= 15 [PR116809].
We have been building a legacy libgcc_s.1 DSO to support code that
was built with older compilers.
From macOS 15, the unwinder no longer exports some of the symbols used
in that library which (a) causes bootstrap to fail and (b) means that the
legacy library is no longer useful.
No open branch of GCC emits references to this library - and any
already-built code that depends on the symbols would need rework anyway.
PR target/116809
libgcc/ChangeLog:
* config.host: Build legacy libgcc_s.1 on hosts before macOS 15.
* config/i386/t-darwin: Remove reference to legacy libgcc_s.1
* config/rs6000/t-darwin: Likewise.
* config/t-darwin-libgccs1: New file.
c++/contracts: ICE in build_contract_condition_function [PR116490]
We currently do not expect comdat group of the guarded function to
be set at the time of generating pre and post check function.
However, in the case of an explicit instantiation, the guarded
function has been added to a comdat group before generating contract
check functions, which causes the observed ICE. The current assert is
removed and an additional check for the comdat group of the guarded
function is added. With this change, the pre and post check functions
get added to the same comdat group of the guarded function if the
guarded function is already placed in a comdat group.
PR c++/116490
gcc/cp/ChangeLog:
* contracts.cc (build_contract_condition_function): Add
a check for the comdat group of the guarded function. If set,
the condition check function is added to the same comdat
group.
libgomp: with USM, init 'link' variables with host address
If requires unified_shared_memory or self_maps is set, make
'declare target link' variables initially point to the host pointer.
libgomp/ChangeLog:
* target.c (gomp_load_image_to_device): For requires
unified_shared_memory, update 'link' vars to point to the host var.
* testsuite/libgomp.c-c++-common/target-link-3.c: New test.
* testsuite/libgomp.c-c++-common/target-link-4.c: New test.
OpenMP: Check additional restrictions on context selector properties
TR13 (pre-6.0) of the OpenMP spec says:
"Each trait-property may only be specified once in a trait selector
other than those in the construct selector set."
and
"If trait-property any is specified in the kind trait-selector of the
device selector set or the target_device selector sets, no other
trait-property may be specified in the same selector set."
These restrictions (with slightly different wording) date back to
OpenMP 5.1, but were not in 5.0 which was the basis for GCC's
implementation.
This patch adds a diagnostic, adds new testcases, and fixes some older
testcases that include now-invalid selectors.
gcc/ChangeLog
* omp-general.cc (omp_check_context_selector): Reject other
properties in the same selector set with kind(any). Also reject
duplicate name-list properties.
We allowed the operand convert when matching SAT_SUB in match.pd, to support
the zip benchmark SAT_SUB pattern. That is,
(convert? (minus (convert1? @0) (convert1? @1))) for the sample code below.
void test (uint16_t *x, unsigned b, unsigned n)
{
unsigned a = 0;
register uint16_t *p = x;
do {
a = *--p;
*p = (uint16_t)(a >= b ? a - b : 0); // Truncate after .SAT_SUB
} while (--n);
}
The pattern match for SAT_SUB itself may also apply to the scalar sample
code below.
unsigned long long GetTimeFromFrames(int);
unsigned long long GetMicroSeconds();
void DequeueEvent(unsigned frame) {
long long frame_time = GetTimeFromFrames(frame);
unsigned long long current_time = GetMicroSeconds();
DequeueEvent(frame_time < current_time ? 0 : frame_time - current_time);
}
Aka:
uint32_t a = (uint32_t)SAT_SUB(uint64_t, uint64_t);
This is a problem when compiling for ia32 or with -m32, because we only
check that the lhs type (uint32_t) is supported by the ifn, not the
operand type (uint64_t). DImode is mostly disabled for 32-bit targets
like ia32 or rv32gcv, which then triggers an ICE when expanding.
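The shape of the problem can be written out as a plain scalar sketch. The
function names below are made up for illustration; the point is only that
the subtraction is carried out on 64-bit operands while the stored result
is 32 bits wide, so a support check on the result type alone says nothing
about the 64-bit operation.

```cpp
#include <cstdint>

// Unsigned saturating subtraction computed in 64 bits (the operand type).
uint64_t sat_sub_u64(uint64_t a, uint64_t b) {
  return a >= b ? a - b : 0;
}

// A 64-bit saturating subtraction whose result is truncated to a 32-bit lhs.
uint32_t sat_sub_u64_truncated(uint64_t a, uint64_t b) {
  return static_cast<uint32_t>(sat_sub_u64(a, b));
}
```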
The following test suites pass with this patch.
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.
PR middle-end/116814
gcc/ChangeLog:
* tree-ssa-math-opts.cc (build_saturation_binary_arith_call): Make
ifn is_supported type check based on operand instead of lhs.
Richard Biener [Tue, 24 Sep 2024 11:47:04 +0000 (13:47 +0200)]
tree-optimization/116819 - SLP with !STMT_VINFO_RELEVANT representative
Under some circumstances we can end up picking a non-relevant stmt
as representative of a SLP node. Instead of skipping stmt analysis
and declaring success we have to either ignore relevancy throughout
the code base or fail SLP operation verification. The following
does the latter.
PR tree-optimization/116819
* tree-vect-stmts.cc (vect_analyze_stmt): When the SLP
representative isn't relevant signal failure instead of
success.
Robin Dapp [Tue, 3 Sep 2024 15:53:34 +0000 (17:53 +0200)]
RISC-V: Add more vector-vector extract cases.
This adds a V16SI -> V4SI and related i.e. "quartering" vector-vector
extract expander for VLS modes. It helps with spills in x264 that may
cause a load-hit-store.
Robin Dapp [Fri, 30 Aug 2024 12:35:08 +0000 (14:35 +0200)]
RISC-V: Fix effective target check.
The return value is inverted in check_effective_target_rvv_zvl256b_ok
and check_effective_target_rvv_zvl512b_ok. Fix this and also just use
the current march.
Jason Merrill [Thu, 19 Sep 2024 19:50:19 +0000 (15:50 -0400)]
build: enable C++11 narrowing warnings
We've been using -Wno-narrowing since gcc 4.7, but at this point narrowing
diagnostics seem like a stable part of C++ and we should adjust.
This patch changes -Wno-narrowing to -Wno-error=narrowing so that narrowing
issues will still not break bootstrap, but we can see them.
The rest of the patch fixes the narrowing warnings I see in an
x86_64-pc-linux-gnu bootstrap. In most cases this is done by adjusting the
types of various declarations so that we store the values in the same types
we compute them in, which seems worthwhile anyway. This also allowed us to
remove a few -Wsign-compare casts.
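A small sketch of the kind of fix described above, using a hypothetical
Extent struct rather than any GCC type. Braced initialization diagnoses the
narrowing; keeping the value in the type it is computed in avoids both the
diagnostic and a cast.

```cpp
struct Extent {
  unsigned offset;
  unsigned length;
};

// Before (diagnosed by -Wnarrowing): the length is held in an int and then
// narrowed to unsigned in the braced initializer.
//   int len = end - begin;
//   return Extent{begin, len};
//
// After: compute and store the value in the same unsigned type.
Extent make_extent(unsigned begin, unsigned end) {
  unsigned len = end - begin;
  return Extent{begin, len};
}
```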
gcc/ChangeLog:
* configure.ac (CXX_WARNING_OPTS): Change -Wno-narrowing
to -Wno-error=narrowing.
* configure: Regenerate.
* config/i386/i386.h (debugger_register_map)
(debugger64_register_map)
(svr4_debugger_register_map): Make unsigned.
* config/i386/i386.cc: Likewise.
* diagnostic-event-id.h (diagnostic_thread_id_t): Make int.
* vec.h (vec::size): Make unsigned int.
* ipa-modref.cc (escape_point::arg): Make unsigned.
(modref_lattice::add_escape_point): Use eaf_flags_t.
(update_escape_summary_1): Use eaf_flags_t, && for bool.
* pair-fusion.cc (pair_fusion_bb_info::track_access):
Make mem_size unsigned int.
* pretty-print.cc (format_phase_2): Cast va_arg to char.
* tree-ssa-loop-ch.cc (ch_base::copy_headers): Make nheaders
unsigned, remove cast.
* tree-ssa-structalias.cc (bitpos_of_field): Return unsigned.
(push_fields_onto_fieldstack): Make offset unsigned, remove cast.
* tree-vect-slp.cc (vect_prologue_cost_for_slp): Use nelt_limit.
* tree-vect-stmts.cc (vect_truncate_gather_scatter_offset):
Make scale unsigned.
(vectorizable_operation): Make ncopies unsigned.
* rtl-ssa/member-fns.inl: Make num_accesses unsigned int.
Fortran: Assign allocated caf-memory to scalar members [PR84870]
Allocating a coarray required an array-descriptor. For scalars a
temporary descriptor was created. Assigning the allocated memory from
the temporary descriptor back to the scalar is now added.
gcc/fortran/ChangeLog:
PR fortran/84870
* trans-array.cc (duplicate_allocatable_coarray): For scalar
allocatable components the memory allocated is now assigned to
the component's pointer.
gcc/testsuite/ChangeLog:
* gfortran.dg/coarray/alloc_comp_10.f90: New test.
Richard Biener [Tue, 24 Sep 2024 10:53:11 +0000 (12:53 +0200)]
tree-optimization/114855 - more update_ssa speedup
The following tackles another source of slow bitmap operations,
namely populating blocks_to_update. We already have that in
tree view around PHI insertion but also the initial population is
slow. There's unfortunately a conditional in-between list view
requirement and the bitmap API doesn't allow opportunistic
switching but rejects tree -> tree or list -> list transitions.
So the following patch wraps the early population in a tree view
section with possibly one redundant tree -> list -> tree view
transition.
This cuts tree SSA incremental from 228.25s (21%) to 65.05s (7%).
PR tree-optimization/114855
* tree-into-ssa.cc (update_ssa): Use tree view for the
initial population of blocks_to_update.
OpenMP: Add support for 'self_maps' to the 'require' directive
'self_maps' implies 'unified_shared_memory', except that the latter
also permits that explicit maps copy data to device memory while
self_maps does not. In GCC, currently, both are handled identically.
Richard Biener [Fri, 20 Sep 2024 13:07:24 +0000 (15:07 +0200)]
tree-optimization/115372 - failed store-lanes in some cases
The gcc.target/riscv/rvv/autovec/struct/struct_vect-4.c testcase shows
that we sometimes fail to use store-lanes even though it should be
profitable. We're currently relying on vect_slp_prefer_store_lanes_p
at the point we run into the first SLP discovery mismatch with obviously
limited information. For the case at hand we have 3, 5 or 7 lanes
of VnDImode [2, 2] vectors with the first mismatch at lane 2 so the
new group size is 1. The heuristic says that might be an OK split
given the rest is a multiple of the vector lanes. Now we continue
discovery but in the end mismatches result in uniformly single-lane
SLP instances which we can handle via interleaving but of course are
prime candidates for store-lanes. The following patch re-assesses
with the extra knowledge now just relying on the fact whether the
target supports store-lanes for the given group size.
PR tree-optimization/115372
* tree-vect-slp.cc (vect_build_slp_instance): Compute the
uniform, if any, number of lanes of the RHS sub-graphs feeding
the store and if uniformly one, use store-lanes if the target
supports that.
libstdc++: Remove unnecessary 'static' from __is_specialization_of
The 'static' gives the declarations internal linkage, which is an ODR issue, and
causes a future modules patch to fail regtest as it now detects attempted
uses of TU-local entities in module CMIs.
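For illustration only, here is a trait of the same general shape written
without 'static', assuming C++17. The name is_specialization_of_v is a
stand-in and this is not claimed to be libstdc++'s actual definition; it
shows that a namespace-scope variable template can be inline constexpr and
so avoid internal linkage.

```cpp
#include <tuple>
#include <vector>

// Primary template: T is not a specialization of Tmpl.
template<typename T, template<typename...> class Tmpl>
inline constexpr bool is_specialization_of_v = false;

// Partial specialization: T is Tmpl<Args...>.
template<template<typename...> class Tmpl, typename... Args>
inline constexpr bool is_specialization_of_v<Tmpl<Args...>, Tmpl> = true;

static_assert(is_specialization_of_v<std::vector<int>, std::vector>);
static_assert(!is_specialization_of_v<std::vector<int>, std::tuple>);
```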
Richard Biener [Mon, 23 Sep 2024 13:41:14 +0000 (15:41 +0200)]
tree-optimization/114855 - high update_ssa time
Part of the problem in PR114855 is high update_ssa time. When one fixes
the backward jump threading issue tree SSA incremental is at
439.91s (26%), mostly doing bitmap element searches for
blocks_with_phis_to_rewrite. The following turns that bitmap to tree
view noticing the two-dimensional vector of PHIs it guards is excessive
compared to what we actually save with it - walking all PHI nodes
in a block, something we already do once to initialize stmt flags.
So instead of optimizing that walk we use the stmt flag, saving
allocations and global state that lives throughout the whole
compilation.
This reduces the tree SSA incremental time to 203.13s (14%).
The array was added in r0-74758-g2ce798794df8e1 when we still possibly
had gazillion virtual operands for PR26830; I checked that the testcase
still behaves OK.
PR tree-optimization/114855
* tree-into-ssa.cc (phis_to_rewrite): Remove global var.
(mark_phi_for_rewrite): Simplify.
(rewrite_update_phi_arguments): Walk all PHIs, process
those satisfying rewrite_uses_p.
(delete_update_ssa): Simplify.
(update_ssa): Likewise. Switch blocks_with_phis_to_rewrite
to tree view.
__attribute__((noipa)) int foo () { return 42; }
int bar () __attribute__((alias ("foo")));
int baz () __attribute__((alias ("bar")));
int main ()
{
int n;
#pragma omp target map(from:n)
n = baz ();
return n;
}
GCC emits the following PTX for baz:
.visible .func (.param.u32 %value_out) bar;
.alias bar,foo;
.visible .func (.param.u32 %value_out) baz;
.alias baz,bar;
which is incorrect since PTX requires aliasee to be a defined function.
The patch instead uses cgraph_node::get(name)->ultimate_alias_target,
which generates the following PTX:
Gaius Mulley [Mon, 23 Sep 2024 23:28:19 +0000 (00:28 +0100)]
modula2: Add noreturn attribute to m2/gm2-libs/M2RTS.mod
This patch removes a build warning by adding a noreturn attribute
to the M2RTS.mod:HaltC procedure. Also add an infinite loop to
gm2-libs-min/M2RTS.mod.
Marek Polacek [Mon, 23 Sep 2024 16:19:40 +0000 (12:19 -0400)]
c++: diagnose this specifier in requires expr [PR116798]
We don't detect an explicit object parameter in a requires expression.
We can get there by way of requires-expression -> requirement-parameter-list
-> parameter-declaration-clause -> ... -> parameter-declaration with
this[opt]. But [dcl.fct]/5 doesn't allow an explicit object parameter
in this context. So let's fix it like r14-9033 and not like r14-8832.
PR c++/116798
gcc/cp/ChangeLog:
* parser.cc (cp_parser_parameter_declaration): Detect an explicit
object parameter in a requires expression.
gcc/testsuite/ChangeLog:
* g++.dg/cpp23/explicit-obj-diagnostics12.C: New test.
Saurabh Jha [Wed, 7 Aug 2024 11:34:20 +0000 (12:34 +0100)]
aarch64: Add codegen support for AdvSIMD faminmax
The AArch64 FEAT_FAMINMAX extension is optional from Armv9.2-a and
mandatory from Armv9.5-a. It introduces instructions for computing the
floating point absolute maximum and minimum of the two vectors
element-wise.
This patch adds code generation support for famax and famin in terms of
existing RTL operators.
famax/famin is equivalent to first taking abs of the operands and then
taking smax/smin on the results of abs.
famax/famin (a, b) = smax/smin (abs (a), abs (b))
This fusion of operators is only possible when -march=armv9-a+faminmax
flags are passed. We also need to pass -ffast-math flag; if we don't,
then a statement like
c[i] = __builtin_fmaxf16 (a[i], b[i]);
is RTL expanded to UNSPEC_FMAXNM instead of smax (likewise for smin).
This code generation is only available on -O2 or -O3 as that is when
auto-vectorization is enabled.
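The equivalence above can be modelled per element in scalar code. This is
only a sketch with made-up function names; it ignores NaN handling, which
matches the -ffast-math assumption mentioned above.

```cpp
#include <algorithm>
#include <cmath>

// Scalar model of famax: the maximum of the absolute values.
float famax_model(float a, float b) {
  return std::max(std::fabs(a), std::fabs(b));
}

// Scalar model of famin: the minimum of the absolute values.
float famin_model(float a, float b) {
  return std::min(std::fabs(a), std::fabs(b));
}
```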
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md
(*aarch64_faminmax_fused): Instruction pattern for faminmax
codegen.
* config/aarch64/iterators.md: Attribute for faminmax codegen.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/simd/faminmax-codegen-no-flag.c: New test.
* gcc.target/aarch64/simd/faminmax-codegen.c: New test.
* gcc.target/aarch64/simd/faminmax-no-codegen.c: New test.
Saurabh Jha [Tue, 6 Aug 2024 15:34:49 +0000 (16:34 +0100)]
aarch64: Add AdvSIMD faminmax intrinsics
The AArch64 FEAT_FAMINMAX extension is optional from Armv9.2-a and
mandatory from Armv9.5-a. It introduces instructions for computing the
floating point absolute maximum and minimum of the two vectors element-wise.
This patch introduces AdvSIMD faminmax intrinsics. The intrinsics of
this extension are implemented as the following builtin functions:
* vamax_f16
* vamaxq_f16
* vamax_f32
* vamaxq_f32
* vamaxq_f64
* vamin_f16
* vaminq_f16
* vamin_f32
* vaminq_f32
* vaminq_f64
We are defining a new way to add AArch64 AdvSIMD intrinsics by listing
all the intrinsics in a .def file and then using that .def file to
initialise various data structures. This would lead to more concise code
and easier addition of the new AdvSIMD intrinsics in future.
The faminmax intrinsics are defined using the new approach.
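The general technique of listing entries once and expanding them into
several data structures can be sketched as follows. This is not the code
from aarch64-builtins.cc: the DEMO_ names, the two-field table and the
lookup helper are assumptions, and a macro list stands in for the .def file
so that the example is self-contained.

```cpp
#include <cstring>

// Stand-in for the contents of a .def file: one ENTRY per intrinsic,
// giving its name and number of operands.
#define DEMO_BUILTINS(ENTRY) \
  ENTRY(vamax_f32, 2)        \
  ENTRY(vamaxq_f32, 2)       \
  ENTRY(vamin_f32, 2)        \
  ENTRY(vaminq_f32, 2)

// First expansion: one enumerator per entry.
enum demo_builtin {
#define ENTRY(NAME, NOPS) DEMO_BUILTIN_##NAME,
  DEMO_BUILTINS(ENTRY)
#undef ENTRY
  DEMO_BUILTIN_COUNT
};

struct demo_builtin_data {
  const char *name;
  int num_operands;
};

// Second expansion: one table row per entry, in the same order as the enum.
static const demo_builtin_data demo_builtin_table[] = {
#define ENTRY(NAME, NOPS) {#NAME, NOPS},
  DEMO_BUILTINS(ENTRY)
#undef ENTRY
};

// Looks up a row by intrinsic name; returns nullptr if there is none.
const demo_builtin_data *demo_lookup(const char *name) {
  for (const demo_builtin_data &row : demo_builtin_table)
    if (std::strcmp(row.name, name) == 0)
      return &row;
  return nullptr;
}
```

Adding an intrinsic then means adding a single ENTRY line; the enum and the
table stay in sync automatically.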
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc
(ENTRY): Macro to parse the contents of
aarch64-simd-pragma-builtins.def.
(ENTRY_VHSDF): Macro to parse the contents of
aarch64-simd-pragma-builtins.def.
(enum aarch64_builtins): New enum values for faminmax builtins
via aarch64-simd-pragma-builtins.def.
(enum class aarch64_builtin_signatures): Enum class to specify
the number of operands a builtin will take.
(struct aarch64_pragma_builtins_data): Struct to hold data from
aarch64-simd-pragma-builtins.def.
(aarch64_fntype): New function to define function types of
intrinsics given an object of type aarch64_pragma_builtins_data.
(aarch64_init_pragma_builtins): New function to define pragma
builtins.
(aarch64_get_pragma_builtin): New function to get a row of
aarch64_pragma_builtins, given code.
(handle_arm_neon_h): Modify to call
aarch64_init_pragma_builtins.
(aarch64_general_check_builtin_call): Modify to check whether
required flag is being used for pragma builtins.
(aarch64_expand_pragma_builtin): New function to emit
instructions of pragma_builtin.
(aarch64_general_expand_builtin): Modify to call
aarch64_expand_pragma_builtin.
* config/aarch64/aarch64-option-extensions.def
(AARCH64_OPT_EXTENSION): Introduce new flag for this extension.
* config/aarch64/aarch64-simd.md
(@aarch64_<faminmax_uns_op><mode>): Instruction pattern for
faminmax intrinsics.
* config/aarch64/aarch64.h
(TARGET_FAMINMAX): Introduce new flag for this extension.
* config/aarch64/iterators.md: New iterators and unspecs.
* doc/invoke.texi: Document extension in AArch64 Options.
* config/aarch64/aarch64-simd-pragma-builtins.def: New file to
list pragma builtins.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/simd/faminmax-builtins-no-flag.c: New test.
* gcc.target/aarch64/simd/faminmax-builtins.c: New test.
Jason Merrill [Mon, 9 Sep 2024 15:20:02 +0000 (11:20 -0400)]
libstdc++: operator new/delete are transaction_safe
With the changes to #pragma system_header, g++.dg/tm/pr46270.C was
failing because <new> didn't implement the N4514 change to [new.delete] that
says "The library versions of the global allocation and deallocation
functions are declared transaction_safe (8.3.5 dcl.fct)." We already have
the _GLIBCXX_TXN_SAFE macro, just need to add it.
Matthieu Longo [Mon, 23 Sep 2024 14:35:07 +0000 (15:35 +0100)]
dwarf2: store the RA state in CFI row
On AArch64, the RA state informs the unwinder whether the return address
is mangled and, if so, how. This information is encoded as a boolean in
the CFI row. This binary approach cannot express more complex
configurations, as is the case with PAuth_LR introduced in Armv9.5-A.
This patch addresses this limitation by replacing the boolean with an enum.
gcc/ChangeLog:
* dwarf2cfi.cc
(struct dw_cfi_row): Declare a new enum type to replace ra_mangled.
(cfi_row_equal_p): Use ra_state instead of ra_mangled.
(dwarf2out_frame_debug_cfa_negate_ra_state): Same.
(change_cfi_row): Same.
Matthieu Longo [Mon, 23 Sep 2024 14:34:57 +0000 (15:34 +0100)]
dwarf2: add hooks for architecture-specific CFIs
Architecture-specific CFI directives are currently declared and processed
among other architecture-independent CFI directives in gcc/dwarf2* files.
This approach creates confusion, specifically in the case of DWARF
instructions in the vendor space that use the same instruction code.
Such a clash currently happens between DW_CFA_GNU_window_save (used on
SPARC) and DW_CFA_AARCH64_negate_ra_state (used on AArch64), which both
have the instruction code 0x2d.
As a result, AArch64 compilers generate a SPARC CFI directive
(.cfi_window_save) instead of .cfi_negate_ra_state, contrary to what is
expected in [DWARF for the Arm 64-bit Architecture (AArch64)](https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst).
This refactoring does not completely solve the problem, but improves the
situation by moving some of the processing of those directives (more
specifically their output in the assembly) to the backend via 2 target
hooks:
- DW_CFI_OPRND1_DESC: parse the first operand of the directive (if any).
- OUTPUT_CFI_DIRECTIVE: output the CFI directive as a string.
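The idea of letting the target decide how a shared vendor opcode is printed
can be modelled with a toy function. None of the DEMO_/Demo names below are
GCC's; only the opcode value 0x2d and the two directive strings come from
the description above.

```cpp
#include <string>

// Both vendor opcodes share the value 0x2d, as noted above.
constexpr unsigned DEMO_DW_CFA_GNU_window_save = 0x2d;
constexpr unsigned DEMO_DW_CFA_AARCH64_negate_ra_state = 0x2d;

enum class DemoTarget { Sparc, AArch64, Other };

// Toy stand-in for a per-target "output CFI directive" hook: returns the
// directive text for a vendor opcode, or an empty string when the target
// does not handle that opcode.
std::string demo_output_cfi_directive(DemoTarget target, unsigned opcode) {
  if (target == DemoTarget::Sparc && opcode == DEMO_DW_CFA_GNU_window_save)
    return ".cfi_window_save";
  if (target == DemoTarget::AArch64
      && opcode == DEMO_DW_CFA_AARCH64_negate_ra_state)
    return ".cfi_negate_ra_state";
  return "";
}
```

Because the opcode alone is ambiguous, the choice has to depend on the
target, which is what moving the output into a backend hook achieves.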
Additionally, this patch also contains a renaming of an enum used for
return address mangling on AArch64.
gcc/ChangeLog:
* config/aarch64/aarch64.cc
(aarch64_output_cfi_directive): New hook for CFI directives.
(aarch64_dw_cfi_oprnd1_desc): Same.
(TARGET_OUTPUT_CFI_DIRECTIVE): Hook for output_cfi_directive.
(TARGET_DW_CFI_OPRND1_DESC): Hook for dw_cfi_oprnd1_desc.
* config/sparc/sparc.cc
(sparc_output_cfi_directive): New hook for CFI directives.
(sparc_dw_cfi_oprnd1_desc): Same.
(TARGET_OUTPUT_CFI_DIRECTIVE): Hook for output_cfi_directive.
(TARGET_DW_CFI_OPRND1_DESC): Hook for dw_cfi_oprnd1_desc.
* coretypes.h
(struct dw_cfi_node): Forward declaration of CFI type from
gcc/dwarf2out.h.
(enum dw_cfi_oprnd_type): Same.
(enum dwarf_call_frame_info): Same.
* doc/tm.texi: Regenerated from doc/tm.texi.in.
* doc/tm.texi.in: Add doc for new target hooks.
* dwarf2cfi.cc
(struct dw_cfi_row): Update the description for window_save
and ra_mangled.
(dwarf2out_frame_debug_cfa_negate_ra_state): Use AArch64 CFI
directive instead of the SPARC one.
(change_cfi_row): Use the right CFI directive's name for RA
mangling.
(output_cfi): Remove explicit architecture-specific CFI
directive DW_CFA_GNU_window_save that falls into default case.
(output_cfi_directive): Use target hook as default.
* dwarf2out.cc (dw_cfi_oprnd1_desc): Use target hook as default.
* dwarf2out.h (enum dw_cfi_oprnd_type): specify underlying type
of enum to allow forward declaration.
(dw_cfi_oprnd1_desc): Call target hook.
(output_cfi_directive): Use dw_cfi_ref instead of struct
dw_cfi_node *.
* hooks.cc
(hook_bool_dwcfi_dwcfioprndtyperef_false): New.
(hook_bool_FILEptr_dwcfiptr_false): New.
* hooks.h
(hook_bool_dwcfi_dwcfioprndtyperef_false): New.
(hook_bool_FILEptr_dwcfiptr_false): New.
* target.def: Documentation for new hooks.
Matthieu Longo [Mon, 23 Sep 2024 14:31:18 +0000 (15:31 +0100)]
Rename REG_CFA_TOGGLE_RA_MANGLE to REG_CFA_NEGATE_RA_STATE
The current name REG_CFA_TOGGLE_RA_MANGLE is not representative of what
it really is, i.e. a register to represent several states, not only a
binary one. Same for dwarf2out_frame_debug_cfa_toggle_ra_mangle.
Matthieu Longo [Mon, 23 Sep 2024 14:03:37 +0000 (15:03 +0100)]
libgcc: hide CIE and FDE data for DWARF architecture extensions behind a handler.
This patch provides a new handler MD_ARCH_FRAME_STATE_T to hide an
architecture-specific structure containing CIE and FDE data related
to DWARF architecture extensions.
Hiding the architecture-specific attributes behind a handler has the
following benefits:
1. isolating those data from the generic ones in _Unwind_FrameState.
2. avoiding casts to custom types.
3. preserving typing information when debugging with GDB, and so
facilitating their printing.
This approach required adding a new header md-unwind-def.h, included at
the top of libgcc/unwind-dw2.h, which redirects to the corresponding
architecture header via a symbolic link.
An obvious drawback is the increase in complexity with macros, and
headers. It also caused a split of architecture definitions between
md-unwind-def.h (types definitions used in unwind-dw2.h) and
md-unwind.h (local types definitions and handlers implementations).
The naming of md-unwind.h with .h extension is a bit misleading as
the file is only included in the middle of unwind-dw2.c. Changing
this naming would require modification of other backends, which I
preferred to abstain from. Overall the benefits are worth the added
complexity from my perspective.
libgcc/ChangeLog:
* Makefile.in: New target for symbolic link to md-unwind-def.h
* config.host: New parameter md_unwind_def_header. Set it to
aarch64/aarch64-unwind-def.h for AArch64 targets, or no-unwind.h
by default.
* config/aarch64/aarch64-unwind.h
(aarch64_pointer_auth_key): Move to aarch64-unwind-def.h
(aarch64_cie_aug_handler): Update.
(aarch64_arch_extension_frame_init): Update.
(aarch64_demangle_return_addr): Update.
* configure.ac: New substitute variable md_unwind_def_header.
* unwind-dw2.h (defined): MD_ARCH_FRAME_STATE_T.
* config/aarch64/aarch64-unwind-def.h: New file.
* configure: Regenerate.
* config/no-unwind.h: Updated comment
Matthieu Longo [Mon, 23 Sep 2024 14:03:35 +0000 (15:03 +0100)]
aarch64: skip copy of RA state register into target context
The RA state register is local to a frame, so it should not be copied to
the target frame during the context installation.
This patch adds a new backend handler that checks whether a register
needs to be skipped or not before its installation.
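The idea can be sketched as follows. This is only an illustration, not the libgcc code: Context, kNumRegs, kRaStateReg, is_frame_local_register and install_context are hypothetical names, and the register index is made up.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-ins; none of these names come from libgcc.
constexpr std::size_t kNumRegs = 8;
constexpr std::size_t kRaStateReg = 7;  // illustrative index of a frame-local register

using Context = std::array<long, kNumRegs>;

// Plays the role of the new handler: true if the register must not be copied.
constexpr bool is_frame_local_register(std::size_t reg) {
  return reg == kRaStateReg;
}

// Copy every register from `current` into `target`, except frame-local ones.
inline void install_context(const Context& current, Context& target) {
  for (std::size_t reg = 0; reg < kNumRegs; ++reg) {
    if (is_frame_local_register(reg))
      continue;  // the target frame keeps its own value
    target[reg] = current[reg];
  }
}
```

With such a predicate, a backend that has no frame-local registers would simply always return false and keep the previous behaviour.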
libgcc/ChangeLog:
* config/aarch64/aarch64-unwind.h
(MD_FRAME_LOCAL_REGISTER_P): new handler checking whether a register
from the current context needs to be skipped before installation into
the target context.
(aarch64_frame_local_register): Likewise.
* unwind-dw2.c (uw_install_context_1): use MD_FRAME_LOCAL_REGISTER_P.
Matthieu Longo [Mon, 23 Sep 2024 14:03:30 +0000 (15:03 +0100)]
aarch64: store signing key and signing method in DWARF _Unwind_FrameState
This patch is only a refactoring of the existing implementation
of PAuth and return-address signing. The existing behavior is
preserved.
_Unwind_FrameState already contains several pieces of CIE and FDE
information (see the attributes below the comment "The information we
care about from the CIE/FDE" in libgcc/unwind-dw2.h).
The patch aims at moving the information from DWARF CIE (signing
key stored in the augmentation string) and FDE (the used signing
method) into _Unwind_FrameState along the already-stored CIE and
FDE information.
Note: this information has to be saved in frame_state_reg_info
instead of _Unwind_FrameState as it needs to be savable by
DW_CFA_remember_state and restorable by DW_CFA_restore_state, which
both rely on the attribute "prev".
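A simplified model of why the data has to sit next to "prev" is sketched here. It assumes a hypothetical RegInfo structure with a single signing_method field; the real structures live in libgcc/unwind-dw2.h and differ in detail.

```cpp
#include <memory>
#include <utility>

// Hypothetical, simplified model of a register-info row with a "prev" link.
struct RegInfo {
  int signing_method = 0;          // illustrative per-row state
  std::unique_ptr<RegInfo> prev;   // stack of remembered rows
};

// Models DW_CFA_remember_state: push a copy of the current row.
inline void remember_state(RegInfo& regs) {
  auto saved = std::make_unique<RegInfo>();
  saved->signing_method = regs.signing_method;
  saved->prev = std::move(regs.prev);
  regs.prev = std::move(saved);
}

// Models DW_CFA_restore_state: pop the most recently remembered row.
inline void restore_state(RegInfo& regs) {
  if (!regs.prev)
    return;  // nothing was remembered
  std::unique_ptr<RegInfo> saved = std::move(regs.prev);
  regs.signing_method = saved->signing_method;
  regs.prev = std::move(saved->prev);
}
```

Anything stored outside the row that is pushed and popped would not follow the remember/restore pair, which is the reason given above for the placement.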
This new information in _Unwind_FrameState simplifies the look-up
of the signing key when the return address is demangled. It also
allows future signing methods to be easily added.
_Unwind_FrameState is not a part of the public API of libunwind,
so the change is backward compatible.
A new architecture-specific handler MD_ARCH_EXTENSION_FRAME_INIT
allows resetting values (if needed) in the frame state and unwind
context before changing the frame state to the caller context.
A new architecture-specific handler MD_ARCH_EXTENSION_CIE_AUG_HANDLER
isolates the architecture-specific augmentation strings in AArch64
backend, and allows other architectures to reuse augmentation
strings that would have clashed with AArch64 DWARF extensions.
aarch64_demangle_return_addr, DW_CFA_AARCH64_negate_ra_state and
DW_CFA_val_expression cases in libgcc/unwind-dw2-execute_cfa.h
were documented to clarify where the value of the RA state register
is stored (FS and CONTEXT respectively).
libgcc/ChangeLog:
* config/aarch64/aarch64-unwind.h
(AARCH64_DWARF_RA_STATE_MASK): The mask for RA state register.
(aarch64_ra_signing_method_t): The diversifiers used to sign a
function's return address.
(aarch64_pointer_auth_key): The key used to sign a function's
return address.
(aarch64_cie_signed_with_b_key): Deleted as the signing key is
available now in _Unwind_FrameState.
(MD_ARCH_EXTENSION_CIE_AUG_HANDLER): New CIE augmentation string
handler for architecture extensions.
(MD_ARCH_EXTENSION_FRAME_INIT): New architecture-extension
initialization routine for DWARF frame state and context before
execution of DWARF instructions.
(aarch64_context_ra_state_get): Read RA state register from CONTEXT.
(aarch64_ra_state_get): Read RA state register from FS.
(aarch64_ra_state_set): Write RA state register into FS.
(aarch64_ra_state_toggle): Toggle RA state register in FS.
(aarch64_cie_aug_handler): Handle AArch64 augmentation strings.
(aarch64_arch_extension_frame_init): Initialize defaults for the
signing key (PAUTH_KEY_A), and RA state register (RA_no_signing).
(aarch64_demangle_return_addr): Rely on the frame registers and
the signing_key attribute in _Unwind_FrameState.
* unwind-dw2-execute_cfa.h:
Use the right alias DW_CFA_AARCH64_negate_ra_state for __aarch64__
instead of DW_CFA_GNU_window_save.
(DW_CFA_AARCH64_negate_ra_state): Save the signing method in RA
state register. Toggle RA state register without resetting 'how'
to REG_UNSAVED.
* unwind-dw2.c:
(extract_cie_info): Save the signing key in the current
_Unwind_FrameState while parsing the augmentation data.
(uw_frame_state_for): Reset some attributes related to architecture
extensions in _Unwind_FrameState.
(uw_update_context): Move authentication code to AArch64 unwinding.
* unwind-dw2.h (enum register_rule): Give a name to the existing
enum for the register rules, and replace 'unsigned char' by 'enum
register_rule' to facilitate debugging in GDB.
(_Unwind_FrameState): Add a new architecture-extension attribute
to store the signing key.
OpenMP: Fix omp_get_device_from_uid, minor cleanup
In Fortran, omp_get_device_from_uid can also accept substrings, which are
then not NUL terminated. Fixed by introducing a fortran.c wrapper function.
Additionally, on failure the plugin functions now return NULL instead
of failing fatally, such that a fall-back UID is generated.
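The substring problem can be sketched like this. The sketch is not the libgomp wrapper: lookup_device, lookup_device_fortran and the UID string "GPU-1234" are hypothetical, and the return values are made up for the example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical C-side lookup that expects a NUL-terminated UID.
inline int lookup_device(const char* uid) {
  return std::string(uid) == "GPU-1234" ? 1 : -1;
}

// Wrapper in the style of a Fortran binding: `uid` points at `len`
// characters and is not necessarily NUL-terminated.
inline int lookup_device_fortran(const char* uid, std::size_t len) {
  std::string copy(uid, len);  // the copy is NUL-terminated via c_str()
  return lookup_device(copy.c_str());
}
```

Passing the raw pointer of a substring straight to the C routine would read past the substring, which is what the wrapper avoids.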
gcc/ChangeLog:
* omp-general.cc (omp_runtime_api_procname): Strip "omp_" from
string; move get_device_from_uid as now a '_' suffix exists.
libgomp/ChangeLog:
* fortran.c (omp_get_device_from_uid_): New function.
* libgomp.map (GOMP_6.0): Add it.
* oacc-host.c (host_dispatch): Init '.uid' and '.get_uid_func'.
* omp_lib.f90.in: Make it used by removing bind(C).
* omp_lib.h.in: Likewise.
* target.c (omp_get_device_from_uid): Ensure the device is initialized.
* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_uid): Add function comment;
return NULL in case of an error.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_uid): Likewise.
* testsuite/libgomp.fortran/device_uid.f90: Update to test substrings.
The target dependent mlra option was designed to be able to quickly
switch between LRA and reload. The reload register allocator step is
scheduled for retirement, thus, remove the functionality of mlra,
keeping it for backward compatibility.
PR target/113954
gcc/ChangeLog:
* config/arc/arc.cc (TARGET_LRA_P): Always return true.
(arc_lra_p): Remove.
* config/arc/arc.h (TARGET_LRA): Remove.
* config/arc/arc.opt (mlra): Change it to do nothing.
* doc/invoke.texi (mlra): Update option description.
Simon Martin [Mon, 16 Sep 2024 11:45:32 +0000 (13:45 +0200)]
c++: Don't crash when mangling member with anonymous union or template type [PR100632, PR109790]
We currently crash upon mangling members that have an anonymous union or
a template operator type.
The problem is that before calling write_unqualified_name,
write_member_name asserts that it has a declaration whose DECL_NAME is
an identifier node that is not that of an operator. This is wrong:
- In PR100632, it's an anonymous union declaration, hence a 0 DECL_NAME
- In PR109790, it's a legitimate template declaration for an operator
(this was accepted up to GCC 10)
This assert was added via r11-6301, to be sure that we do write the "on"
marker for operator members.
This patch removes that assert and instead
- Lets members with an anonymous union type go through
- For operators, adds the missing "on" marker for ABI versions greater
than the highest usable with GCC 10
PR c++/109790
PR c++/100632
gcc/cp/ChangeLog:
* mangle.cc (write_member_name): Handle members whose type is an
anonymous union member. Write missing "on" marker for operators
when ABI version is at least 16.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/decltype83.C: New test.
* g++.dg/cpp0x/decltype83a.C: New test.
* g++.dg/cpp1y/lambda-ice3.C: New test.
* g++.dg/cpp1y/lambda-ice3a.C: New test.
* g++.dg/cpp2a/nontype-class67.C: New test.
Simon Martin [Wed, 18 Sep 2024 10:35:27 +0000 (12:35 +0200)]
c++: Don't ICE due to artificial constructor parameters [PR116722]
The following code triggers an ICE
=== cut here ===
class base {};
class derived : virtual public base {
public:
template<typename Arg> constexpr derived(Arg) {}
};
int main() {
derived obj(1.);
}
=== cut here ===
The problem is that cxx_bind_parameters_in_call ends up attempting to
convert a REAL_CST (the first non artificial parameter) to INTEGER_TYPE
(the type of the __in_chrg parameter), which ICEs.
This patch changes cxx_bind_parameters_in_call to return early if it's
called with a *structor that has an __in_chrg or __vtt_parm parameter
since the expression won't be a constant expression.
Note that in the test case, the constructor is not constexpr-suitable,
however it's OK since it's a template according to my read of paragraph
(3) of [dcl.constexpr].
PR c++/116722
gcc/cp/ChangeLog:
* constexpr.cc (cxx_bind_parameters_in_call): Leave early for
{con,de}structors of classes with virtual bases.
Richard Biener [Mon, 23 Sep 2024 08:30:32 +0000 (10:30 +0200)]
tree-optimization/116810 - out-of-bound access to matches[]
The following makes sure to apply forced splitting of groups for
forced single-lane SLP only when the group being analyzed has more
than one lane. This avoids an out-of-bound access to matches[].
PR tree-optimization/116810
* tree-vect-slp.cc (vect_build_slp_instance): Only force
splitting for group_size > 1.
Richard Biener [Mon, 23 Sep 2024 09:05:37 +0000 (11:05 +0200)]
tree-optimization/116796 - virtual LC SSA broken after unrolling
When the unroller unloops loops it tracks whether it changes any
nesting relationship of remaining loops, but when scanning a loop's
preheader it fails to pass down the LC-SSA-invalidated bitmap, losing
the fact that an unrolled formerly inner loop can now be placed on
an exit of its outer loop. The following fixes that.
PR tree-optimization/116796
* cfgloopmanip.cc (fix_loop_placements): Get LC-SSA-invalidated
bitmap and pass it on.
(remove_path): Pass LC-SSA-invalidated to fix_loop_placements.
Tamar Christina [Mon, 23 Sep 2024 10:45:43 +0000 (11:45 +0100)]
middle-end: Insert invariant instructions before the gsi [PR116812]
The new invariant statements should be inserted before the current
statement and not after. This goes fine 99% of the time but when the
current statement is a gcond the control flow gets corrupted.
The following restricts the elementwise SLP vectorization to the
single-lane case which is the reason I enabled it to avoid regressions
with non-SLP. The PR shows that multi-lane SLP loads with elementwise
accesses require work, I'll open a new bug to track this for the
future.
PR tree-optimization/116791
* tree-vect-stmts.cc (get_group_load_store_type): Only
fall back to elementwise access for single-lane SLP, restore
hard failure mode for other cases.
gcn/mkoffload.cc: Re-add fprintf for #include of stdlib.h/stdbool.h
In commit r15-3629-g508ef585243d4674d06b0737bfe8769fc18f824f, #embed
was added and the no-longer-required fprintf '#include' lines were
removed, somehow missing that with -mstack-size=, the generated
configure_stack_size will use 'setenv' and 'true'.
gcc/ChangeLog:
* config/gcn/mkoffload.cc (process_asm): (Re)add the fprintf
lines for stdlib.h/stdbool.h inclusion if gcn_stack_size is used.
Pan Li [Sat, 21 Sep 2024 14:30:18 +0000 (22:30 +0800)]
Genmatch: Fix ICE for binary phi cfg mismatching [PR116795]
This patch fixes an ICE when trying to match the binary phi for
the cfg below. The code checked that the first edge of the phi block
comes from b0, but did not check that the only edge of b1 comes from
b0 too. Thus, some code was recognized as .SAT_SUB when it is not,
which finally resulted in a verify_ssa failure.
Andrew Pinski [Sun, 22 Sep 2024 20:18:30 +0000 (13:18 -0700)]
gimple: Simplify gimple_seq_nondebug_singleton_p
The implementation of gimple_seq_nondebug_singleton_p
was convoluted on how to determine if the sequence
was a singleton (which could contain debug statements).
This simplifies the function into two calls: one to get the start
after all of the debug statements, and one to check whether it
is at the one before the end (or there are only debug statements
afterwards).
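The two-step check can be modelled on a plain vector. This is a sketch only: Stmt, start_nondebug, one_nondebug_before_end and nondebug_singleton_p are hypothetical stand-ins and do not reproduce the gimple iterator API.

```cpp
#include <cstddef>
#include <vector>

struct Stmt { bool is_debug; };  // hypothetical stand-in for a gimple statement

// Index of the first non-debug statement, or seq.size() if there is none.
inline std::size_t start_nondebug(const std::vector<Stmt>& seq) {
  std::size_t i = 0;
  while (i < seq.size() && seq[i].is_debug)
    ++i;
  return i;
}

// True if `i` is a valid position and only debug statements follow it.
inline bool one_nondebug_before_end(const std::vector<Stmt>& seq, std::size_t i) {
  if (i >= seq.size())
    return false;
  for (std::size_t j = i + 1; j < seq.size(); ++j)
    if (!seq[j].is_debug)
      return false;
  return true;
}

// The sequence holds exactly one non-debug statement.
inline bool nondebug_singleton_p(const std::vector<Stmt>& seq) {
  return one_nondebug_before_end(seq, start_nondebug(seq));
}
```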
Bootstrapped and tested on x86_64-linux-gnu (including ada).
gcc/ChangeLog:
* gimple-iterator.h (gimple_seq_nondebug_singleton_p):
Rewrite to simply use gsi_start_nondebug/gsi_one_nondebug_before_end_p.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
The tests below passed for this patch.
* The rv64gcv full regression test.
This is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48H.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macro.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-5.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-6.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-7.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-8.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-5.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-6.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-7.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-8.c: New test.
testsuite/gfortran.dg/unsigned_22.f90: Add missing close with delete, PR116701
Without this patch, gfortran.dg/unsigned_22.f90 fails for
non-effective-target fd_truncate targets, i.e. targets that
don't support chsize or ftruncate. See also
libgfortran/io/unix.c:raw_truncate. It passes on the first
run, but leaves behind a file "fort.10" which is then picked
up by subsequent runs, but since that file is to be
rewritten, the libgfortran machinery tries to truncate it,
which fails. The file always being left behind, is
primarily because the test-case lacks a deleting
close-statement, apparently accidentally.
Incidentally, this "fort.10" artefact is also picked up by
gfortran.dg/write_check3.f90 causing that test to fail too,
observable as a regression for non-fd_truncate targets since
the unsigned_22.f90 introduction. Also, when running
e.g. the whole of gfortran.dg/dg.exp, the "fort.10" is later
deleted by gfortran.dg/write_direct_eor.f90 (which
regardlessly passes), erasing the clue of the cause of the
write_check3 failure. Also, running just
dg.exp=write_check3.f90 or manually repeating the commands
in gfortran.log showed no error.
N.B.: this close-statement will not help if unsigned_22 for
some reason fails, executing one of the "stop" statements,
but that's also the case for many other tests.
PR testsuite/116701
* gfortran.dg/unsigned_22.f90: Add missing close with delete.
The tests below passed for this patch.
* The rv64gcv full regression test.
This is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48H.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_s_add-13.c: New test.
* gcc.target/riscv/sat_s_add-14.c: New test.
* gcc.target/riscv/sat_s_add-15.c: New test.
* gcc.target/riscv/sat_s_add-16.c: New test.
* gcc.target/riscv/sat_s_add-run-13.c: New test.
* gcc.target/riscv/sat_s_add-run-14.c: New test.
* gcc.target/riscv/sat_s_add-run-15.c: New test.
* gcc.target/riscv/sat_s_add-run-16.c: New test.
The tests below passed for this patch.
* The rv64gcv full regression test.
This is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48H.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_s_add-10.c: New test.
* gcc.target/riscv/sat_s_add-11.c: New test.
* gcc.target/riscv/sat_s_add-12.c: New test.
* gcc.target/riscv/sat_s_add-9.c: New test.
* gcc.target/riscv/sat_s_add-run-10.c: New test.
* gcc.target/riscv/sat_s_add-run-11.c: New test.
* gcc.target/riscv/sat_s_add-run-12.c: New test.
* gcc.target/riscv/sat_s_add-run-9.c: New test.
testsuite, coroutines: Add tests for non-suspension ramp returns.
Although it is most common for the ramp function to see a return when a coroutine
first suspends, there are other possibilities. For example all the awaits could
be ready - effectively the coroutine will then run to completion and deallocation.
Another case is where the first active suspension point causes the current routine
to be cancelled and thence destroyed.
These cases are tested here.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/torture/special-termination-00-sync-completion.C: New test.
* g++.dg/coroutines/torture/special-termination-01-self-destruct.C: New test.
libgcc, Darwin: From macOS 11, make that the earliest supported.
For libgcc, we have (so far) supported building a DSO that supports
earlier versions of the OS than the target. From macOS 11, there are
APIs that do not exist on earlier OS versions, so limit the libgcc
range to macOS11..current.
libgcc/ChangeLog:
* config.host: From macOS 11, limit earliest macOS support
to macOS 11.
* config/t-darwin-min-11: New file.
I noticed that char8_t was missing from the list of types that were
prevented from using the std::formatter partial specialization for
integer types. That partial specialization was also matching
cv-qualified integer types, because std::integral<const int> is true.
This change simplifies the constraints by introducing a new variable
template which is only true for cv-unqualified integer types, with
explicit specializations to exclude the character types. This should be
slightly more efficient than the previous constraints that checked
std::integral<T> and (!__is_one_of<T, char, wchar_t, ...>). It also
avoids the need for a separate std::formatter specialization for 128-bit
integers, as they can be handled by the new variable template too.
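The shape of such a variable template can be sketched as below. The name is_formattable_integer is hypothetical (the libstdc++ internal is __format::__is_formattable_integer) and the primary definition here is an assumption built on <type_traits>, not the library's implementation.

```cpp
#include <type_traits>

// True only for cv-unqualified integer types; std::is_integral alone
// would also accept `const int`.
template <typename T>
inline constexpr bool is_formattable_integer =
    std::is_integral_v<T> && std::is_same_v<T, std::remove_cv_t<T>>;

// Explicit specializations exclude the character types.
template <> inline constexpr bool is_formattable_integer<char> = false;
template <> inline constexpr bool is_formattable_integer<wchar_t> = false;
template <> inline constexpr bool is_formattable_integer<char16_t> = false;
template <> inline constexpr bool is_formattable_integer<char32_t> = false;
#ifdef __cpp_char8_t
template <> inline constexpr bool is_formattable_integer<char8_t> = false;
#endif
```

A single boolean lookup per type replaces the combination of a concept check and a type-list search, which is the efficiency argument made above.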
libstdc++-v3/ChangeLog:
* include/std/format (__format::__is_formattable_integer): New
variable template and specializations.
(template<integral, __char> struct formatter): Replace
constraints on first arg with __is_formattable_integer.
* testsuite/std/format/formatter/requirements.cc: Check that
std::formatter specializations for char8_t and const int are
disabled.
Jonathan Wakely [Wed, 18 Sep 2024 16:20:29 +0000 (17:20 +0100)]
libstdc++: Fix formatting of most negative chrono::duration [PR116755]
When formatting chrono::duration<signed-integer-type, P>::min() we were
causing undefined behaviour by trying to form the negative of the most
negative value. If we convert negative durations with integer rep to the
corresponding unsigned integer rep then we can safely represent all
values.
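The conversion can be illustrated with a small helper. unsigned_magnitude is a hypothetical name and this is not the code in chrono_io.h, only a sketch of the technique of negating in the unsigned type.

```cpp
#include <limits>
#include <type_traits>

// Magnitude of a signed integer, returned as the corresponding unsigned type.
// The negation happens in unsigned (modular) arithmetic, so the most
// negative value is handled without signed overflow.
template <typename T>
constexpr std::make_unsigned_t<T> unsigned_magnitude(T value) {
  static_assert(std::is_integral_v<T> && std::is_signed_v<T>);
  using U = std::make_unsigned_t<T>;
  const U u = static_cast<U>(value);
  return value < 0 ? static_cast<U>(U{0} - u) : u;
}
```

The formatter can then print the sign separately and format the unsigned magnitude, so no value of the signed rep needs special handling.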
libstdc++-v3/ChangeLog:
PR libstdc++/116755
* include/bits/chrono_io.h (formatter<duration<R,P>>::format):
Cast negative integral durations to unsigned rep.
* testsuite/20_util/duration/io.cc: Test the most negative
integer durations.
Jonathan Wakely [Fri, 13 Sep 2024 09:18:46 +0000 (10:18 +0100)]
libstdc++: add default template parameters to algorithms
This implements P2248R8 + P3217R0, both approved for C++26.
The changes are mostly mechanical; the struggle is to keep readability
with the pre-P2248 signatures.
* For containers, "classic STL" algorithms and their parallel versions,
introduce a macro and amend their declarations/definitions with it.
The macro either expands to the defaulted parameter or to nothing
in pre-C++26 modes.
* For range algorithms, we need to reorder their template parameters.
I've done so unconditionally, because users cannot rely on template
parameters of algorithms (this is explicitly authorized by
[algorithms.requirements]/15). The defaults are then hidden behind
another macro.
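What the defaulted parameter buys the user can be sketched with a stand-alone algorithm. find_value and contains_pair are hypothetical names; the sketch does not use the libstdc++ macros mentioned above.

```cpp
#include <iterator>
#include <utility>
#include <vector>

// The value type defaults to the iterator's value_type, so a braced
// initializer can be passed where T could not otherwise be deduced.
template <typename It,
          typename T = typename std::iterator_traits<It>::value_type>
It find_value(It first, It last, const T& value) {
  for (; first != last; ++first)
    if (*first == value)
      return first;
  return last;
}

// Usage: no explicit std::pair<int, int> is needed at the call site.
inline bool contains_pair(const std::vector<std::pair<int, int>>& v) {
  return find_value(v.begin(), v.end(), {1, 2}) != v.end();
}
```

Without the default, the braced initializer is a non-deduced context and the call would not compile.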
libstdc++-v3/ChangeLog:
* include/bits/iterator_concepts.h: Add projected_value_t.
* include/bits/algorithmfwd.h: Add the default template
parameter to the relevant forward declarations.
* include/pstl/glue_algorithm_defs.h: Likewise.
* include/bits/ranges_algo.h: Add the default template
parameter to range-based algorithms.
* include/bits/ranges_algobase.h: Likewise.
* include/bits/ranges_util.h: Likewise.
* include/bits/ranges_base.h: Add helper macros.
* include/bits/stl_iterator_base_types.h: Add helper macro.
* include/bits/version.def: Add the new feature-testing macro.
* include/bits/version.h: Regenerate.
* include/std/algorithm: Pull the feature-testing macro.
* include/std/ranges: Likewise.
* include/std/deque: Pull the feature-testing macro, add
the default for std::erase.
* include/std/forward_list: Likewise.
* include/std/list: Likewise.
* include/std/string: Likewise.
* include/std/vector: Likewise.
* testsuite/23_containers/default_template_value.cc: New test.
* testsuite/25_algorithms/default_template_value.cc: New test.
Signed-off-by: Giuseppe D'Angelo <giuseppe.dangelo@kdab.com> Co-authored-by: Jonathan Wakely <jwakely@redhat.com>
Tamar Christina [Sun, 22 Sep 2024 12:38:49 +0000 (13:38 +0100)]
middle-end: lower COND_EXPR into gimple form in vect_recog_bool_pattern
Currently the vectorizer cheats when lowering COND_EXPR during bool recog.
In the cases where the conditional is loop invariant or non-boolean it instead
converts the operation back into GENERIC and hides much of the operation from
the analysis part of the vectorizer.
i.e.
a ? b : c
is transformed into:
a != 0 ? b : c
however by doing so we can't perform any optimization on the mask as they aren't
explicit until quite late during codegen.
To fix this, this patch lowers booleans earlier and so ensures that we are always
in GIMPLE.
For when the value is a loop invariant boolean we have to generate an additional
conversion from bool to the integer mask form.
This is done by creating a loop invariant a ? -1 : 0 with the target mask
precision and then doing a normal != 0 comparison on that.
To support this, the patch also adds the ability to create, during pattern
matching, a loop invariant pattern that won't be seen by the vectorizer and will
instead be materialized inside the loop preheader in the case of loops, or in
the first BB of the region in the case of BB vectorization.
* gcc.dg/vect/bb-slp-conditional_store_1.c: New test.
* gcc.dg/vect/vect-conditional_store_5.c: New test.
* gcc.dg/vect/vect-conditional_store_6.c: New test.
Tamar Christina [Sun, 22 Sep 2024 12:34:10 +0000 (13:34 +0100)]
aarch64: Take into account when VF is higher than known scalar iters
Consider low overhead loops like:
void
foo (char *restrict a, int *restrict b, int *restrict c, int n)
{
for (int i = 0; i < 9; i++)
{
int res = c[i];
int t = b[i];
if (a[i] != 0)
res = t;
c[i] = res;
}
}
For such loops we use latency-only costing since the loop bound is known and
small.
The current costing however does not consider the case where niters < VF.
So when comparing the scalar vs vector costs it doesn't keep in mind that the
scalar code can't perform VF iterations. This makes it overestimate the cost
for the scalar loop and we incorrectly vectorize.
This patch takes the minimum of the VF and niters in such cases.
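The effect on the scalar estimate can be sketched with a hypothetical helper. scalar_cycles_per_vector_iter is not a function in aarch64.cc; the example values (VF 32, 9 iterations, 1.0 cycles per iteration) are those of the loop above.

```cpp
#include <algorithm>

// Cycles the scalar loop needs to do the work of one vector iteration.
// With a known trip count, at most `niters` scalar iterations can run,
// so the VF is capped at that value.
inline double scalar_cycles_per_vector_iter(double cycles_per_iter,
                                            unsigned vf, unsigned niters) {
  const unsigned effective_vf = std::min(vf, niters);
  return cycles_per_iter * effective_vf;
}
```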
Before the patch we generate:
note: Original vector body cost = 46
note: Vector loop iterates at most 1 times
note: Scalar issue estimate:
note: load operations = 2
note: store operations = 1
note: general operations = 1
note: reduction latency = 0
note: estimated min cycles per iteration = 1.000000
note: estimated cycles per vector iteration (for VF 32) = 32.000000
note: SVE issue estimate:
note: load operations = 5
note: store operations = 4
note: general operations = 11
note: predicate operations = 12
note: reduction latency = 0
note: estimated min cycles per iteration without predication = 5.500000
note: estimated min cycles per iteration for predication = 12.000000
note: estimated min cycles per iteration = 12.000000
note: Low iteration count, so using pure latency costs
note: Cost model analysis:
vs after:
note: Original vector body cost = 46
note: Known loop bounds, capping VF to 9 for analysis
note: Vector loop iterates at most 1 times
note: Scalar issue estimate:
note: load operations = 2
note: store operations = 1
note: general operations = 1
note: reduction latency = 0
note: estimated min cycles per iteration = 1.000000
note: estimated cycles per vector iteration (for VF 9) = 9.000000
note: SVE issue estimate:
note: load operations = 5
note: store operations = 4
note: general operations = 11
note: predicate operations = 12
note: reduction latency = 0
note: estimated min cycles per iteration without predication = 5.500000
note: estimated min cycles per iteration for predication = 12.000000
note: estimated min cycles per iteration = 12.000000
note: Increasing body cost to 1472 because the scalar code could issue within the limit imposed by predicate operations
note: Low iteration count, so using pure latency costs
note: Cost model analysis:
gcc/ChangeLog:
* config/aarch64/aarch64.cc (adjust_body_cost):
Cap VF for low iteration loops.
Mikael Morin [Sat, 21 Sep 2024 16:33:11 +0000 (18:33 +0200)]
fortran: Add -finline-intrinsics flag for MINLOC/MAXLOC [PR90608]
Introduce the -finline-intrinsics flag to control from the command line
whether to generate either inline code or calls to the functions from the
library, for the MINLOC and MAXLOC intrinsics.
The flag allows specifying inlining either independently for each intrinsic
(either MINLOC or MAXLOC), or all together. For each intrinsic, a default
value is set if none was set. The default value depends on the optimization
setting: inlining is avoided if not optimizing or if optimizing for size;
otherwise inlining is preferred.
There is no direct support for this behaviour provided by the .opt options
framework. It is obtained by defining three different variants of the flag
(finline-intrinsics, fno-inline-intrinsics, finline-intrinsics=) all using
the same underlying option variable. Each enum value (corresponding to an
intrinsic function) uses two identical bits, and the variable is initialized
with alternated bits, so that we can tell whether the value was set or not
by checking whether the two bits have different values.
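The bit trick can be sketched as follows. The enum values, kUnsetPattern and the helper functions are hypothetical and are not the values defined in flag-types.h.

```cpp
// Each intrinsic owns a pair of identical bits.
enum Intrinsic : unsigned {
  INTR_MAXLOC = 0b0011,
  INTR_MINLOC = 0b1100,
  INTR_ALL    = 0b1111,
};

// Alternating bits: within every pair the two bits differ, meaning "unset".
constexpr unsigned kUnsetPattern = 0b0101;

constexpr unsigned enable(unsigned var, Intrinsic which) {
  return var | which;
}

constexpr unsigned disable(unsigned var, Intrinsic which) {
  return var & ~static_cast<unsigned>(which);
}

// The option was given explicitly iff both bits of the pair agree.
constexpr bool was_set(unsigned var, Intrinsic which) {
  const unsigned bits = var & which;
  return bits == 0 || bits == static_cast<unsigned>(which);
}

constexpr bool is_enabled(unsigned var, Intrinsic which) {
  return (var & which) == static_cast<unsigned>(which);
}
```

After option processing, a pair that still holds differing bits was never touched, so a default depending on the optimization level can be filled in.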
PR fortran/90608
gcc/ChangeLog:
* flag-types.h (enum gfc_inlineable_intrinsics): New type.
gcc/fortran/ChangeLog:
* invoke.texi (finline-intrinsics): Document new flag.
* lang.opt (finline-intrinsics, finline-intrinsics=,
fno-inline-intrinsics): New flags.
* options.cc (gfc_post_options): If the option variable controlling
the inlining of MAXLOC (respectively MINLOC) has not been set, set
it or clear it depending on the optimization option variables.
* trans-intrinsic.cc (gfc_inline_intrinsic_function_p): Return false
if inlining for the intrinsic is disabled according to the option
variable.
gcc/testsuite/ChangeLog:
* gfortran.dg/minmaxloc_18.f90: New test.
* gfortran.dg/minmaxloc_18a.f90: New test.
* gfortran.dg/minmaxloc_18b.f90: New test.
* gfortran.dg/minmaxloc_18c.f90: New test.
* gfortran.dg/minmaxloc_18d.f90: New test.
Mikael Morin [Sat, 21 Sep 2024 16:33:04 +0000 (18:33 +0200)]
fortran: Continue MINLOC/MAXLOC second loop where the first stopped [PR90608]
Continue the second set of loops where the first one stopped in the
generated inline MINLOC/MAXLOC code in the cases where the generated code
contains two sets of loops. This fixes a regression that was introduced
when enabling the generation of inline MINLOC/MAXLOC code with ARRAY of rank
greater than 1, no DIM argument, and either non-scalar MASK or floating-
point ARRAY.
In the cases where two sets of loops are generated as inline MINLOC/MAXLOC
code, we previously generated code such as (for rank 2 ARRAY, so with two
levels of nesting):
for (idx11 in lower1..upper1)
{
for (idx12 in lower2..upper2)
{
...
if (...)
{
...
goto second_loop;
}
}
}
second_loop:
for (idx21 in lower1..upper1)
{
for (idx22 in lower2..upper2)
{
...
}
}
which means we process the first elements twice, once in the first set
of loops and once in the second one. This change avoids this duplicate
processing by using a conditional as lower bound for the second set of
loops, generating code like:
second_loop_entry = false;
for (idx11 in lower1..upper1)
{
for (idx12 in lower2..upper2)
{
...
if (...)
{
...
second_loop_entry = true;
goto second_loop;
}
}
}
second_loop:
for (idx21 in (second_loop_entry ? idx11 : lower1)..upper1)
{
for (idx22 in (second_loop_entry ? idx12 : lower2)..upper2)
{
...
second_loop_entry = false;
}
}
It was expected that the compiler optimizations would be able to remove the
state variable second_loop_entry. This is the case if ARRAY has rank 1 (so
without loop nesting): the variable is removed and the loop bounds become
unconditional, which restores the previously generated code, fully fixing the
regression. For larger rank, unfortunately, the state variable and
conditional loop bounds remain, but those cases were previously using
library calls, so it's not a regression.
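For illustration, the scheme above can be written out for a rank 2 array of doubles, where the first set of loops skips NaNs. This is a hand-written sketch, not compiler output: minloc2, Array2, the fixed 3x3 extents and the 0-based result are assumptions of the example.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <utility>

constexpr std::size_t N1 = 3, N2 = 3;  // illustrative extents
using Array2 = std::array<std::array<double, N2>, N1>;

// 0-based position of the smallest non-NaN element ({0, 0} if all are NaN).
inline std::pair<std::size_t, std::size_t> minloc2(const Array2& a) {
  std::size_t pos1 = 0, pos2 = 0;    // result position
  std::size_t idx11 = 0, idx12 = 0;  // indexes of the first set of loops
  double limit = 0.0;
  bool second_loop_entry = false;

  // First set of loops: stop at the first element that is not a NaN.
  for (idx11 = 0; idx11 < N1; ++idx11) {
    for (idx12 = 0; idx12 < N2; ++idx12) {
      if (!std::isnan(a[idx11][idx12])) {
        limit = a[idx11][idx12];
        pos1 = idx11;
        pos2 = idx12;
        second_loop_entry = true;
        goto second_loop;
      }
    }
  }

second_loop:
  // Second set of loops: resume where the first set stopped, if it stopped.
  for (std::size_t idx21 = second_loop_entry ? idx11 : 0; idx21 < N1; ++idx21) {
    for (std::size_t idx22 = second_loop_entry ? idx12 : 0; idx22 < N2; ++idx22) {
      if (a[idx21][idx22] < limit) {
        limit = a[idx21][idx22];
        pos1 = idx21;
        pos2 = idx22;
      }
      second_loop_entry = false;  // later inner loops start from the lower bound
    }
  }
  return {pos1, pos2};
}
```

Clearing the flag in the loop body is what makes only the very first run of the inner loop start from the saved index.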
PR fortran/90608
gcc/fortran/ChangeLog:
* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Generate a set
of index variables. Set them using the loop indexes before leaving
the first set of loops. Generate a new loop entry predicate.
Initialize it. Set it before leaving the first set of loops. Clear
it in the body of the second set of loops. For the second set of
loops, update each loop lower bound to use the corresponding index
variable if the predicate variable is set.
Mikael Morin [Sat, 21 Sep 2024 16:32:59 +0000 (18:32 +0200)]
fortran: Inline non-character MINLOC/MAXLOC with no DIM [PR90608]
Enable generation of inline MINLOC/MAXLOC code in the case where DIM
is not present, and either ARRAY is of floating point type or MASK is an
array. Those cases are the remaining bits to fully support inlining of
non-CHARACTER MINLOC/MAXLOC without DIM. They are treated together because
they generate similar code, the NANs for REAL types being handled a bit like
a second level of masking. These are the cases for which we generate two
sets of loops.
This change affects the code generating the second loop, which was previously
accessible only in the cases where ARRAY has rank 1. The single variable
initialization and update are changed to apply to multiple variables, one
per dimension.
The code generated is as follows (if ARRAY has rank 2):
for (idx11 in lower1..upper1)
{
for (idx12 in lower2..upper2)
{
...
if (...)
{
...
goto second_loop;
}
}
}
second_loop:
for (idx21 in lower1..upper1)
{
for (idx22 in lower2..upper2)
{
...
}
}
This code leads to processing the first elements redundantly, both in the
first set of loops and in the second one. The loop over idx22 could
start from idx12 the first time it is run, but as it has to start from
lower2 for the rest of the runs, this change uses the same bounds for both
sets of loops for simplicity. In the rank 1 case, this makes the generated
code worse compared to the inline code that was generated before. A later
change will introduce conditionals to avoid the duplicate processing and
restore the generated code in that case.
PR fortran/90608
gcc/fortran/ChangeLog:
* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Initialize
and update all the variables. Put the label and goto in the
outermost scalarizer loop. Don't start the second loop where the
first stopped.
(gfc_inline_intrinsic_function_p): Also return TRUE for array MASK
or for any REAL type.
gcc/testsuite/ChangeLog:
* gfortran.dg/maxloc_bounds_5.f90: Additionally accept error
messages reported by the scalarizer.
* gfortran.dg/maxloc_bounds_6.f90: Ditto.
Mikael Morin [Sat, 21 Sep 2024 16:32:51 +0000 (18:32 +0200)]
fortran: Inline integral MINLOC/MAXLOC with no DIM and scalar MASK [PR90608]
Enable the generation of inline code for MINLOC/MAXLOC when argument ARRAY
is of integral type, DIM is not present, and MASK is present and is scalar
(only absent MASK or rank 1 ARRAY were inlined before).
Scalar masks are implemented with a wrapping condition around the code one
would generate if MASK wasn't present, so they are easy to support once
inline code without MASK is working.
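As a rough C analogy of the wrapping condition (a sketch under assumptions,
not the generated code), the scalar MASK case reduces to one test around the
unmasked code, with every result variable zeroed in the else branch. The
helper names maxloc2_int and maxloc2_int_scalar_mask and the column-major
layout are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: unmasked MAXLOC on an N1 x N2 integer array stored
   in column-major order.  POS receives the 1-based position.  */
void maxloc2_int(const int *a, size_t n1, size_t n2, size_t pos[2])
{
  int limit = INT_MIN;
  pos[0] = pos[1] = (n1 > 0 && n2 > 0) ? 1 : 0;
  for (size_t j = 0; j < n2; j++)
    for (size_t i = 0; i < n1; i++)
      if (a[i + j * n1] > limit)
        {
          limit = a[i + j * n1];
          pos[0] = i + 1;
          pos[1] = j + 1;
        }
}

/* Hypothetical helper: the scalar MASK wraps the unmasked code.  */
void maxloc2_int_scalar_mask(const int *a, size_t n1, size_t n2,
                             bool mask, size_t pos[2])
{
  assert(pos != NULL);
  if (mask)
    maxloc2_int(a, n1, n2, pos);   /* same code as without MASK */
  else
    pos[0] = pos[1] = 0;           /* one initialization per dimension */
}
```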
PR fortran/90608
gcc/fortran/ChangeLog:
* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Generate
variable initialization for each dimension in the else branch of
the toplevel condition.
(gfc_inline_intrinsic_function_p): Return TRUE for scalar MASK.
gcc/testsuite/ChangeLog:
* gfortran.dg/maxloc_bounds_7.f90: Additionally accept the error message
reported by the scalarizer.
Mikael Morin [Sat, 21 Sep 2024 16:32:44 +0000 (18:32 +0200)]
fortran: Inline integral MINLOC/MAXLOC with no DIM and no MASK [PR90608]
Enable generation of inline code for the MINLOC and MAXLOC intrinsics,
if the ARRAY argument is of integral type and of any rank (only the rank 1
case was previously inlined), and neither DIM nor MASK arguments are
present.
This needs a few adjustments in gfc_conv_intrinsic_minmaxloc,
mainly to replace the single variables POS and OFFSET with collections
of variables, one variable per dimension each.
The restriction to integral ARRAY and absent MASK limits the scope of
the change to the cases where we generate single loop inline code. The
code generation for the second loop is only accessible with ARRAY of rank
1, so it can continue using a single variable. A later change will extend
inlining to the double loop cases.
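The single loop case with one POS and one OFFSET variable per dimension can
be pictured with the following C sketch for rank 2. It is an illustration
under assumptions rather than the generated code: the helper name
minloc2_int, the explicit lower and upper bound parameters, and the
column-major layout are invented for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper, not part of GCC.  A holds the elements of an array
   with bounds LB1:UB1 and LB2:UB2 in column-major order.  POS receives
   the 1-based position of the minimum, or zeros for an empty array.  */
void minloc2_int(const int *a, long lb1, long ub1, long lb2, long ub2,
                 long pos[2])
{
  assert(pos != NULL);
  long n1 = ub1 >= lb1 ? ub1 - lb1 + 1 : 0;
  /* One offset per dimension, mapping a loop index to a position.  */
  long offset[2] = { 1 - lb1, 1 - lb2 };
  /* The array is non-empty only if every dimension has a non-empty
     extent.  */
  int nonempty = lb1 <= ub1 && lb2 <= ub2;
  int limit = INT_MAX;

  /* One position variable per dimension.  */
  pos[0] = pos[1] = nonempty ? 1 : 0;

  for (long j = lb2; j <= ub2; j++)
    for (long i = lb1; i <= ub1; i++)
      {
        int v = a[(i - lb1) + (j - lb2) * n1];
        if (v < limit)
          {
            limit = v;
            pos[0] = i + offset[0];
            pos[1] = j + offset[1];
          }
      }
}
```

Because the positions are initialized to 1 for a non-empty array, an array
whose elements all equal INT_MAX still yields the first position.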
Some bounds checking that was previously handled by the library needed
changes in the scalarizer to avoid regressions. The scalarizer already
supported bounds check code generation, but applied it only to array
reference sections, checking both for array bound violation and for shape
conformability between all the involved arrays. With this change, for
MINLOC or MAXLOC, enable the conformability check between all the
scalarized arrays, and disable the array bound violation check.
PR fortran/90608
gcc/fortran/ChangeLog:
* trans-array.cc (gfc_conv_ss_startstride): Set the MINLOC/MAXLOC
result upper bound using the rank of the ARRAY argument. Adjust
the error message for intrinsic result arrays. Only check array
bounds for array references. Move bound check decision code...
(bounds_check_needed): ... here as a new predicate. Allow bound
check for MINLOC/MAXLOC intrinsic results.
* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Change the
result array upper bound to the rank of ARRAY. Update the NONEMPTY
variable to depend on the non-empty extent of every dimension. Use
one variable per dimension instead of a single variable for the
position and the offset. Update their declaration, initialization,
and update to affect the variable of each dimension. Use the first
variable only in areas only accessed with rank 1 ARRAY argument.
Set every element of the result using its corresponding variable.
(gfc_inline_intrinsic_function_p): Return true for integral ARRAY
and absent DIM and MASK.
gcc/testsuite/ChangeLog:
* gfortran.dg/maxloc_bounds_4.f90: Additionally accept the error
message emitted by the scalarizer.
Remove the frontend pass rewriting calls of MINLOC/MAXLOC without DIM to
calls with one-valued DIM enclosed in an array constructor. This
transformation was circumventing the limitation of inline MINLOC/MAXLOC code
generation to scalar cases only, allowing inline code to be generated if
ARRAY had rank 1 and DIM was absent. As MINLOC/MAXLOC has gained support of
inline code generation in that case, the limitation is no longer effective,
and the transformation no longer necessary.
gcc/fortran/ChangeLog:
* frontend-passes.cc (optimize_minmaxloc): Remove.
(optimize_expr): Remove dispatch to optimize_minmaxloc.
Mikael Morin [Sat, 21 Sep 2024 16:32:25 +0000 (18:32 +0200)]
fortran: Inline MINLOC/MAXLOC with no DIM and ARRAY of rank 1 [PR90608]
Enable inline code generation for the MINLOC and MAXLOC intrinsics, if the
DIM argument is not present and ARRAY has rank 1. This case is similar to
the case where the result is scalar (DIM present and rank 1 ARRAY), which
already supports inline expansion of the intrinsic. Both cases return
the same value, with the difference that the result is an array of size 1 if
DIM is absent, whereas it's a scalar if DIM is present. So all there is
to do for the new case to work is hook the inline expansion into the
scalarizer.
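The relation between the two cases can be shown with a small C sketch. It
is an analogy only, and the helper names maxloc1_scalar and maxloc1_array
are hypothetical: the array-valued form stores the value of the scalar form
in a one-element result.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: MAXLOC with DIM present on a rank 1 integer array,
   returning a scalar 1-based position (0 if the array is empty).  */
long maxloc1_scalar(const int *a, size_t n)
{
  int limit = INT_MIN;
  long pos = n > 0 ? 1 : 0;
  for (size_t i = 0; i < n; i++)
    if (a[i] > limit)
      {
        limit = a[i];
        pos = (long) i + 1;
      }
  return pos;
}

/* Hypothetical helper: MAXLOC with DIM absent.  Same value, but delivered
   as an array of size 1.  */
void maxloc1_array(const int *a, size_t n, long result[1])
{
  assert(result != NULL);
  result[0] = maxloc1_scalar(a, n);
}
```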
PR fortran/90608
gcc/fortran/ChangeLog:
* trans-array.cc (gfc_conv_ss_startstride): Set the scalarization
rank based on the MINLOC/MAXLOC rank if needed. Call the inline
code generation and set up the scalarizer array descriptor info
in the MINLOC and MAXLOC cases.
* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Return the
result array element if the scalarizer is set up and we are inside
the loops. Restrict library function call dispatch to the case
where inline expansion is not supported. Declare an array result
if the expression isn't scalar. Initialize the array result single
element and return the result variable if the expression isn't
scalar.
(walk_inline_intrinsic_minmaxloc): New function.
(walk_inline_intrinsic_function): Add MINLOC and MAXLOC cases,
dispatching to walk_inline_intrinsic_minmaxloc.
(gfc_add_intrinsic_ss_code): Add MINLOC and MAXLOC cases.
(gfc_inline_intrinsic_function_p): Return true if ARRAY has rank 1,
regardless of DIM.
Mikael Morin [Sat, 21 Sep 2024 16:32:19 +0000 (18:32 +0200)]
fortran: Disable frontend passes for inlinable MINLOC/MAXLOC [PR90608]
Disable rewriting of MINLOC/MAXLOC expressions for which inline code
generation is supported. To that end, update the existing
gfc_inline_intrinsic_function_p predicate to reflect the current state of
MINLOC/MAXLOC inlining support, which for now covers only the case of a
scalar result and non-CHARACTER argument.
This change has no effect currently, as the MINLOC/MAXLOC front-end passes
only change expressions of rank 1, and the inlining control predicate
gfc_inline_intrinsic_function_p returns false for those. However, later
changes will extend MINLOC/MAXLOC inline expansion support to array
expressions and update the inlining control predicate, at which point this
change will take effect.
PR fortran/90608
gcc/fortran/ChangeLog:
* frontend-passes.cc (optimize_minmaxloc): Skip if we can generate
inline code for the unmodified expression.
* trans-intrinsic.cc (gfc_inline_intrinsic_function_p): Add
MINLOC and MAXLOC cases.
Mikael Morin [Sat, 21 Sep 2024 16:32:10 +0000 (18:32 +0200)]
fortran: Add tests covering inline MINLOC/MAXLOC without DIM [PR90608]
Add tests covering the various cases for which we are about to implement
inline expansion of MINLOC and MAXLOC. Those are cases where the DIM
argument is not present.
PR fortran/90608
gcc/testsuite/ChangeLog:
* gfortran.dg/ieee/maxloc_nan_1.f90: New test.
* gfortran.dg/ieee/minloc_nan_1.f90: New test.
* gfortran.dg/maxloc_7.f90: New test.
* gfortran.dg/maxloc_with_mask_1.f90: New test.
* gfortran.dg/minloc_8.f90: New test.
* gfortran.dg/minloc_with_mask_1.f90: New test.
Jason Merrill [Mon, 9 Sep 2024 16:35:37 +0000 (12:35 -0400)]
libstdc++: fix C header include guards
Ever since the c_global and c_compatibility directories were added in
r122533, the include guards have been oddly late in the files, with no
comment about why that might be either in the commit message or the files
themselves. I don't see any justification for this; it seems like a
scripting error in creating these files based on the ones in include/c.