Jonathan Wakely [Mon, 26 Feb 2024 11:40:46 +0000 (11:40 +0000)]
libstdc++: Make _GLIBCXX_NODISCARD work for C++11 and C++14
The _GLIBCXX_NODISCARD macro only expands to [[__nodiscard__]] for C++17
and later, but all supported compilers will allow us to use that for
C++11 and C++14 too. Enable it for those older standards, to give their
users improved diagnostics.
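For illustration (not part of the patch), a macro expanding to the
double-underscore form behaves the way user code now sees _GLIBCXX_NODISCARD
in -std=c++11 or -std=c++14 mode; compilers that honour the attribute before
C++17 can then diagnose a discarded result:

  // Sketch only: mirrors what _GLIBCXX_NODISCARD now expands to pre-C++17.
  #define MY_NODISCARD [[__nodiscard__]]

  MY_NODISCARD int compute() { return 42; }

  int main()
  {
    compute();   // may warn about the ignored return value even in C++11/14
  }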
Jason Merrill [Thu, 14 Nov 2024 04:39:53 +0000 (23:39 -0500)]
libstdc++: stdc++.h and <coroutine>
r13-3036 moved #include <coroutine> into the new freestanding section, but
also moved it from a C++20 section to a C++23 section. This patch moves it
back.
Incidentally, I'm curious why a few headers were removed from the hosted
section (including <coroutine>), but most were left in place, so we have
redundant includes of most hosted headers.
libstdc++-v3/ChangeLog:
* include/precompiled/stdc++.h: <coroutine> is C++20.
Richard Ball [Thu, 14 Nov 2024 16:15:13 +0000 (16:15 +0000)]
aarch64: Add tests and docs for indirect_return attribute
This patch adds new testcases and docs for the indirect_return
attribute.
gcc/ChangeLog:
* doc/extend.texi: Add AArch64 docs for indirect_return
attribute.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/indirect_return-1.c: New test.
* gcc.target/aarch64/indirect_return-2.c: New test.
* gcc.target/aarch64/indirect_return-3.c: New test.
Szabolcs Nagy [Thu, 14 Nov 2024 16:15:13 +0000 (16:15 +0000)]
aarch64: Introduce indirect_return attribute
Tail calls of indirect_return functions from non-indirect_return
functions are disallowed even if BTI is disabled, since the call
site may have BTI enabled.
Needed for swapcontext within the same function when GCS is enabled.
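As an illustration (the declarations below are hypothetical, not from the
patch), the attribute marks functions that may return via an indirect branch,
e.g. a swapcontext-style function:

  struct context;

  // A callee that can "return" to a different BTI landing pad.
  __attribute__((indirect_return))
  int my_swapcontext(context* save, const context* restore);

  int caller(context* a, context* b)
  {
    // With this patch the call below is never turned into a tail call,
    // because the (possibly BTI-enabled) call site must remain a valid
    // return target; a normal call is emitted instead.
    return my_swapcontext(a, b);
  }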
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_gnu_attributes): Add
indirect_return.
(aarch64_gen_callee_cookie): Use indirect_return attribute.
(aarch64_callee_indirect_return): New.
(aarch_fun_is_indirect_return): New.
(aarch64_function_ok_for_sibcall): Disallow tail calls if caller
is non-indirect_return but callee is indirect_return.
(aarch64_function_arg): Add indirect_return to cookie.
(aarch64_init_cumulative_args): Record indirect_return in
CUMULATIVE_ARGS.
(aarch64_comp_type_attributes): Check indirect_return attribute.
(aarch64_output_mi_thunk): Add indirect_return to cookie.
* config/aarch64/aarch64.h (CUMULATIVE_ARGS): Add new field
indirect_return.
* config/aarch64/aarch64.md (tlsdesc_small_<mode>): Update.
* config/aarch64/aarch64-opts.h (AARCH64_NUM_ABI_ATTRIBUTES): New.
* config/aarch64/aarch64-protos.h (aarch64_gen_callee_cookie): Update.
* config/arm/aarch-bti-insert.cc (call_needs_bti_j): New.
(rest_of_insert_bti): Use call_needs_bti_j.
* config/arm/aarch-common-protos.h
(aarch_fun_is_indirect_return): New.
* config/arm/arm.cc
(aarch_fun_is_indirect_return): New.
Szabolcs Nagy [Thu, 14 Nov 2024 16:15:11 +0000 (16:15 +0000)]
aarch64: Add GCS support to the unwinder
Follows the current Linux ABI, which uses a single signal entry token
and a shared shadow stack between thread and alt stack.
This could be guarded by an __ARM_FEATURE_GCS_DEFAULT ifdef (only do anything
special with GCS compat codegen), but there is a runtime check anyway.
Change affected tests to be compatible with -mbranch-protection=standard.
Szabolcs Nagy [Thu, 14 Nov 2024 16:15:09 +0000 (16:15 +0000)]
aarch64: Add non-local goto and jump tests for GCS
These are scan asm tests only, relying on existing execution tests
for runtime coverage.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/gcs-nonlocal-1.c: New test.
* gcc.target/aarch64/gcs-nonlocal-1-track-speculation.c: New test.
* gcc.target/aarch64/gcs-nonlocal-2.c: New test.
* gcc.target/aarch64/gcs-nonlocal-2-track-speculation.c: New test.
* gcc.target/aarch64/gcs-nonlocal-1.h: New header file.
* gcc.target/aarch64/gcs-nonlocal-2.h: New header file.
Szabolcs Nagy [Thu, 14 Nov 2024 16:15:08 +0000 (16:15 +0000)]
aarch64: Add GCS support for nonlocal stack save
Nonlocal stack save and restore has to also save and restore the GCS
pointer. This is used in __builtin_setjmp/longjmp and nonlocal goto.
The GCS specific code is only emitted if GCS branch-protection is
enabled and the code always checks at runtime if GCS is enabled.
The new -mbranch-protection=gcs and old -mbranch-protection=none code
are ABI compatible: jmpbuf for __builtin_setjmp has space for 5
pointers, the layout is
old layout: fp, pc, sp, unused, unused
new layout: fp, pc, sp, gcsp, unused
Note: the ILP32 code generation is wrong, as it saves the pointers in
Pmode (i.e. 8 bytes per pointer) while the user-supplied buffer is sized
for 5 pointers (4 bytes per pointer); this is not fixed here.
The nonlocal goto has no ABI compatibility issues as the goto and its
destination are in the same translation unit.
We use CDImode to allow extra space for GCS without the effect of 16-byte
alignment.
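For reference, a minimal sketch of the __builtin_setjmp/longjmp usage that the
5-pointer buffer layout above refers to (illustrative only):

  // The buffer is five pointer-sized words; with GCS branch protection the
  // fourth slot (previously unused) now carries the saved GCS pointer.
  void *jmpbuf[5];

  extern void do_work(void);   // may call __builtin_longjmp(jmpbuf, 1)

  int run(void)
  {
    if (__builtin_setjmp(jmpbuf) == 0)
    {
      do_work();
      return 0;
    }
    return 1;                  // reached after the longjmp
  }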
gcc/ChangeLog:
* config/aarch64/aarch64.h (STACK_SAVEAREA_MODE): Make space for gcs.
* config/aarch64/aarch64.md (save_stack_nonlocal): New.
(restore_stack_nonlocal): New.
* tree-nested.cc (get_nl_goto_field): Updated.
Szabolcs Nagy [Thu, 14 Nov 2024 16:15:08 +0000 (16:15 +0000)]
aarch64: Add __builtin_aarch64_gcs* and __gcs* tests
The builtins are always enabled, but should be used behind runtime
checks in case the target does not support GCS. They are thin
wrappers around the corresponding instructions.
The GCS pointer is modelled with void * type (normal stores do not
work on GCS memory, but it is writable via the gcsss operation or
via GCSSTR if enabled, so not const) and an entry on the GCS is
modelled with uint64_t (since it has a fixed size and can be a token
that's not a pointer).
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/acle/gcs-1.c: New test.
* gcc.target/aarch64/gcspopm-1.c: New test.
* gcc.target/aarch64/gcspr-1.c: New test.
* gcc.target/aarch64/gcsss-1.c: New test.
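A rough usage sketch of the pattern described above; the builtin names and
signatures below are assumptions inferred from the test file names
(gcspr-1.c, gcsss-1.c, gcspopm-1.c), so treat this purely as an illustration:

  #include <cstdint>

  void *current_gcs_pointer()
  {
    // chkfeat bit 0 queries GCS; a zero result bit means it is enabled.
    if (__builtin_aarch64_chkfeat(1) != 0)
      return nullptr;                     // GCS not enabled at runtime
    return __builtin_aarch64_gcspr();     // thin wrapper reading the GCS pointer
  }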
Szabolcs Nagy [Thu, 14 Nov 2024 16:15:06 +0000 (16:15 +0000)]
aarch64: Add GCS instructions
Add instructions for the Guarded Control Stack extension.
GCSSS1 and GCSSS2 are always used together in the compiler, and an extra
"mov xn, 0" should always be added before GCSSS2 to clear the output
register. This is needed to get a reasonable result when GCS is disabled,
when these instructions are NOPs. Since the instructions are expected
to be used behind runtime feature checks, this is mainly relevant if
GCS can be disabled asynchronously.
GCSPOPM does not have an embedded move, so code that emits this
instruction must first emit a zeroing of operand 1 to get a reasonable
result when GCS is not enabled.
The output of GCSPOPM is usually not needed, so a separate gcspopm_xzr
was added to model that. Did not do the same for GCSSS as it is a less
common operation.
The mnemonics used do not depend on an updated assembler, since these
instructions can be used without a new -march setting behind a runtime
check.
Reading the GCSPR is modelled as unspec_volatile so it does not get
reordered wrt the other instructions changing the GCSPR.
Szabolcs Nagy [Thu, 14 Nov 2024 16:15:05 +0000 (16:15 +0000)]
aarch64: Add __builtin_aarch64_chkfeat
Builtin for chkfeat: the input argument is used to initialize x16, then
chkfeat is executed and the updated x16 is returned.
Note: the ACLE __chkfeat(x) will flip the bits to be more intuitive
(xor the input with the output), but for the builtin that seems an
unnecessary complication.
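A small sketch of the relationship described in the note; the wrapper
definition below is an assumption about how an ACLE-style __chkfeat would be
built on top of the raw builtin:

  #include <cstdint>

  // CHKFEAT_GCS is bit 0 of the x16 feature mask.  The raw builtin returns
  // the updated x16 (a bit is cleared when the feature is enabled); XORing
  // input and output flips that so a set bit means "enabled".
  inline uint64_t my_chkfeat(uint64_t mask)
  {
    return __builtin_aarch64_chkfeat(mask) ^ mask;
  }

  bool gcs_enabled()
  {
    return (my_chkfeat(1) & 1) != 0;
  }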
Jan Hubicka [Thu, 14 Nov 2024 16:01:12 +0000 (17:01 +0100)]
Remove allocations which are used only for NULL pointer check and free
Extend tree-ssa-dce to remove memory allocations that are used only
to check that the return value is non-NULL and are then freed.
A new -fmalloc-dce flag can be used to control malloc/free removal. I
ended up copying what -fallocation-dce does, so -fmalloc-dce=1 enables
malloc/free removal provided the return value is otherwise unused, and
-fmalloc-dce=2 additionally allows NULL pointer checks, which it folds
in the non-NULL direction.
I also added compensation for the gcc.dg/analyzer/pr101837.c testcase and
added a testcase showing that the std::nothrow variant of operator new is
now optimized away.
With -fmalloc-dce=n I can also add a level which emits runtime checks for
half of the address space and for calloc overflow if that seems useful, but
perhaps incrementally. Adding size parameter tracking is not that hard (I
posted a WIP patch for that).
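An illustrative example (not from the patch) of the kind of allocation the new
pass can remove at -fmalloc-dce=2:

  #include <cstdlib>

  // The malloc result is only NULL-checked and then freed, so the whole
  // allocation can be elided and the check folded to the non-NULL direction.
  bool can_allocate(std::size_t n)
  {
    void *p = std::malloc(n);
    if (!p)
      return false;
    std::free(p);
    return true;
  }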
gcc/ChangeLog:
PR tree-optimization/117370
* common.opt: Add -fmalloc-dce.
* common.opt.urls: Update.
* doc/invoke.texi: Document it; also add missing -flifetime-dse entry.
* tree-ssa-dce.cc (is_removable_allocation_p): Break out from
...
(mark_stmt_if_obviously_necessary): ... here; also check that
operator new satisfies gimple_call_from_new_or_delete.
(checks_return_value_of_removable_allocation_p): New Function.
(mark_all_reaching_defs_necessary_1): Add missing case for
STRDUP and STRNDUP.
(propagate_necessity): Use is_removable_allocation_p and
checks_return_value_of_removable_allocation_p.
(eliminate_unnecessary_stmts): Update conditionals that use
removed allocation; use is_removable_allocation_p.
Jonathan Wakely [Thu, 14 Nov 2024 09:58:41 +0000 (09:58 +0000)]
libstdc++: Use feature test macros consistently in <bits/stl_iterator.h>
Remove __cplusplus > 201703L checks that are redundant when used
alongside __glibcxx_concepts checks, because <version> already
guarantees that __glibcxx_concepts is only defined for C++20 and later.
Prefer to check __glibcxx_ranges for features such as move_sentinel that
were added by the One Ranges proposal (P0896R4), or for features which
depend on other components introduced by that proposal.
But prefer to check __glibcxx_concepts for constraints that only depend
on requires-clauses and concepts defined in <concepts>, even if those
constraints were added by the Ranges proposal (e.g. the constraints on
non-member operators for move_iterator).
Prefer #ifdef to #if when just testing for the presence of __glibcxx_foo
macros without caring about their value.
Also add/tweak some Doxygen comments.
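For instance, the convention described above separates presence checks from
value checks (the values shown are illustrative):

  // Presence-only check: prefer #ifdef when the value does not matter.
  #ifdef __glibcxx_ranges
  // ... declarations that only need the ranges machinery to exist ...
  #endif

  // Value check: use #if only when a particular revision is required.
  #if __glibcxx_concepts >= 201907L
  // ... declarations that need full C++20 concepts support ...
  #endif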
libstdc++-v3/ChangeLog:
* include/bits/stl_iterator.h: Make use of feature test macros
more consistent. Improve doxygen comments.
Tobias Burnus [Thu, 14 Nov 2024 15:28:20 +0000 (16:28 +0100)]
libgomp.texi: Impl. Status - change TR13 to OpenMP 6.0 + fix routine typo
libgomp/
* libgomp.texi (OpenMP Implementation Status): Change TR13 to
OpenMP 6.0, now released. Fix a typo in the omp_target_memset_async
routine name.
Richard Biener [Thu, 14 Nov 2024 13:22:01 +0000 (14:22 +0100)]
Fix another thinko in peeling for gap compute of get_group_load_store_type
There's inconsistent handling of the cpart_size == cnunits case, which
currently avoids reporting peeling for gaps as being insufficient, but
the condition that is enough to trigger it,
cremain + group_size < cpart_size, is with cpart_size == cnunits
equal to the condition that brings us here in the first place,
maybe_lt (remain + group_size, nunits). The following fixes this
by not checking cpart_size against special values.
* tree-vect-stmts.cc (get_group_load_store_type): Do not
exempt cpart_size == cnunits from failing.
Steve Baird [Wed, 30 Oct 2024 23:20:51 +0000 (16:20 -0700)]
ada: Improved legality checking for deep delta aggregates.
Enforce deep delta legality rules about nonoverlapping choices. For example,
do not allow both Aaa.Bbb and Aaa.Bbb.Ccc as choices in one delta aggregate.
One special case impacts "regular" Ada2022 delta aggregates - the rule
preventing a record component from occurring twice as a choice in a delta
aggregate was previously not being enforced.
gcc/ada/ChangeLog:
* sem_aggr.adb (Resolve_Delta_Aggregate): The rule about
discriminant dependent component references in choices applies to
both array and record delta aggregates, so check for violations in
Resolve_Delta_Aggregate. Call a new procedure,
Check_For_Bad_Dd_Component_Choice, for each choice.
(Resolve_Delta_Record_Aggregate): Call a new procedure,
Check_For_Bad_Overlap, for each pair of choices.
Eric Botcazou [Fri, 1 Nov 2024 19:47:57 +0000 (20:47 +0100)]
ada: Fix spurious warning on representation clause for private discriminated type
This is the warning enabled by -gnatw.h for holes in record types that are
declared with a representation clause for their components.
When a discriminated type has a private declaration that also declares its
discriminants, the sibling discriminants present on the full declaration
are essentially ignored and, therefore, cannot be used in the computation
performed to give the warning.
gcc/ada/ChangeLog:
* sem_ch13.adb (Record_Hole_Check): Deal consistently with the base
type throughout the processing. Return if its declaration is not a
full type declaration. Assert that its record definition is either
a derived type definition or a record definition. If the type has a
private declaration that does not specify unknown discriminants, use
it as the source of discriminant specifications, if any.
(Check_Component_List): Process every N_Discriminant_Specification
but assert that its defining identifier is really a discriminant.
Eric Botcazou [Mon, 28 Oct 2024 10:19:10 +0000 (11:19 +0100)]
ada: Fix internal error on misplaced iterated component association
This happens for example in the others choice of a 1-dimensional array:
A : array (1 .. Length) of Integer
:= (others => for Each in 1 .. Length => Each);
or when it is used without parentheses for the array component of a record.
gcc/ada/ChangeLog:
PR ada/112524
PR ada/113781
* par-ch4.adb (P_Primary) <Tok_For>: Give an error about missing
parentheses in the (purported) iterated component case too.
(P_Unparen_Cond_Expr_Etc): Likewise.
* sem.adb (Analyze): Raise PE on N_Iterated_Component_Association.
* sem_util.ads (Diagnose_Iterated_Component_Association): Delete.
* sem_util.adb (Diagnose_Iterated_Component_Association): Likewise.
Martin Jambor [Thu, 14 Nov 2024 13:42:27 +0000 (14:42 +0100)]
ipa: Introduce a one jump function dumping function
I plan to introduce a verifier which, when it fails, prints a single jump
function using the function introduced here. Because it is a verifier,
the risk that it would need to be reverted is non-zero, and because the
function can be useful on its own, it is introduced in this separate
patch.
gcc/ChangeLog:
2024-11-01 Martin Jambor <mjambor@suse.cz>
* ipa-prop.h (ipa_dump_jump_function): Declare.
* ipa-prop.cc (ipa_dump_jump_function): New function.
(ipa_print_node_jump_functions_for_edge): Move printing of
individual jump functions to the new function.
Martin Jambor [Thu, 14 Nov 2024 13:42:27 +0000 (14:42 +0100)]
ipa-cp: Fix constant dumping
Commit gcc-14-5368-ge0787da2633 removed an overloaded variant of the
function print_ipcp_constant_value for tree constants. That did not
break the build because the other overloaded variant, for polymorphic
contexts, has a parameter which is constructible from a tree, but it
prints polymorphic contexts, not tree constants, so in dumps we got
polymorphic context output where constant values were expected.
This commit re-adds the needed overloaded variant, though it uses the
printing function added in the aforementioned commit instead of
printing the constant itself.
gcc/ChangeLog:
2024-11-13 Martin Jambor <mjambor@suse.cz>
* ipa-prop.h (ipa_print_constant_value): Declare.
* ipa-prop.cc (ipa_print_constant_value): Make public.
* ipa-cp.cc (print_ipcp_constant_value): Re-add this overloaded
function for printing tree constants.
gcc/testsuite/ChangeLog:
2024-11-14 Martin Jambor <mjambor@suse.cz>
* gcc.dg/ipa/ipcp-agg-1.c: Add a scan dump for a constant value in
the lattice dump.
* g++.dg/tree-ssa/pr96945.C: Cleanup.
* g++.dg/tree-ssa/pr110819.C: New test.
* g++.dg/tree-ssa/pr116868.C: New test.
* g++.dg/tree-ssa/pr58483.C: New test.
Jonathan Wakely [Thu, 14 Nov 2024 01:14:44 +0000 (01:14 +0000)]
libstdc++: Add missing parts of LWG 3480 for directory iterators [PR117560]
It looks like I only read half the resolution of LWG 3480 and decided we
already supported it. As well as making the non-member overloads of end
take their parameters by value, we need some specializations of the
enable_borrowed_range and enable_view variable templates.
libstdc++-v3/ChangeLog:
PR libstdc++/117560
* include/bits/fs_dir.h (enable_borrowed_range, enable_view):
Define specializations for directory iterators, as per LWG 3480.
* testsuite/27_io/filesystem/iterators/lwg3480.cc: New test.
Richard Biener [Tue, 12 Nov 2024 12:55:14 +0000 (13:55 +0100)]
Remove last comparison-code expand_vec_cond_expr_p call from vectorizer
The following refactors the check with the last remaining
expand_vec_cond_expr_p call that uses a comparison code, to make it
obvious we are not relying on those anymore.
* tree-vect-stmts.cc (vectorizable_condition): Refactor
target support check.
Richard Biener [Thu, 14 Nov 2024 09:17:23 +0000 (10:17 +0100)]
tree-optimization/117567 - make SLP reassoc resilient against NULL lanes
The following tries to make the SLP chain association code resilient
against not-present lanes (the other option would have been to disable
it in this case). Not-present lanes can now more widely appear as
part of mask load SLP discovery when there are gaps involved. Requiring
a present first lane shouldn't be a restriction since in unpermuted
state all DR groups have their first lane not a gap.
PR tree-optimization/117567
* tree-vect-slp.cc (vect_build_slp_tree_2): Handle not present
lanes when doing re-association.
Christophe Lyon [Wed, 13 Nov 2024 21:20:13 +0000 (21:20 +0000)]
libgcc: Fix COPY_ARG_VAL initializer (PR 117537)
We recently forced -Werror when building libgcc for aarch64, to make
sure we'd catch and fix the kind of problem described in the PR.
In this case, when building for aarch64_be (so, big endian), gcc emits
this warning/error:
libgcc/config/libbid/bid_conf.h:847:25: error: missing braces around initializer [-Werror=missing-braces]
847 | UINT128 arg_name={ bid_##arg_name.w[1], bid_##arg_name.w[0]};
libgcc/config/libbid/bid_conf.h:871:8: note: in expansion of macro 'COPY_ARG_VAL'
871 | COPY_ARG_VAL(arg_name)
This patch fixes the problem by adding curly braces around the
initializer for COPY_ARG_VAL in the big endian case.
It seems that COPY_ARG_REF (just above COPY_ARG_VAL) has a similar
issue, but DECIMAL_CALL_BY_REFERENCE seems to always be defined to 0, so
COPY_ARG_REF is never used. The patch fixes it too, though.
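A reduced sketch of the warning and the fix, assuming UINT128 wraps a
two-element word array as in the libbid headers:

  typedef unsigned long long UINT64;
  typedef struct { UINT64 w[2]; } UINT128;

  UINT128 make_uint128(UINT64 hi, UINT64 lo)
  {
    // UINT128 x = { hi, lo };      // -Werror=missing-braces
    UINT128 x = { { hi, lo } };     // braces around the array member: OK
    return x;
  }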
Andrew Pinski [Wed, 13 Nov 2024 06:13:35 +0000 (22:13 -0800)]
cfgexpand: Skip doing conflicts if there is only 1 variable
This is a small speed-up. If there is only one known stack variable, there
is no reason to figure out the scope conflicts, as there are none. So don't
go through all the live range calculations just to see there are none.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* cfgexpand.cc (add_scope_conflicts): Return right away
if there is only one stack variable.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Eikansh Gupta [Mon, 11 Nov 2024 11:36:04 +0000 (17:06 +0530)]
MATCH: Simplify `a rrotate (32-b) -> a lrotate b` [PR109906]
The pattern `a rrotate (32-b)` should be optimized to `a lrotate b`.
The same is also true for `a lrotate (32-b)`. It can be optimized to
`a rrotate b`.
This patch adds following patterns:
a rrotate (32-b) -> a lrotate b
a lrotate (32-b) -> a rrotate b
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR tree-optimization/109906
gcc/ChangeLog:
* match.pd (a rrotate (32-b) -> a lrotate b): New pattern.
(a lrotate (32-b) -> a rrotate b): New pattern.
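The equivalence the new patterns encode, written out in source form; whether
this exact spelling is folded depends on the shifts first being recognized as
a rotate, so treat it as a sketch:

  // For 32-bit values, rotating right by (32 - b) equals rotating left by b;
  // the new match.pd patterns canonicalize one form to the other.
  unsigned rotate_right_by_complement(unsigned a, unsigned b)
  {
    unsigned c = (32 - b) & 31;             // rotate count 32 - b, kept in range
    return (a >> c) | (a << (-c & 31));     // a rrotate (32-b)  ==  a lrotate b
  }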
Richard Biener [Wed, 13 Nov 2024 10:32:13 +0000 (11:32 +0100)]
Do not consider overrun for VMAT_ELEMENTWISE
When we classify an SLP access as VMAT_ELEMENTWISE we still consider
overrun - the reset of it is later overwritten. The following fixes
this, resolving a few RISC-V FAILs with --param vect-force-slp=1.
* tree-vect-stmts.cc (get_group_load_store_type): For
VMAT_ELEMENTWISE there's no overrun.
In addition to a single DR we also require a single lane, not a splat.
PR tree-optimization/117554
* tree-vect-stmts.cc (get_group_load_store_type): We can
use gather/scatter only for a single-lane single element group
access.
Richard Biener [Wed, 13 Nov 2024 12:56:13 +0000 (13:56 +0100)]
tree-optimization/117559 - avoid hybrid SLP for masked load/store lanes
Hybrid analysis is confused by the mask_conversion pattern making a
uniform mask non-uniform. As load/store lanes only uses a single
lane to mask all data lanes the SLP graph doesn't cover the alternate
(redundant) mask lanes and thus their pattern defs. The following adds
a hack to mark them covered.
Fixes gcc.target/aarch64/sve/mask_struct_store_?.c with forced SLP.
PR tree-optimization/117559
* tree-vect-slp.cc (vect_mark_slp_stmts): Pass in vinfo,
mark all mask defs of a load/store-lane .MASK_LOAD/STORE
as pure.
(vect_make_slp_decision): Adjust.
(vect_slp_analyze_bb_1): Likewise.
Richard Biener [Wed, 13 Nov 2024 13:43:27 +0000 (14:43 +0100)]
tree-optimization/117556 - SLP of live stmts from load-lanes
The following fixes SLP live lane generation for load-lanes which
fails to analyze for gcc.dg/vect/vect-live-slp-3.c because the
VLA division doesn't work out but it would also wrongly use the
transposed vector defs I think. The following properly disables
the actual load-lanes SLP node from live lane processing and instead
relies on the SLP permute node representing the live lane where we
can use extract-last to extract the last lane. This also fixes
the reported Ada miscompile.
PR tree-optimization/117556
PR tree-optimization/117553
* tree-vect-stmts.cc (vect_analyze_stmt): Do not analyze
the SLP load-lanes node for live lanes, but only the
permute node.
(vect_transform_stmt): Likewise for the transform.
* gcc.dg/vect/vect-live-slp-3.c: Expect us to SLP even for
VLA vectors (in single-lane mode).
Pan Li [Thu, 14 Nov 2024 06:16:15 +0000 (14:16 +0800)]
RISC-V: Rearrange the test files for scalar SAT_ADD [NFC]
The test files of scalar SAT_ADD only have numbers as the suffix.
Rearrange the file names to -{form number}-{target-type}. For example,
test form 3 for uint32_t SAT_ADD will have -3-u32.c for the asm check and
-run-3-u32.c for the run test.
The below test suites passed for this patch.
* The rv64gcv full regression test.
For cstorebf4 it uses comparison_operator for the BFmode compare, which is
incorrect when ix86_expand_setcc is used directly, as that does not
canonicalize the input comparison to correct the compare code by swapping
operands.
The original code without AVX10.2 calls emit_store_flag_force, which
actually calls emit_store_flag_1 and recursively calls this expander
again with swapped operands and flag.
Therefore, we can avoid the redundant recursive call by adjusting
the comparison_operator to ix86_fp_comparison_operator and calling
ix86_expand_setcc directly.
gcc/ChangeLog:
PR target/117495
* config/i386/i386.md (cstorebf4): Use ix86_fp_comparison_operator
and call ix86_expand_setcc directly.
gcc/testsuite/ChangeLog:
PR target/117495
* gcc.target/i386/pr117495.c: New test.
Jonathan Wakely [Thu, 31 Oct 2024 16:29:18 +0000 (16:29 +0000)]
libstdc++: Refactor std::hash specializations
This attempts to simplify and clean up our std::hash code. The primary
benefit is improved diagnostics for users when they do something wrong
involving std::hash or unordered containers. An additional benefit is
that for the unstable ABI (--enable-symvers=gnu-versioned-namespace) we
can reduce the memory footprint of several std::hash specializations.
In the current design, __hash_enum is a base class of the std::hash
primary template, but the partial specialization of __hash_enum for
non-enum types is disabled. This means that if a user forgets to
specialize std::hash for their class type (or forgets to use a custom
hash function for unordered containers) they get error messages about
std::__hash_enum not being constructible. This is confusing when there
is no enum type involved: why should users care about __hash_enum not
being constructible if they're not trying to hash enums?
This change makes the std::hash primary template only derive from
__hash_enum when the template argument type is an enum. Otherwise, it
derives directly from a new class template, __hash_not_enabled. This new
class template defines the deleted members that cause a given std::hash
specialization to be a disabled specialization (as per P0513R0). Now
when users try to use a disabled specialization, they get more
descriptive errors that mention __hash_not_enabled instead of
__hash_enum.
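The mechanism can be pictured with a standalone sketch; the names and details
below are simplified stand-ins, not the actual libstdc++ definitions:

  #include <type_traits>

  // A "disabled" hash base: deleted members make any hash class deriving
  // from it non-constructible and non-callable, along the lines of P0513R0.
  template<typename T>
  struct hash_not_enabled_sketch
  {
    hash_not_enabled_sketch() = delete;
    hash_not_enabled_sketch(hash_not_enabled_sketch&&) = delete;
    void operator()(const T&) const = delete;
  };

  struct Widget { };                                          // no hash support

  struct widget_hash : hash_not_enabled_sketch<Widget> { };   // disabled specialization

  // Diagnostics for using it now name the "not enabled" base, not an enum helper.
  static_assert(!std::is_default_constructible_v<widget_hash>,
                "hashing Widget is disabled");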
Additionally, adjust __hash_base to remove the deprecated result_type
and argument_type typedefs for C++20 and later.
In the current code we use a __poison_hash base class in the std::hash
specializations for std::unique_ptr, std::optional, and std::variant.
The primary template of __poison_hash has deleted special members, which
is used to conditionally disable the derived std::hash specialization.
This can also result in confusing diagnostics, because seeing "poison"
in an enabled specialization is misleading. Only some uses of
__poison_hash actually "poison" anything, i.e. cause a specialization to
be disabled. In other cases it's just an empty base class that does
nothing.
This change removes __poison_hash and changes the std::hash
specializations that were using it to conditionally derive from
__hash_not_enabled instead. When the std::hash specialization is
enabled, there is no more __poison_hash base class. However, to preserve
the ABI properties of those std::hash specializations, we need to
replace __poison_hash with some other empty base class. This is needed
because in the current code std::hash<std::variant<int, const int>> has
two __poison_hash<int> base classes, which must have unique addresses,
so sizeof(std::hash<std::variant<int, const int>>) == 2. To preserve
this unfortunate property, a new __hash_empty_base class is used as a
base class to re-introduce duplicate base classes that increase the
class size. For the unstable ABI we don't use __hash_empty_base so the
std::hash<std::variant<T...>> specializations are always size 1, and
the class hierarchy is much simpler so will compile faster.
Additionally, remove the result_type and argument_type typedefs from all
disabled specializations of std::hash for std::unique_ptr,
std::optional, and std::variant. Those typedefs are useless for disabled
specializations, and although the standard doesn't say they must *not*
be present for disabled specializations, it certainly only requires them
for enabled specializations. Finally, for C++20 the typedefs are also
removed from enabled specializations of std::hash for std::unique_ptr,
std::optional, and std::variant.
libstdc++-v3/ChangeLog:
* doc/xml/manual/evolution.xml: Document removal of nested types
from std::hash specializations.
* doc/html/manual/api.html: Regenerate.
* include/bits/functional_hash.h (__hash_base): Remove
deprecated nested types for C++20.
(__hash_empty_base): Define new class template.
(__is_hash_enabled_for): Define new variable template.
(__poison_hash): Remove.
(__hash_not_enabled): Define new class template.
(__hash_enum): Remove partial specialization for non-enums.
(hash): Derive from __hash_not_enabled for non-enums, instead of
__hash_enum.
* include/bits/unique_ptr.h (__uniq_ptr_hash): Derive from
__hash_base. Conditionally derive from __hash_empty_base.
(__uniq_ptr_hash<>): Remove disabled specialization.
(hash): Do not derive from __hash_base unconditionally.
Conditionally derive from either __uniq_ptr_hash or
__hash_not_enabled.
* include/std/optional (__optional_hash_call_base): Remove.
(__optional_hash): Define new class template.
(hash): Conditionally derive from either __optional_hash or
__hash_not_enabled. Remove nested typedefs.
* include/std/variant (_Base_dedup): Replace __poison_hash with
__hash_empty_base.
(__variant_hash_call_base_impl): Remove.
(__variant_hash): Define new class template.
(hash): Conditionally derive from either __variant_hash or
__hash_not_enabled. Remove nested typedefs.
* testsuite/20_util/optional/hash.cc: Check whether nested types
are present.
* testsuite/20_util/variant/hash.cc: Likewise.
* testsuite/20_util/optional/hash_abi.cc: New test.
* testsuite/20_util/unique_ptr/hash/abi.cc: New test.
* testsuite/20_util/unique_ptr/hash/types.cc: New test.
* testsuite/20_util/variant/hash_abi.cc: New test.
We have two overloads of _M_find_before_node but they have quite
different performance characteristics, which isn't necessarily obvious.
The original version, _M_find_before_node(bucket, key, hash_code), looks
only in the specified bucket, doing a linear search within that bucket
for an element that compares equal to the key. This is the typical fast
lookup for hash containers, assuming the load factor is low so that each
bucket isn't too large.
The newer _M_find_before_node(key) was added in r12-6272-ge3ef832a9e8d6a
and could be naively assumed to calculate the hash code and bucket for
key and then call the efficient _M_find_before_node(bkt, key, code)
function. But in fact it does a linear search of the entire container.
This is potentially very slow and should only be used for a suitably
small container, as determined by the __small_size_threshold() function.
We don't even have a comment pointing out this O(N) performance of the
newer overload.
Additionally, the newer overload is only ever used in exactly one place,
which would suggest it could just be removed. However there are several
places that do the linear search of the whole container with an explicit
loop each time.
This adds a new member function, _M_locate, and uses it to replace most
uses of _M_find_node and the loops doing linear searches. This new
member function does both forms of lookup, the linear search for small
sizes and the _M_find_node(bkt, key, code) lookup within a single
bucket. The new function returns a __location_type which is a struct
that contains a pointer to the first node matching the key (if such a
node is present), or the hash code and bucket index for the key. The
hash code and bucket index allow the caller to know where a new node
with that key should be inserted, for the cases where the lookup didn't
find a matching node.
The result struct actually contains a pointer to the node *before* the
one that was located, as that is needed for it to be useful in erase and
extract members. There is a member function that returns the found node,
i.e. _M_before->_M_nxt downcast to __node_ptr, which should be used in
most cases.
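The shape of that result can be pictured with a simplified standalone sketch;
the type and member names below are illustrative, not the real libstdc++ ones:

  #include <cstddef>

  struct node_base { node_base* next = nullptr; };
  struct node : node_base { int value = 0; };      // simplified node type

  // What _M_locate hands back, roughly: the node *before* a match (if any),
  // plus the hash code and bucket where a new node with the key would go.
  struct location_sketch
  {
    node_base*  before = nullptr;    // node preceding the match, null if none
    std::size_t hash_code = 0;       // cached hash of the looked-up key
    std::size_t bucket = 0;          // bucket index derived from hash_code

    node* found() const              // the matching node, as most callers want it
    { return before ? static_cast<node*>(before->next) : nullptr; }
  };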
This new function greatly simplifies the functions that currently have
to do two kinds of lookup and explicitly check the current size against
the small size threshold.
Additionally, now that try_emplace is defined directly in _Hashtable
(not in _Insert_base) we can use _M_locate in there too, to speed up
some try_emplace calls. Previously it did not do the small-size linear
search.
It would be possible to add a function to get a __location_type from an
iterator, and then rewrite some functions like _M_erase and
_M_extract_node to take a __location_type parameter. While that might be
conceptually nice, it wouldn't really make the code any simpler or more
readable than it is now. That isn't done in this change.
libstdc++-v3/ChangeLog:
* include/bits/hashtable.h (__location_type): New struct.
(_M_locate): New member function.
(_M_find_before_node(const key_type&)): Remove.
(_M_find_node): Move variable initialization into condition.
(_M_find_node_tr): Likewise.
(operator=(initializer_list<T>), try_emplace, _M_reinsert_node)
(_M_merge_unique, find, erase(const key_type&)): Use _M_locate
for lookup.
Jonathan Wakely [Thu, 7 Nov 2024 17:04:49 +0000 (17:04 +0000)]
libstdc++: Simplify _Hashtable merge functions
I realised that _M_merge_unique and _M_merge_multi call extract(iter)
which then has to call _M_get_previous_node to iterate through the
bucket to find the node before the one iter points to. Since the merge
function is already iterating over the entire container, we had the
previous node a moment ago. Walking the whole bucket to find it again is
wasteful. We could just rewrite the loop in terms of node pointers
instead of iterators, and then call _M_extract_node directly. However,
this is only possible when the source container is the same type as the
destination, because otherwise we can't access the source's private
members (_M_before_begin, _M_begin, _M_extract_node etc.)
Add overloads of _M_merge_unique and _M_merge_multi that work with
source containers of the same type, to enable this optimization.
For both overloads of _M_merge_unique we can also remove the conditional
modifications to __n_elt and just consistently decrement it for every
element processed. Use a multiplier of one or zero that dictates whether
__n_elt is passed to _M_insert_unique_node or not. We can also remove
the repeated calls to size() and just keep track of the size in a local
variable.
Although _M_merge_unique and _M_merge_multi should be safe for
"self-merge", i.e. when doing c.merge(c), it's wasteful to search/insert
every element when we don't need to do anything. Add 'this == &source'
checks to the overloads taking an lvalue of the container's own type.
Because those checks aren't needed for the rvalue overloads, change
those to call the underlying _M_merge_xxx function directly instead of
going through the lvalue overload that checks the address.
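For example, the lvalue overloads now short-circuit the self-merge case
mentioned above (illustrative use):

  #include <unordered_set>
  #include <cassert>

  int main()
  {
    std::unordered_set<int> s{1, 2, 3};
    s.merge(s);               // detected via 'this == &source': now a no-op
                              // instead of a wasted pass over every element
    assert(s.size() == 3);
  }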
I've also added more extensive tests for better coverage of the new
overloads added in this commit.
libstdc++-v3/ChangeLog:
* include/bits/hashtable.h (_M_merge_unique): Add overload for
merging from same type.
(_M_merge_unique<Compatible>): Simplify size tracking. Add
comment.
(_M_merge_multi): Add overload for merging from same type.
(_M_merge_multi<Compatible>): Add comment.
* include/bits/unordered_map.h (unordered_map::merge): Check for
self-merge in the lvalue overload. Call _M_merge_unique directly
for the rvalue overload.
(unordered_multimap::merge): Likewise.
* include/bits/unordered_set.h (unordered_set::merge): Likewise.
(unordered_multiset::merge): Likewise.
* testsuite/23_containers/unordered_map/modifiers/merge.cc:
Add more tests.
* testsuite/23_containers/unordered_multimap/modifiers/merge.cc:
Likewise.
* testsuite/23_containers/unordered_multiset/modifiers/merge.cc:
Likewise.
* testsuite/23_containers/unordered_set/modifiers/merge.cc:
Likewise.
Jonathan Wakely [Tue, 5 Nov 2024 22:39:43 +0000 (22:39 +0000)]
libstdc++: Remove _Equality base class from _Hashtable
libstdc++-v3/ChangeLog:
* include/bits/hashtable.h (_Hashtable): Remove _Equality base
class.
(_Hashtable::_M_equal): Define equality comparison here instead
of in _Equality::_M_equal.
* include/bits/hashtable_policy.h (_Equality): Remove.
Jonathan Wakely [Fri, 1 Nov 2024 23:53:52 +0000 (23:53 +0000)]
libstdc++: Remove _Insert base class from _Hashtable
There's no reason to have a separate base class defining the insert
member functions now. They can all be moved into the _Hashtable class,
which simplifies them slightly.
libstdc++-v3/ChangeLog:
* include/bits/hashtable.h (_Hashtable): Remove inheritance from
__detail::_Insert and move its members into _Hashtable.
* include/bits/hashtable_policy.h (__detail::_Insert): Remove.
Reviewed-by: François Dumont <fdumont@gcc.gnu.org>
Jonathan Wakely [Fri, 1 Nov 2024 21:52:37 +0000 (21:52 +0000)]
libstdc++: Replace _Hashtable::__fwd_value_for with cast
We can just use a cast to the appropriate type instead of calling a
function to do it. This gives the compiler less work to compile and
optimize, and at -O0 avoids a function call per element.
libstdc++-v3/ChangeLog:
* include/bits/hashtable.h (_Hashtable::__fwd_value_for):
Remove.
(_Hashtable::_M_assign): Use static_cast instead of
__fwd_value_for.
Reviewed-by: François Dumont <fdumont@gcc.gnu.org>
Jonathan Wakely [Fri, 1 Nov 2024 14:17:38 +0000 (14:17 +0000)]
libstdc++: Add _Hashtable::_M_assign for the common case
This adds a convenient _M_assign overload for the common case where the
node generator is the _AllocNode type. Only two places need to call
_M_assign with a _ReuseOrAllocNode node generator, so all the other
calls to _M_assign can use the new overload instead of manually
constructing a node generator.
The _AllocNode::operator(Args&&...) function doesn't need to be a
variadic template. It is only ever called with a single argument of type
const value_type& or value_type&&, so could be simplified. That isn't
done in this commit.
libstdc++-v3/ChangeLog:
* include/bits/hashtable.h (_Hashtable): Remove typedefs for
node generators.
(_Hashtable::_M_assign(_Ht&&)): Add new overload.
(_Hashtable::operator=(initializer_list<value_type>)): Add local
typedef for node generator.
(_Hashtable::_M_assign_elements): Likewise.
(_Hashtable::operator=(const _Hashtable&)): Use new _M_assign
overload.
(_Hashtable(const _Hashtable&)): Likewise.
(_Hashtable(const _Hashtable&, const allocator_type&)):
Likewise.
(_Hashtable(_Hashtable&&, __node_alloc_type&&, false_type)):
Likewise.
* include/bits/hashtable_policy.h (_Insert): Remove typedef for
node generator.
Reviewed-by: François Dumont <fdumont@gcc.gnu.org>
Jonathan Wakely [Thu, 31 Oct 2024 19:53:55 +0000 (19:53 +0000)]
libstdc++: Refactor Hashtable erasure
This reworks the internal member functions for erasure from
unordered containers, similarly to the earlier commit doing it for
insertion.
Instead of multiple overloads of _M_erase which are selected via tag
dispatching, the erase(const key_type&) member can use 'if constexpr' to
choose an appropriate implementation (returning after erasing a single
element for unique keys, or continuing to erase all equivalent elements
for non-unique keys).
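A generic standalone sketch of that 'if constexpr' selection; the member names
below are made up, not the real _Hashtable internals:

  #include <cstddef>

  template<bool UniqueKeys>
  struct table_sketch
  {
    // Stand-ins for the real single-element / all-equivalent erasure helpers.
    std::size_t erase_first(int) { return 1; }
    std::size_t erase_all(int)   { return 3; }

    std::size_t erase(int key)
    {
      if constexpr (UniqueKeys)
        return erase_first(key);    // unique keys: stop after one element
      else
        return erase_all(key);      // non-unique keys: erase every equivalent one
    }
  };

  // table_sketch<true> models set/map, table_sketch<false> the multi variants.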
libstdc++-v3/ChangeLog:
* include/bits/hashtable.h (_Hashtable::_M_erase): Remove
overloads for erasing by key, moving logic to ...
(_Hashtable::erase): ... here.
Reviewed-by: François Dumont <fdumont@gcc.gnu.org>
This completely reworks the internal member functions for insertion into
unordered containers. Currently we use a mixture of tag dispatching (for
unique vs non-unique keys) and template specialization (for maps vs
sets) to correctly implement insert and emplace members.
This removes a lot of complexity and indirection by using 'if constexpr'
to select the appropriate member function to call.
Previously there were four overloads of _M_emplace, for unique keys and
non-unique keys, and for hinted insertion and non-hinted. However two of
those were redundant, because we always ignore the hint for unique keys
and always use a hint for non-unique keys. Those four overloads have
been replaced by two new non-overloaded function templates:
_M_emplace_uniq and _M_emplace_multi. The former is for unique keys and
doesn't take a hint, and the latter is for non-unique keys and takes a
hint.
In the body of _M_emplace_uniq there are special cases to handle
emplacing values from which a key_type can be extracted directly. This
means we don't need to allocate a node and construct a value_type that
might be discarded if an equivalent key is already present. The special
case applies when emplacing the key_type into std::unordered_set, or
when emplacing std::pair<cv key_type, X> into std::unordered_map, or
when emplacing two values into std::unordered_map where the first has
type cv key_type. For the std::unordered_set case, obviously if we're
inserting something that's already the key_type, we can look it up
directly. For the std::unordered_map cases, we know that the inserted
std::pair<const key_type, mapped_type> would have its first element
initialized from first member of a std::pair value, or from the first of
two values, so if that is a key_type, we can look that up directly.
All the _M_insert overloads used a node generator parameter, but apart
from the one case where _M_insert_range was called from
_Hashtable::operator=(initializer_list<value_type>), that parameter was
always the _AllocNode type, never the _ReuseOrAllocNode type. Because
operator=(initializer_list<value_type>) was rewritten in an earlier
commit, all calls to _M_insert now use _AllocNode, so there's no reason
to pass the generator as a template parameter when inserting.
The multiple overloads of _Hashtable::_M_insert can all be removed now,
because the _Insert_base::insert members now call either _M_emplace_uniq
or _M_emplace_multi directly, only passing a hint to the latter. Which
one to call is decided using 'if constexpr (__unique_keys::value)' so
there is no unnecessary code instantiation, and overload resolution is
much simpler.
The partial specializations of the _Insert class template can be
entirely removed, moving the minor differences in 'insert' member
functions into the common _Insert_base base class. The different
behaviour for maps and sets can be implemented using enable_if
constraints and 'if constexpr'. With the _Insert class template no
longer needed, the _Insert_base class template can be renamed to
_Insert. This is a minor simplification for the complex inheritance
hierarchy used by _Hashtable, removing one base class. It also means
one less class template instantiation, and no need to match the right
partial specialization of _Insert. The _Insert base class could be
removed entirely by moving all its 'insert' members into _Hashtable,
because without any variation in specializations of _Insert there is no
reason to use a base class to define those members. That is left for a
later commit.
Consistently using _M_emplace_uniq or _M_emplace_multi for insertion
means we no longer attempt to avoid constructing a value_type object to
find its key, removing the PR libstdc++/96088 optimizations. This fixes
the bugs caused by those optimizations, such as PR libstdc++/115285, but
causes regressions in the expected number of allocations and temporary
objects constructed for the PR 96088 tests. It should be noted that the
"regressions" in the 96088 tests put us exactly level with the number of
allocations done by libc++ for those same tests.
To mitigate this to some extent, _M_emplace_uniq detects when the
emplace arguments already contain a key_type (either as the sole
argument, for unordered_set, or as the first part of a pair of
arguments, for unordered_map). In that specific case we don't need to
allocate a node and construct a value type to check for an existing
element with equivalent key.
The remaining regressions in the number of allocations and temporaries
should be addressed separately, with more conservative optimizations
specific to std::string. That is not part of this commit.
libstdc++-v3/ChangeLog:
PR libstdc++/115285
* include/bits/hashtable.h (_Hashtable::_M_emplace): Replace
with _M_emplace_uniq and _M_emplace_multi.
(_Hashtable::_S_forward_key, _Hashtable::_M_insert_unique)
(_Hashtable::_M_insert_unique_aux, _Hashtable::_M_insert):
Remove.
* include/bits/hashtable_policy.h (_ConvertToValueType):
Remove.
(_Insert_base::_M_insert_range): Remove overload for unique keys
and rename overload for non-unique keys to ...
(_Insert_base::_M_insert_range_multi): ... this.
(_Insert_base::insert): Call _M_emplace_uniq or _M_emplace_multi
instead of _M_insert. Add insert overloads from _Insert.
(_Insert_base): Rename to _Insert.
(_Insert): Remove
* testsuite/23_containers/unordered_map/96088.cc: Adjust
expected number of allocations.
* testsuite/23_containers/unordered_set/96088.cc: Likewise.
Jonathan Wakely [Fri, 1 Nov 2024 12:49:53 +0000 (12:49 +0000)]
libstdc++: Allow unordered_set assignment to assign to existing nodes
Currently the _ReuseOrAllocNode::operator(Args&&...) function always
destroys the value stored in recycled nodes and constructs a new value.
The _ReuseOrAllocNode type is only ever used for implementing
assignment, either from another unordered container of the same type, or
from std::initializer_list<value_type>. Consequently, the parameter pack
Args only ever consists of a single parameter or type const value_type&
or value_type. We can replace the variadic parameter pack with a single
forwarding reference parameter, and when the value_type is assignable
from that type we can use assignment instead of destroying the existing
value and then constructing a new one.
Using assignment is typically only possible for sets, because for maps
the value_type is std::pair<const key_type, mapped_type> and in most
cases std::is_assignable_v<const key_type&, const key_type&> is false.
libstdc++-v3/ChangeLog:
* include/bits/hashtable_policy.h (_ReuseOrAllocNode::operator()):
Replace parameter pack with a single parameter. Assign to
existing value when possible.
* testsuite/23_containers/unordered_multiset/allocator/move_assign.cc:
Adjust expected count of operations.
* testsuite/23_containers/unordered_set/allocator/move_assign.cc:
Likewise.
Reviewed-by: François Dumont <fdumont@gcc.gnu.org>
This replaces a call to _M_insert_range with open coding the loop. This
will allow removing the node generator parameter from _M_insert_range in
a later commit.
libstdc++-v3/ChangeLog:
* include/bits/hashtable.h (operator=(initializer_list)):
Refactor to not use _M_insert_range.
Reviewed-by: François Dumont <fdumont@gcc.gnu.org>
Jonathan Wakely [Wed, 13 Nov 2024 16:39:50 +0000 (16:39 +0000)]
libstdc++: Write timestamp to libstdc++-performance.sum file
The results of 'make check-performance' are appended to the .sum file,
with no indication where one set of results ends and the next begins. We
could just remove the file when starting a new run, but appending makes
it a little easier to compare with previous runs, without having to copy
and store old files.
This adds a header containing a timestamp to the file when starting a
new run.
libstdc++-v3/ChangeLog:
* scripts/check_performance: Add timestamp to output file at
start of run.
Jonathan Wakely [Wed, 13 Nov 2024 12:57:11 +0000 (12:57 +0000)]
libstdc++: Fix nodiscard warnings in perf test for memory pools
The use of unnamed std::lock_guard temporaries was intentional here, as
they were used like barriers (but std::barrier isn't available until
C++20). But that gives nodiscard warnings, because unnamed temporary
locks are usually unintentional. Use named variables in new block scopes
instead.
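A before/after sketch of the change (simplified; the real test synchronizes
memory-pool work across threads):

  #include <mutex>

  std::mutex mx;

  void wait_for_other_threads()
  {
    // Before: an unnamed temporary used as a barrier-like wait, which now
    // triggers the nodiscard warning:
    //   std::lock_guard<std::mutex>{ mx };
    // After: a named guard in its own block scope, same blocking effect.
    {
      std::lock_guard<std::mutex> lock(mx);
    }
  }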
libstdc++-v3/ChangeLog:
* testsuite/performance/20_util/memory_resource/pools.cc: Fix
-Wunused-value warnings about unnamed std::lock_guard objects.
There are some SVE intrinsics that support one set of suffixes for
one extension (E1, say) and another set of suffixes for another
extension (E2, say). It is usually the case that, mutatis mutandis,
E2 extends E1. Listing E1 first would then ensure that the manual
C overload would also require E1, making it suitable for resolving
both the E1 forms and, where appropriate, the E2 forms.
However, there was one exception: the I8MM, F32MM, and F64MM extensions
to SVE each added variants of svmmla, but there was no svmmla for SVE
itself. This was handled by adding an SVE entry for svmmla that only
defined the C overload; it had no variants of its own.
This situation occurs more often with upcoming patches. Rather than
keep adding these dummy entries, it seemed better to make the code
automatically compute the lowest common denominator for all definitions
that share the same C overload.
gcc/
* config/aarch64/aarch64-protos.h
(aarch64_required_extensions::common_denominator): New member
function.
* config/aarch64/aarch64-sve-builtins-base.def: Remove zero-variant
entry for mmla.
* config/aarch64/aarch64-sve-builtins-shapes.cc (mmla_def): Remove
support for it.
* config/aarch64/aarch64-sve-builtins.cc
(function_builder::add_overloaded): Relax the assert for duplicate
definitions and instead calculate the common denominator of all
requirements.
Filip Kastl [Wed, 13 Nov 2024 15:11:14 +0000 (16:11 +0100)]
i386: Add -mveclibabi=aocl [PR56504]
We currently support generating vectorized math calls to the AMD core
math library (ACML) (-mveclibabi=acml). That library is end-of-life and
its successor is the math library from AMD Optimizing CPU Libraries
(AOCL).
This patch adds support for AOCL (-mveclibabi=aocl). That significantly
broadens the range of vectorized math functions optimized for AMD CPUs
that GCC can generate calls to.
See the edit to invoke.texi for a complete list of added functions.
Compared to the list of functions in AOCL LibM docs I left out these
vectorized function families:
- sincos and all functions working with arrays ... Because these
functions have pointer arguments and that would require a bigger
rework of ix86_veclibabi_aocl(). Also, I'm not sure if GCC even ever
generates calls to these functions.
- linearfrac ... Because these functions are specific to the AMD
library. There's no equivalent glibc function nor GCC internal
function nor GCC built-in.
- powx, sqrt, fabs ... Because GCC doesn't vectorize these functions
into calls and uses instructions instead.
I also left out amd_vrd2_expm1() (the AMD docs list the function but I
wasn't able to link calls to it with the current version of the
library).
gcc/ChangeLog:
PR target/56504
* config/i386/i386-options.cc (ix86_option_override_internal):
Add ix86_veclibabi_type_aocl case.
* config/i386/i386-options.h (ix86_veclibabi_aocl): Add extern
ix86_veclibabi_aocl().
* config/i386/i386-opts.h (enum ix86_veclibabi): Add
ix86_veclibabi_type_aocl into the ix86_veclibabi enum.
* config/i386/i386.cc (ix86_veclibabi_aocl): New function.
* config/i386/i386.opt: Add the 'aocl' type.
* doc/invoke.texi: Document -mveclibabi=aocl.
gcc/testsuite/ChangeLog:
PR target/56504
* gcc.target/i386/vectorize-aocl1.c: New test.
This is especially helpful for SVE architectures as LDEXP calls can be
implemented using the FSCALE instruction, as seen in the following patch:
https://gcc.gnu.org/g:9b2915d95d855333d4d8f66b71a75f653ee0d076
SPEC2017 was run with this patch, while there are no noticeable improvements,
there are no non-noise regressions either.
The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
Signed-off-by: Soumya AR <soumyaa@nvidia.com>
gcc/ChangeLog:
PR target/57492
* match.pd: Add patterns to fold calls to pow to ldexp and optimize
specific ldexp calls.
gcc/testsuite/ChangeLog:
PR target/57492
* gcc.dg/tree-ssa/ldexp.c: New test.
* gcc.dg/tree-ssa/pow-to-ldexp.c: New test.
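The identity behind the new folds, sketched in source form; the exact input
shapes the match.pd patterns accept are those exercised by the new tests:

  #include <cmath>

  // For integer k, pow(2.0, k) == ldexp(1.0, k), and ldexp maps naturally to
  // SVE's FSCALE, so calls like the one below become candidates for rewriting.
  double power_of_two(int k)
  {
    return std::pow(2.0, k);
  }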
Yangyu Chen [Tue, 5 Nov 2024 03:23:16 +0000 (11:23 +0800)]
RISC-V: Add Multi-Versioning Test Cases
This patch adds test cases for the Function Multi-Versioning (FMV)
feature for RISC-V, reusing the existing test cases from aarch64 and
porting them to RISC-V.
* g++.target/riscv/mv-symbols1.C: New test.
* g++.target/riscv/mv-symbols2.C: New test.
* g++.target/riscv/mv-symbols3.C: New test.
* g++.target/riscv/mv-symbols4.C: New test.
* g++.target/riscv/mv-symbols5.C: New test.
* g++.target/riscv/mvc-symbols1.C: New test.
* g++.target/riscv/mvc-symbols2.C: New test.
* g++.target/riscv/mvc-symbols3.C: New test.
* g++.target/riscv/mvc-symbols4.C: New test.
Yangyu Chen [Tue, 5 Nov 2024 03:23:07 +0000 (11:23 +0800)]
RISC-V: Implement TARGET_GENERATE_VERSION_DISPATCHER_BODY and TARGET_GET_FUNCTION_VERSIONS_DISPATCHER
This patch implements the TARGET_GENERATE_VERSION_DISPATCHER_BODY and
TARGET_GET_FUNCTION_VERSIONS_DISPATCHER for RISC-V. This is used to
generate the dispatcher function and get the dispatcher function for
function multiversioning.
This patch copies much code from commit 0cfde688e213 ("[aarch64]
Add function multiversioning support") and modifies it to fit the
RISC-V port. A key difference is that the feature-bits data structure in
the RISC-V C-API is an array of unsigned long long, while in AArch64 it
is not an array. So we need to generate an array reference for each
feature-bits element in the dispatcher function.
* config/riscv/riscv.cc (add_condition_to_bb): New function.
(dispatch_function_versions): New function.
(get_suffixed_assembler_name): New function.
(make_resolver_func): New function.
(riscv_generate_version_dispatcher_body): New function.
(riscv_get_function_versions_dispatcher): New function.
(TARGET_GENERATE_VERSION_DISPATCHER_BODY): Implement it.
(TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): Implement it.
Yangyu Chen [Tue, 5 Nov 2024 03:22:45 +0000 (11:22 +0800)]
RISC-V: Implement TARGET_COMPARE_VERSION_PRIORITY and TARGET_OPTION_FUNCTION_VERSIONS
This patch implements TARGET_COMPARE_VERSION_PRIORITY and
TARGET_OPTION_FUNCTION_VERSIONS for RISC-V.
The TARGET_COMPARE_VERSION_PRIORITY is implemented to compare the
priority of two function versions based on the rules defined in the
RISC-V C-API Doc PR #85:
If multiple versions have equal priority, we select the function with
the greatest number of feature bits generated by
riscv_minimal_hwprobe_feature_bits. When the number of feature bits is
also the same, we diff the two versions and select the one with the
lowest set bit, since a feature that appears earlier in the feature_bits
might be more important for performance.
The TARGET_OPTION_FUNCTION_VERSIONS hook is implemented to check whether
the two function versions are the same. This implementation reuses the
code in TARGET_COMPARE_VERSION_PRIORITY and checks that it returns 0,
which means equal priority.
* config/riscv/riscv.cc
(parse_features_for_version): New function.
(compare_fmv_features): New function.
(riscv_compare_version_priority): New function.
(riscv_common_function_versions): New function.
(TARGET_COMPARE_VERSION_PRIORITY): Implement it.
(TARGET_OPTION_FUNCTION_VERSIONS): Implement it.
This patch implements TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P for
RISC-V. This hook is used to process the
attribute ((target_version ("..."))) syntax.
As this is the first patch which introduces the target_version attribute,
we also set TARGET_HAS_FMV_TARGET_ATTRIBUTE to 0 to use "target_version"
for function versioning.
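A hypothetical sketch of what the attribute enables; the exact version strings
accepted are defined by the RISC-V C-API, so the ones below are illustrative:

  // Two versions of the same function; the dispatcher added by the follow-up
  // patches selects one at load time based on hwprobe feature bits.
  __attribute__((target_version("default")))
  int dot_product(const int* a, const int* b, int n)
  {
    int sum = 0;
    for (int i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
  }

  __attribute__((target_version("arch=+v")))
  int dot_product(const int* a, const int* b, int n)
  {
    int sum = 0;                 // same semantics; could use RVV intrinsics
    for (int i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
  }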
* config/riscv/riscv-protos.h
(riscv_process_target_attr): Remove as it is not used.
(riscv_option_valid_version_attribute_p): Declare.
(riscv_process_target_version_attr): Declare.
* config/riscv/riscv-target-attr.cc
(riscv_target_attrs): Renamed from riscv_attributes.
(riscv_target_version_attrs): New attributes for target_version.
(riscv_process_one_target_attr): New arguments to select attrs.
(riscv_process_target_attr): Likewise.
(riscv_option_valid_attribute_p): Likewise.
(riscv_process_target_version_attr): New function.
(riscv_option_valid_version_attribute_p): New function.
* config/riscv/riscv.cc
(TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P): Implement it.
* config/riscv/riscv.h (TARGET_HAS_FMV_TARGET_ATTRIBUTE): Define
it to 0 to use "target_version" for function versioning.
This patch implements the riscv_minimal_hwprobe_feature_bits feature
for the RISC-V target. The feature bits are defined in the
libgcc/config/riscv/feature_bits.c to provide bitmasks of ISA extensions
that defined in RISC-V C-API. Thus, we need a function to generate the
feature bits for IFUNC resolver to dispatch between different functions
based on the hardware features.
The minimal feature bits mean using the earliest extension that appeared in
Linux hwprobe to cover the given ISA string, so that older kernels whose
probe lacks some implied extensions can still run the FMV dispatcher
correctly.
For example, V implies Zve32x, but Zve32x only appears in the Linux kernel's
hwprobe since v6.11. If we used the ISA string directly to generate the FMV
dispatcher for functions with the "arch=+v" extension, then, since V implies
Zve32x, the FMV dispatcher would check whether the Zve32x extension is
supported by the host. If the Linux kernel is older than v6.11, the
dispatcher will fail to detect the Zve32x extension even though it is
already implied by the V extension, and thus fail to dispatch the correct
function.
Thus, we need to generate the minimal feature bits to cover the given
ISA string to allow the FMV dispatcher to work correctly on older
kernels.
* common/config/riscv/riscv-common.cc
(RISCV_EXT_BITMASK): New macro.
(struct riscv_ext_bitmask_table_t): New struct.
(riscv_minimal_hwprobe_feature_bits): New function.
* common/config/riscv/riscv-ext-bitmask.def: New file.
* config/riscv/riscv-subset.h (GCC_RISCV_SUBSET_H): Include
riscv-feature-bits.h.
(riscv_minimal_hwprobe_feature_bits): Declare the function.
* config/riscv/riscv-feature-bits.h: New file.
Yangyu Chen [Tue, 5 Nov 2024 03:22:00 +0000 (11:22 +0800)]
RISC-V: Implement Priority syntax parser for Function Multi-Versioning
This patch adds the priority syntax parser to support the Function
Multi-Versioning (FMV) feature in RISC-V. This feature allows users to
specify the priority of the function version in the attribute syntax.
Changes are based on the RISC-V C-API PR:
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/85
Yangyu Chen [Tue, 5 Nov 2024 03:21:22 +0000 (11:21 +0800)]
Introduce TARGET_CLONES_ATTR_SEPARATOR for RISC-V
Some architectures may use ',' in the attribute string, but it is not
used as the separator for different targets. To avoid conflict, we
introduce a new macro TARGET_CLONES_ATTR_SEPARATOR to separate different
clones.
As an example, according to RISC-V C-API Specification [1], RISC-V allows
',' in the attribute string in the "arch=" option to specify one more
ISA extensions in the same target function, which conflict with the
default separator to separate different clones. This patch introduces
TARGET_CLONES_ATTR_SEPARATOR for RISC-V and choose '#' as the separator,
since '#' is not allowed in the target_clones option string.
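A hypothetical example of the conflict and the new separator; the clone
strings below are illustrative and assume '#' is used inside a single
attribute string:

  // With ',' as the clone separator, "arch=+zba,+zbb" could not be expressed
  // as one clone; with TARGET_CLONES_ATTR_SEPARATOR set to '#' it can.
  __attribute__((target_clones("default#arch=+zba,+zbb")))
  int popcount_sum(const unsigned* v, int n)
  {
    int s = 0;
    for (int i = 0; i < n; ++i) s += __builtin_popcount(v[i]);
    return s;
  }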
Paul Thomas [Wed, 13 Nov 2024 08:57:55 +0000 (08:57 +0000)]
Fortran: Fix failing character pointer fcn assignment [PR105054]
2024-11-14 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/105054
* resolve.cc (get_temp_from_expr): If the pointer function has
a deferred character length, generate a new deferred charlen
for the temporary.
gcc/testsuite/
PR fortran/105054
* gfortran.dg/ptr_func_assign_6.f08: New test.
Jakub Jelinek [Wed, 13 Nov 2024 08:44:20 +0000 (09:44 +0100)]
c: Handle C23 floating constant {d,D}{32,64,128} suffixes like {df,dd,dl}
C23 roughly says that {d,D}{32,64,128} floating point constant suffixes
are alternate spellings of {df,dd,dl} suffixes in annex H.
So, the following patch allows that alternate spelling.
Or is it intentional that it isn't enabled, and do we need to do everything
in there first before trying to define __STDC_IEC_60559_DFP__?
Like adding support for the _Decimal32x and _Decimal64x types (including
the d32x and d64x suffixes) etc.
2024-11-13 Jakub Jelinek <jakub@redhat.com>
libcpp/
* expr.cc (interpret_float_suffix): Handle d32 and D32 suffixes
for C like df, d64 and D64 like dd and d128 and D128 like
dl.
gcc/c-family/
* c-lex.cc (interpret_float): Subtract 3 or 4 from copylen
rather than 2 if last character of CPP_N_DFLOAT is a digit.
gcc/testsuite/
* gcc.dg/dfp/c11-constants-3.c: New test.
* gcc.dg/dfp/c11-constants-4.c: New test.
* gcc.dg/dfp/c23-constants-3.c: New test.
* gcc.dg/dfp/c23-constants-4.c: New test.
The following patch implements the C2Y N3298 paper Introduce complex literals
by providing different (or no) diagnostics on imaginary constants (except
for integer ones).
For _DecimalN constants we don't support _Complex _DecimalN and error on any
i/j suffixes mixed with DD/DL/DF, so nothing changed there.
2024-11-13 Jakub Jelinek <jakub@redhat.com>
PR c/117029
libcpp/
* include/cpplib.h (struct cpp_options): Add imaginary_constants
member.
* init.cc (struct lang_flags): Add imaginary_constants bitfield.
(lang_defaults): Add column for imaginary_constants.
(cpp_set_lang): Copy over imaginary_constants.
* expr.cc (cpp_classify_number): Diagnose CPP_N_IMAGINARY
non-CPP_N_FLOATING constants differently for C.
gcc/testsuite/
* gcc.dg/cpp/pr7263-3.c: Adjust expected diagnostic wording.
* gcc.dg/c23-imaginary-constants-1.c: New test.
* gcc.dg/c23-imaginary-constants-2.c: New test.
* gcc.dg/c23-imaginary-constants-3.c: New test.
* gcc.dg/c23-imaginary-constants-4.c: New test.
* gcc.dg/c23-imaginary-constants-5.c: New test.
* gcc.dg/c23-imaginary-constants-6.c: New test.
* gcc.dg/c23-imaginary-constants-7.c: New test.
* gcc.dg/c23-imaginary-constants-8.c: New test.
* gcc.dg/c23-imaginary-constants-9.c: New test.
* gcc.dg/c23-imaginary-constants-10.c: New test.
* gcc.dg/c2y-imaginary-constants-1.c: New test.
* gcc.dg/c2y-imaginary-constants-2.c: New test.
* gcc.dg/c2y-imaginary-constants-3.c: New test.
* gcc.dg/c2y-imaginary-constants-4.c: New test.
* gcc.dg/c2y-imaginary-constants-5.c: New test.
* gcc.dg/c2y-imaginary-constants-6.c: New test.
* gcc.dg/c2y-imaginary-constants-7.c: New test.
* gcc.dg/c2y-imaginary-constants-8.c: New test.
* gcc.dg/c2y-imaginary-constants-9.c: New test.
* gcc.dg/c2y-imaginary-constants-10.c: New test.
* gcc.dg/c2y-imaginary-constants-11.c: New test.
* gcc.dg/c2y-imaginary-constants-12.c: New test.