Alpha: Also use tree information to get base block alignment
We hardly ever emit code using machine instructions for aligned memory
accesses for block move and clear operation and the reason for this
appears to be that suboptimal alignment is often passed by the caller
and then we only try to find a better alignment by checking pseudo
register pointer alignment information, and from observation it's most
often only set for stack frame references.
This code originates from before Tree SSA days and we can do better
nowadays, by looking up the original tree node associated with a MEM
RTL, so implement this approach, factoring out repeating code from
`alpha_expand_block_move' and `alpha_expand_block_clear' to a new
function.
In some cases however tree information is not available while pointer
alignment is, such as with the case concerned with PR target/115459,
where we have:
showing no tree information and the alignment of 8 only for `orig_src',
while indeed REGNO_POINTER_ALIGN returns 128 for pseudo 65. So retain
the old approach and return the largest alignment determined and its
associated offset.
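The selection rule described above can be sketched as follows. This is a simplified stand-alone model and not the code in alpha.cc: `struct align_info', `pick_best_alignment' and the field names are hypothetical, and alignments are assumed to be expressed in bits, with 0 meaning "unknown".

```c
#include <assert.h>

/* Hypothetical model: an alignment in bits together with the byte
   offset of the access from a base that has that alignment.  */
struct align_info
{
  unsigned int align;	/* Alignment in bits; 0 if unknown.  */
  long offset;		/* Byte offset from the aligned base.  */
};

/* Return whichever candidate carries the larger alignment, keeping the
   offset that belongs with it.  Ties go to the tree-derived data.  */
static struct align_info
pick_best_alignment (struct align_info from_tree,
		     struct align_info from_reg)
{
  return from_reg.align > from_tree.align ? from_reg : from_tree;
}
```

With the figures quoted above (8 bits known for the MEM, 128 bits from the pseudo register) the model picks the register-derived value of 128.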
Add test cases accordingly and remove XFAILs from memclr-a2-o1-c9-ptr.c
now that it does get aligned code produced.
gcc/
* config/alpha/alpha.cc
(alpha_get_mem_rtx_alignment_and_offset): New function.
(alpha_expand_block_move, alpha_expand_block_clear): Use it for
alignment retrieval.
gcc/testsuite/
* gcc.target/alpha/memclr-a2-o1-c9-ptr.c: Remove XFAILs.
* gcc.target/alpha/memcpy-di-aligned.c: New file.
* gcc.target/alpha/memcpy-di-unaligned.c: New file.
* gcc.target/alpha/memcpy-di-unaligned-dst.c: New file.
* gcc.target/alpha/memcpy-di-unaligned-src.c: New file.
Alpha: Fix offset adjustment in unaligned access helpers
Correct the offset adjustment made in the multi-word unaligned access
helpers such that it is actually used by the unaligned load and store
instructions, fixing a bug introduced with commit 1eb356b98df2 ("alpha
gprel optimizations")[1] back in 2001, which replaced address changes
made directly according to the argument of the MEM expression passed
with one made according to an address previously extracted from said MEM
expression. The address is however incorrectly extracted from said MEM
before an adjustment has been made to it for the offset supplied.
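The ordering problem can be illustrated with a small model. This is only a sketch of the bug class, not the backend code: `struct mem_ref', `adjust_mem' and both address helpers are hypothetical names standing in for a MEM and the operations on it.

```c
#include <assert.h>

/* Hypothetical stand-in for a MEM: just the address it refers to.  */
struct mem_ref
{
  unsigned long addr;
};

/* Return a copy of M moved forward by OFS bytes.  */
static struct mem_ref
adjust_mem (struct mem_ref m, long ofs)
{
  m.addr += ofs;
  return m;
}

/* Buggy order: the address is taken before the adjustment, so the
   adjusted MEM is never consulted and OFS is lost.  */
static unsigned long
address_before_adjust (struct mem_ref m, long ofs)
{
  unsigned long addr = m.addr;
  m = adjust_mem (m, ofs);
  (void) m;
  return addr;
}

/* Fixed order: adjust first, then take the address.  */
static unsigned long
address_after_adjust (struct mem_ref m, long ofs)
{
  m = adjust_mem (m, ofs);
  return m.addr;
}
```

With a zero offset both orders agree, which matches the observation that the bug stays hidden whenever a zero offset is passed.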
This bug is usually covered by the fact that our block move and clear
operations are hardly ever provided with correct block alignment data
and we also usually fail to fetch that information from the MEM supplied
(although PR target/115459 shows it does happen sometimes). Instead the
bit alignment of 8 is usually conservatively used, meaning that a zero
offset is passed to `alpha_expand_unaligned_store_words' and then code
has been written such that neither `alpha_expand_unaligned_load_words'
nor `alpha_expand_unaligned_store_words' can ever be called with
nonzero offset from `alpha_expand_block_move'.
The only situation where `alpha_expand_unaligned_store_words' can be
called with nonzero offset is from `alpha_expand_block_clear' with a BWX
target for a misaligned block that has been embedded in a data object of
a higher alignment such that there is a small unaligned prefix our code
decides to handle so as to align further stores.
For instance it happens when a block clear is called for a block of 9
bytes embedded at offset 1 in a structure aligned to a 2-byte word, as
illustrated by the test case included. Now this test case does not work
without the change that comes next applied, because the backend cannot
see the word alignment of the struct and uses the bit alignment of 8
instead.
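For illustration, the layout in question can be modelled as below. This is a sketch under assumptions and not the test case itself: `struct clr_obj' and `clear_and_check' are hypothetical names, and the guard bytes are added here only to make the check self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: a 9-byte block at offset 1 inside an object
   aligned to 2 bytes, with guard bytes on both sides.  */
struct clr_obj
{
  _Alignas (2) unsigned char head;	/* Offset 0: must survive.  */
  unsigned char block[9];		/* Offsets 1..9: to be cleared.  */
  unsigned char tail[2];		/* Offsets 10..11: must survive.  */
};

_Static_assert (offsetof (struct clr_obj, block) == 1,
		"block must start at offset 1");

/* Clear the block and return 1 if exactly bytes 1..9 became zero.  */
static int
clear_and_check (void)
{
  struct clr_obj o;
  size_t i;

  memset (&o, 0x55, sizeof o);
  memset (o.block, 0, sizeof o.block);
  if (o.head != 0x55 || o.tail[0] != 0x55 || o.tail[1] != 0x55)
    return 0;
  for (i = 0; i < sizeof o.block; i++)
    if (o.block[i] != 0)
      return 0;
  return 1;
}
```

The last byte of the block sits at offset 9, which is the byte that matters when the store offset is off by one.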
Should this change be swapped with the next one, incorrect code such as:
would be produced, where the unadjusted offsets of 1/8 can be seen with
the LDQ_U/STQ_U operations along with byte masks calculated accordingly
rather than the expected offsets of 2/9. As a result the byte at the
offset of 9 fails to get cleared. In these circumstances this would
also show as execution failures with the memclr.c test:
FAIL: gcc.c-torture/execute/memclr.c -O1 execution test
FAIL: gcc.c-torture/execute/memclr.c -Os execution test
-- not at `-O0' though, as the higher alignment cannot be retrieved in
that case, and then not at `-O2' or higher optimization levels either,
because then we choose to open-code this block clear instead:
ldbu $1,0($16)
stw $31,8($16)
stq $1,0($16)
avoiding the bug in `alpha_expand_unaligned_store_words'.
I am leaving the pattern match test case XFAIL-ed here for documentation
purposes and it will be un-XFAIL-ed along with the fix to retrieve the
correct alignment. The run test is of course never expected to fail.
gcc/
* config/alpha/alpha.cc (alpha_expand_unaligned_load_words):
Move address extraction until after the MEM referred has been
adjusted for the offset supplied.
(alpha_expand_unaligned_store_words): Likewise.
gcc/testsuite/
* gcc.target/alpha/memclr-a2-o1-c9-ptr.c: New file.
* gcc.target/alpha/memclr-a2-o1-c9-run.c: New file.
Alpha: Adjust MEM alignment for block clear [PR115459]
By inference it appears to me that the same fix for PR target/115459
needs to be applied to the block clear operation that has been done for
block move, as implemented by commit ccfe71518039 ("[alpha] adjust MEM
alignment for block move [PR115459]").
gcc/
PR target/115459
* config/alpha/alpha.cc (alpha_expand_block_clear): Adjust MEM
to match inferred alignment.
Alpha: Remove code duplication in block clear trailer
Remove code duplication in the part of `alpha_expand_block_clear' that
handles any aligned trailing part of the block, observing that the two
legs of code only differ by the machine mode and that we already take
the same approach with handling any unaligned prefix earlier on. No
functional change, just code shuffling.
gcc/
* config/alpha/alpha.cc (alpha_expand_block_clear): Fold two
legs of a conditional together.
Alpha: Permit constant zero source for "insvmisaligndi"
Eliminate a redundant bitwise inclusive OR operation on the insertion of
constant zero into a bit-field, improving code produced at `-O2' from an
output sequence such as:
for a quadword unaligned store operation. As shown in the example this
only triggers for the high-part store (and therefore only for 2-byte,
4-byte, and 8-byte stores), because `insXl' insns are fully expressed in
terms of RTL and therefore the insertion of zero is eliminated in later
RTL passes, however corresponding `insXh' insns are unspecs only, making
them impossible to see through.
We can get this optimal right from expand though, given that our handler
for "insvmisaligndi", i.e. `alpha_expand_unaligned_store', has explicit
provisions for `const0_rtx' source.
gcc/
* config/alpha/alpha.md (insvmisaligndi): Use "reg_or_0_operand"
rather than "register_operand" for operand 3.
gcc/testsuite/
* gcc.target/alpha/stlx0.c: New file.
* gcc.target/alpha/stqx0.c: New file.
* gcc.target/alpha/stwx0.c: New file.
* gcc.target/alpha/stwx0-bwx.c: New file.
testsuite: Expand coverage for unaligned memory stores
Expand coverage for unaligned memory stores, for the "insvmisalignM"
patterns, for 2-byte, 4-byte, and 8-byte scalars, across byte alignments
of 1, 2, 4 and byte misalignments within from 0 up to 7 (there's some
redundancy there for the sake of simplicity of the test case), making
sure all data is written and no data is changed outside the area meant
to be written.
The test case has turned out invaluable in verifying changes to the Alpha
backend, but functionality covered is generic, so I have concluded this
test qualifies for generic verification and does not have to be limited
to the Alpha-specific subset of the testsuite.
gcc/testsuite/
* gcc.c-torture/execute/misalign.c: New file.
testsuite: Expand coverage for `__builtin_memset' with 0
Expand coverage for `__builtin_memset' for the special case of clearing
a block, primarily for "setmemM" block set pattern, though with smaller
sizes open-coded sequences may be produced instead.
This verifies block sizes in bytes from 1 to 64 across byte alignments
of 1, 2, 4, 8 and byte misalignments within from 0 up to 7 (there's some
redundancy there for the sake of simplicity of the test case), making
sure all the intended area is cleared and no data is changed outside it.
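The shape of such a check can be sketched as follows. This is a reduced model rather than the test case added: `check_clear' and `check_all' are hypothetical names, plain `memset' stands in for `__builtin_memset', and the base alignment dimension is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Clear SIZE bytes at byte offset MISALIGN of a guarded buffer and
   return 1 if exactly that area became zero.  */
static int
check_clear (size_t misalign, size_t size)
{
  unsigned char buf[8 + 64 + 8];
  size_t i;

  assert (misalign <= 7 && size >= 1 && size <= 64);
  memset (buf, 0xa5, sizeof buf);
  memset (buf + misalign, 0, size);
  for (i = 0; i < sizeof buf; i++)
    {
      int inside = i >= misalign && i < misalign + size;
      if (buf[i] != (inside ? 0 : 0xa5))
	return 0;
    }
  return 1;
}

/* Sweep misalignments 0..7 and block sizes 1..64.  */
static int
check_all (void)
{
  size_t misalign, size;

  for (misalign = 0; misalign <= 7; misalign++)
    for (size = 1; size <= 64; size++)
      if (!check_clear (misalign, size))
	return 0;
  return 1;
}
```

Filling the buffer with a nonzero pattern first is what lets the check catch both bytes left uncleared and bytes clobbered outside the block.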
The choice of the ranges for the parameters has come from the Alpha
backend, whose "setmemM" pattern has various corner cases related to
base alignment and the misalignment within.
The test case has turned out invaluable in verifying changes to the Alpha
backend, but functionality covered is generic, so I have concluded this
test qualifies for generic verification and does not have to be limited
to the Alpha-specific subset of the testsuite.
Just as with `__builtin_memcpy' tests this code turned out to require
quite a lot of time to compile, although a bit less than the former.
Example compilation times with reasonably fast POWER9@2.166GHz at `-O2'
optimization and GCC built at `-O2' for various targets:
Alpha/testsuite: Run target testing over all the usual optimization levels
Use `gcc-dg-runtest' test driver rather than `dg-runtest' to run the
Alpha testsuite as several targets already do. Add `-Og -g' and `-Oz'
as well via ADDITIONAL_TORTURE_OPTIONS to expand coverage. Adjust test
options across individual test cases accordingly where required.
Discard base-2.c, cix-2.c, and max-2.c test cases as they merely are
optimization variants of base-1.c, cix-1.c, and max-1.c respectively,
run at `-O2' rather than the default level (`-O0'), now covered by the
framework with the latter ones in a generic way.
The hook changes the allocno class to either FP_REGS or GR_REGS depending on
the mode of the register. This results in better register allocation overall,
fewer spills and reduced codesize - particularly in SPEC2017 lbm.
gcc/ChangeLog:
* config/loongarch/loongarch.cc
(loongarch_ira_change_pseudo_allocno_class): New function.
(TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS): Define macro.
Lewis Hyatt [Tue, 22 Oct 2024 19:23:40 +0000 (15:23 -0400)]
libcpp: Fix overly large buffer allocation
It seems that tokens_buff_new() has always been allocating the virtual
location buffer 4 times larger than intended, and now that location_t is
64-bit, it is 8 times larger. Fixed.
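One way such an over-allocation can arise is sketched below. XNEWVEC takes an element count and multiplies by the element size itself, so passing a byte count scales the buffer by the element size a second time. The names `loc_t', `VEC_BYTES', `bytes_intended' and `bytes_overallocated' are hypothetical and only model the arithmetic; they are not taken from macro.cc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a 64-bit location type (an assumption of this sketch).  */
typedef uint64_t loc_t;

/* Models an allocator macro that takes an element COUNT and applies
   sizeof itself, returning the number of bytes it would request.  */
#define VEC_BYTES(T, N) (sizeof (T) * (N))

/* Correct use: pass the number of elements.  */
static size_t
bytes_intended (size_t len)
{
  return VEC_BYTES (loc_t, len);
}

/* Mistaken use: pass a byte count, so sizeof is applied twice.  */
static size_t
bytes_overallocated (size_t len)
{
  return VEC_BYTES (loc_t, len * sizeof (loc_t));
}
```

With an 8-byte element the mistaken form requests 8 times the intended size, and with a 4-byte element it would request 4 times as much.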
libcpp/ChangeLog:
* macro.cc (tokens_buff_new): Fix length argument to XNEWVEC.
testsuite/gcc.dg/memcmp-1.c: Cut down a factor of 7 for simulators
Running tests in parallel on my 4.5y+ old laptop made this
test time out: the test itself runs in 9m20s, the timeout
being 10 minutes with the 2x factor. That's a bit too close.
This commit does to the base test a similar change as was
done for gcc.dg/torture/inline-mem-cpy-1.c in commit
r14-8188-g6eca0d23b7ea84; or IOW cut it down a factor of 7
(r14-8188 was by a factor of 11).
* gcc.dg/memcmp-1.c: Pass -DRUN_FRACTION=7 when testing in a simulator.
libgfortran: Fix build for targets with int32_t=long int
Without this, after r15-6415-g586477d67bf2e3, you'll see,
for targets where int32_t is a typedef of long int (beware
of artificially broken lines):
/x/gcc/libgfortran/caf/single.c: In function '_gfortran_caf_get_by_ct':
/x/gcc/libgfortran/caf/single.c:2943:56: error: passing argument 2 of '\
(accessor_hash_table + (sizetype)((unsigned int)getter_index * 12))->ac\
cessor' from incompatible pointer type [-Wincompatible-pointer-types]
2943 | accessor_hash_table[getter_index].accessor (dst_ptr, &free_bu\
ffer, src_ptr,
| ^~~~~~~~\
~~~~
| |
| int *
/x/gcc/libgfortran/caf/single.c:2943:56: note: expected 'int32_t *' {ak\
a 'long int *'} but argument is of type 'int *'
libgfortran:
* caf/single.c (_gfortran_caf_get_by_ct): Correct type of free_buffer
to int32_t.
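The pattern behind the fix can be sketched as below. `set_flag' and `call_accessor' are hypothetical names and not the libgfortran interface. On hosts where int32_t is a typedef of int, declaring the variable as plain int would also compile, which is why the mismatch only shows on targets where int32_t is long int.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callee that reports through an int32_t pointer.  */
static void
set_flag (int32_t *flag)
{
  *flag = 1;
}

/* Declare the variable with the exact type the callee expects, so the
   pointer types match whatever int32_t is a typedef of.  */
static int32_t
call_accessor (void)
{
  int32_t free_buffer = 0;

  set_flag (&free_buffer);
  return free_buffer;
}
```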
Harald Anlauf [Mon, 23 Dec 2024 16:56:46 +0000 (17:56 +0100)]
Fortran: fix NULL without MOLD argument to scalar DT pointer dummy [PR118179]
Commit r15-6408 overlooked the case of passing NULL without MOLD argument
to a derived type pointer dummy argument without specified intent. Since
it is prohibited to modify the dummy argument, we treat it as if intent(in)
were specified and suppress copying back of the pointer address.
PR fortran/118179
gcc/fortran/ChangeLog:
* trans-expr.cc (conv_null_actual): Suppress copying back of
pointer address for unspecified intent.
gcc/testsuite/ChangeLog:
* gfortran.dg/null_actual_7.f90: Extend testcase to also cover
scalar variants with pointer or allocatable dummy with or without
specified intent.
Simon Martin [Mon, 23 Dec 2024 12:28:31 +0000 (13:28 +0100)]
libcc1: Fix tags generation target
'make tags' currently fails for libcc1 with this:
*** No rule to make target `marshall-c.hh', needed by `tags-am'. Stop.
The problem is that while marshall-c.hh has been removed via
r12-454-g25d1a6ecdc443f, it's still part of the libcc1_la_SOURCES
variable, hence the 'tags' target has a dependency on it.
This patch simply removes the marshall_c_source variable, that should be
empty.
libcc1/ChangeLog:
* Makefile.am: Remove reference to deleted marshall-c.h.
* Makefile.in: Regenerate.
Paul Thomas [Mon, 23 Dec 2024 15:32:40 +0000 (15:32 +0000)]
Fortran: Bugs found in class_transformational_1/2.f90[PR116254/118059].
2024-12-23 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran/ChangeLog
PR fortran/116254
* trans-array.cc (gfc_trans_create_temp_array): Make sure that
transformational intrinsics of class objects that change rank,
most particularly spread, go through the correct code path. Re-
factor so that changes to the dtype are done on the temporary
before the class data of the result points to it.
PR fortran/118059
* trans-expr.cc (arrayfunc_assign_needs_temporary): Character
array function expressions assigned to an unlimited polymorphic
variable require a temporary.
Recently two test cases for PR118149 have been added.
While pr118149-2.c works well for AArch64, pr118149.c fails
because the expected optimization in forwprop4 cannot be applied
as SLP vectorization does not happen.
This patch fixes this issue by disabling the check on AArch64.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr118149.c: Disable for AArch64.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Wilken Gottwalt [Sun, 8 Dec 2024 19:46:16 +0000 (19:46 +0000)]
gm2: fix bad programming practice warning
Fix identifier names that are too similar to Modula-2 keywords and cause
warnings coming from Modula-2's own libraries.
m2/m2log/InOut.mod:51:18: note: In implementation module ‘InOut’:
either the identifier has the same name as a keyword or alternatively a
keyword has the wrong case (‘IN’ and ‘in’)
51 | in, out: File ;
m2/m2log/InOut.mod:51:18: note: the symbol name ‘in’ is legal as an
identifier, however as such it might cause confusion and is considered
bad programming practice
gcc/gm2:
* gm2-libs-log/InOut.mod: Fix bad identifier warning.
testsuite: arm: Check for short circuit instructions [PR103298]
Instead of checking that a certain transformation is not used by
counting the number of return instructions and the number of BEQ
instructions, check that none of CMP, MOV, ORR and AND instructions are
suffixed with EQ or NE.
Also removed size check as it's very unstable (depends on optimization
in use).
gcc/testsuite/ChangeLog:
PR testsuite/103298
* gcc.target/arm/pr43920-2.c: Change to assembler pattern
"(cmp|mov|orr|and)(eq|ne)" for the check. Remove size check.
Fortran: Replace getting of coarray data with accessor-based version. [PR107635]
Getting coarray data from remote images was slow, inefficient and did
not work for object files that were not compiled with coarray support
for derived types with allocatable/pointer components. The old approach
emulated accessing data through a whole structure ref, which was error
prone for corner cases. Furthermore, it had a runtime
complexity of O(N), where N is the number of allocatable/pointer
components and descriptors involved. Each of those needed communication
twice. The new approach creates a routine for each access into a
coarray object putting all required operations there. Looking at a
tree dump one will see those small routines. But this time it is just
compiled Fortran with all the knowledge of the compiler of bounds and so
on. New paradigms will be available out of the box. Furthermore, the
complexity of the communication is reduced to O(1). E.g. the MPI
implementation sends one message for the parameters of the access and
one message back with the results without caring about the number of
allocatable/pointer/descriptor components in the access.
Identification of access routines is done by adding them to a hash map,
where the hash is the same on all images. Translating the hash to an
index, which is the same on all images again, allows for fast calls of
the access routines. Resolving the hash to an index is cached at
runtime, preventing additional hash map lookups. A hash map was used
because not all processor OS combinations may use the same address for
the access routine.
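The hash-to-index step can be sketched with the standard qsort() and bsearch() routines. This is a simplified model and not the libcaf implementation: `struct entry', `finish_registration' and `hash_to_index' are hypothetical names, and the entries carry a name instead of a function pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical table entry: a hash identifying an access routine.  */
struct entry
{
  int hash;
  const char *name;
};

/* Order entries by hash, for both qsort() and bsearch().  */
static int
cmp_entry (const void *a, const void *b)
{
  int ha = ((const struct entry *) a)->hash;
  int hb = ((const struct entry *) b)->hash;

  return (ha > hb) - (ha < hb);
}

/* Sort once after all routines have been registered.  Every image that
   registers the same hashes ends up with the same ordering.  */
static void
finish_registration (struct entry *tab, size_t n)
{
  qsort (tab, n, sizeof *tab, cmp_entry);
}

/* Map HASH to its index in the sorted table, or -1 if absent.  */
static int
hash_to_index (const struct entry *tab, size_t n, int hash)
{
  struct entry key = { hash, 0 };
  const struct entry *hit = bsearch (&key, tab, n, sizeof *tab, cmp_entry);

  return hit ? (int) (hit - tab) : -1;
}
```

Because the index depends only on the set of hashes and not on where a routine was loaded, a caller can resolve it once, cache it, and send it to another image in place of an address.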
gcc/fortran/ChangeLog:
PR fortran/107635
* gfortran.h (gfc_add_caf_accessor): New function.
* gfortran.texi: Document new API routines.
* resolve.cc (get_arrayspec_from_expr): Synthesize the arrayspec
resulting from an expression, i.e. not only the rank, but also
the bounds.
(remove_coarray_from_derived_type): Remove coarray ref from a
derived type to access it in access routine.
(convert_coarray_class_to_derived_type): Same but for classes.
The result is a derived type.
(split_expr_at_caf_ref): Split an expression at the coarray
reference to move the reference after the coarray ref into the
access routine.
(check_add_new_component): Helper to add variables as
components to derived type transferred to the access routine.
(create_get_parameter_type): Create the derived type to transfer
addressing data to the access routine.
(create_get_callback): Create the access routine.
(add_caf_get_intrinsic): Use access routine instead of old
caf_get.
* trans-decl.cc (gfc_build_builtin_function_decls): Register new
API routines.
(gfc_create_module_variable): Use renamed flag.
(gfc_emit_parameter_debug_info):
(struct caf_accessor): Linked list of hash-access routine pairs.
(gfc_add_caf_accessor): Add a hash-access routine pair to above
linked list.
(create_caf_accessor_register): Add all registered hash-access
routine pairs to the current caf_init.
(generate_coarray_init): Use routine above.
(gfc_generate_module_vars): Use renamed flag.
(generate_local_decl): Same.
(gfc_generate_function_code): Same.
(gfc_process_block_locals): Same.
* trans-intrinsic.cc (conv_shape_to_cst): Build the product of a
shape.
(gfc_conv_intrinsic_caf_get): Create call to access routine.
(conv_caf_send): Adapt to caf_get using fewer arguments.
(gfc_conv_intrinsic_function): Same.
* trans.cc (gfc_trans_force_lval): Helper to ensure that an
expression can be used as an lvalue-ref.
* trans.h (gfc_trans_force_lval): See above.
libgfortran/ChangeLog:
* caf/libcaf.h (_gfortran_caf_register_accessor): New function
to register access routines at runtime.
(_gfortran_caf_register_accessors_finish): New function to
finish registration of access routine and sort hash map.
(_gfortran_caf_get_remote_function_index): New function to
convert a hash to an index.
(_gfortran_caf_get_by_ct): New function to get data from a
remote image using the access routine given by an index.
* caf/single.c (struct accessor_hash_t): Hashmap type.
(_gfortran_caf_send): Fixed formatting.
(_gfortran_caf_register_accessor): Register a hash accessor
routine.
(hash_compare): Compare two hashes for sort() and bsearch().
(_gfortran_caf_register_accessors_finish): Sort the hashmap to
allow bsearch()'s quick lookup.
(_gfortran_caf_get_remote_function_index): Map a hash to an
index.
(_gfortran_caf_get_by_ct): Get data from a remote image using
the index provided by get_remote_function_index().
gcc/testsuite/ChangeLog:
* gfortran.dg/coarray_atomic_5.f90: Adapted to look for
get_by_ct.
* gfortran.dg/coarray_lib_comm_1.f90: Same.
* gfortran.dg/coarray_stat_function.f90: Same.
Fortran: Remove adding and removing of caf_get. [PR107635]
Preparatory work for PR107635.
During resolve prevent adding caf_get calls for expressions on the
left-hand-side of an assignment and removing them later on again.
Furthermore, the caf_token in a component has become a pointer to
the component and not the backend_decl of the caf-component.
In some cases the caf_token was added as last component in a derived
type and not as the next one following the component that it was
needed to be associated to.
gcc/fortran/ChangeLog:
PR fortran/107635
* gfortran.h (gfc_comp_caf_token): Convenient macro for
accessing caf_token's tree.
* resolve.cc (gfc_resolve_ref): Backup caf_lhs when resolving
expr in array_ref.
(remove_caf_get_intrinsic): Removed.
(resolve_variable): Set flag caf_lhs when resolving lhs of
assignment to prevent insertion of caf_get.
(resolve_lock_unlock_event): Same, but the lhs is the parameter.
(resolve_ordinary_assign): Move conversion to caf_send to
resolve_codes.
(resolve_codes): Address caf_get and caf_send here.
(resolve_fl_derived0): Set component's caf_token when token is
necessary.
* trans-array.cc (gfc_conv_array_parameter): Get a coarray for
expressions that have a corank.
(structure_alloc_comps): Use macro to get caf_token's tree.
(gfc_alloc_allocatable_for_assignment): Same.
* trans-expr.cc (gfc_get_ultimate_alloc_ptr_comps_caf_token):
Same.
(gfc_trans_structure_assign): Same.
* trans-intrinsic.cc (conv_expr_ref_to_caf_ref): Same.
(has_ref_after_cafref): New function to figure that after a
reference of a coarray another reference is present.
(conv_caf_send): Get rhs from correct place, when caf_get is
not removed.
* trans-types.cc (gfc_get_derived_type): Get caf_token from
component and no longer guessing.
Arsen Arsenović [Thu, 1 Aug 2024 15:38:15 +0000 (17:38 +0200)]
warn-access: ignore template parameters when matching operator new/delete [PR109224]
Template parameters on a member operator new cannot affect its member
status nor whether it is a singleton or array operator new, hence, we
can ignore it for purposes of matching. Similar logic applies to the
placement operator delete.
In the PR (and a lot of idiomatic coroutine code generally), operator
new is templated in order to be able to inspect (some of) the arguments
passed to the coroutine, to make allocation-related decisions. However,
the coroutine implementation will not call a placement delete form, so
it cannot get templated. As a result, when demangling, we have an extra
template DEMANGLE_COMPONENT_TEMPLATE around the actual operator new, but
not operator delete. This terminates new_delete_mismatch_p early.
PR middle-end/109224 - Wmismatched-new-delete false positive with a templated operator new (common with coroutines)
gcc/ChangeLog:
PR middle-end/109224
* gimple-ssa-warn-access.cc (new_delete_mismatch_p): Strip
DEMANGLE_COMPONENT_TEMPLATE from the operator new and operator
delete after demangling.
gcc/testsuite/ChangeLog:
PR middle-end/109224
* g++.dg/warn/Wmismatched-new-delete-9.C: New test.
Harald Anlauf [Sat, 14 Dec 2024 19:26:47 +0000 (20:26 +0100)]
Fortran: fix passing of NULL() to assumed-rank, derived type dummy [PR104819]
PR fortran/104819
gcc/fortran/ChangeLog:
* interface.cc (compare_parameter): For the rank check, NULL()
inherits the rank of a provided MOLD argument.
(gfc_compare_actual_formal): Adjust check of NULL() actual argument
against formal to accept F2008 enhancements (allocatable dummy).
NULL() with MOLD argument retains a pointer/allocatable attribute.
* trans-expr.cc (conv_null_actual): Implement passing NULL() to
derived-type dummy with pointer/allocatable attribute, and ensure
that the actual rank is passed to an assumed-rank dummy.
(gfc_conv_procedure_call): Use it.
Jeff Law [Sat, 21 Dec 2024 15:33:36 +0000 (08:33 -0700)]
[RISC-V][PR middle-end/118084] Fix brev based reflection code
The fuzzer tripped over a risc-v target issue in the expansion of CRCs.
In particular we want to use brev instruction to improve the reflection
code.
In the case where the item to be reflected is smaller than a word we
would end up triggering an ICE due to mode mismatching since the
expansion code asks for the operation in word_mode.
I was briefly confused by the multiple calls into this code, but we have
to reflect multiple values and those calls may be reflecting different
sized items. So seeing one in SI, then another in QI is sensible.
The fix is pretty simple. In theory the item being reflected should
always be word_size or smaller. So an assertion is added to verify
that. If the item's size is smaller than a word, we can use a
paradoxical subreg. The logical right shift after the brev should zero
out any extraneous bits.
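The reasoning about the shift can be checked with a portable model. This is a sketch and not the expander code: `bitrev32' is a hypothetical software stand-in for a word-sized bit reversal, the word is assumed to be 32 bits wide, and `reflect' is a name made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse all 32 bits of X by swapping ever larger groups of bits.  */
static uint32_t
bitrev32 (uint32_t x)
{
  x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
  x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
  x = ((x >> 4) & 0x0f0f0f0fu) | ((x & 0x0f0f0f0fu) << 4);
  x = ((x >> 8) & 0x00ff00ffu) | ((x & 0x00ff00ffu) << 8);
  return (x >> 16) | (x << 16);
}

/* Reflect the low BITS bits of X.  Whatever sits in the upper bits of
   X ends up in the low bits after the reversal, and the logical right
   shift then discards it.  */
static uint32_t
reflect (uint32_t x, unsigned int bits)
{
  assert (bits >= 1 && bits <= 32);
  return bitrev32 (x) >> (32 - bits);
}
```

For example, reflecting the low 8 bits of 0xffffff01 gives 0x80, the same as for 0x01, so the contents of the bits above the narrow item do not affect the result.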
It's unclear why we're passing a pointer to an RTX in this code. I left
that as-is, but we can simplify the code a little bit by doing the
dereference early and using the dereferenced value.
Pan Li [Tue, 10 Dec 2024 06:27:53 +0000 (14:27 +0800)]
Match: Refactor the signed SAT_ADD match patterns [NFC]
This patch refactors all the signed SAT_ADD patterns,
namely:
* Extract type check outside.
* Re-arrange the related match pattern forms together.
The following test suites pass with this patch.
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.
gcc/ChangeLog:
* match.pd: Refactor sorts of signed SAT_ADD match patterns.
Signed-off-by: Pan Li <pan2.li@intel.com>
Mark Harmstone [Fri, 20 Dec 2024 02:29:21 +0000 (02:29 +0000)]
Fix compilation error in vmsdbgout_begin_block on VMS targets
Commit 4ed189854eae ("Add block parameter to begin_block debug hook") changed
the definition of the begin_block function pointer to add another parameter,
but I missed a call in vmsdbgout_begin_block.
Sandra Loosemore [Fri, 20 Dec 2024 16:27:14 +0000 (16:27 +0000)]
Fortran: Fix hyphenation errors in the manual
When looking through the gfortran manual, I noted some problems with
hyphens being used where they're not correct or necessary,
e.g. "non-standard" vs "nonstandard", "null-pointer" vs "null pointer"
(as a noun), etc. I've made a pass through the documentation to
correct at least some of those uses.
gcc/fortran/ChangeLog
* gfortran.texi: Get rid of some unnecessary hyphens throughout
the file.
* invoke.texi: Likewise.
Sandra Loosemore [Fri, 20 Dec 2024 05:08:15 +0000 (05:08 +0000)]
Fortran: Use the present tense for the manual.
The present tense is preferred for expressing facts or enduring
behavior. Thus we should say "option X does Y" instead of "option X
will do Y", reserving the future tense for things that happen at some
later time (such as in a future release of GCC, or at run time as
explicitly contrasted with compile time).
This set of edits is largely mechanical substitution of phrasing
involving "will". I also fixed a few more markup problems noted while
editing nearby text and fixed a few instances of awkward wording.
gcc/fortran/ChangeLog
* gfortran.texi: Use the present tense throughout; fix some
markup issues and awkward wording.
* invoke.texi: Likewise.
Sandra Loosemore [Thu, 19 Dec 2024 23:49:57 +0000 (23:49 +0000)]
Fortran: Fixes for markup, typos, and indexing in manual
While working on something else I noticed there were numerous places
in the GNU Fortran manual with incorrect/missing Texinfo markup. I
made a pass through about the first third of the manual (not yet the
coarray API or intrinsics documentation) to fix at least some of those
issues, plus some typos and missing @cindex entries.
There shouldn't be any semantic changes to the documentation in this patch.
gcc/fortran/ChangeLog
* gfortran.texi: Fix markup, typos, and indexing throughout the
file.
* invoke.texi: Likewise.
Sandra Loosemore [Thu, 19 Dec 2024 00:43:11 +0000 (00:43 +0000)]
Fortran: Clean up -funderscoring and -fsecond-underscore docs [PR51820]
This is a long-standing documentation bug in the Fortran manual,
initially reported in 2012 as PR51820, with a quick fix applied later
for PR109216. The patch here incorporates more of the discussion from
the original issue.
gcc/fortran/ChangeLog
PR fortran/51820
PR fortran/89632
PR fortran/109216
* invoke.texi (Code Gen Options): Further cleanups of the discussion
of what -funderscoring and -fsecond-underscore do.
Alexandre Oliva [Fri, 20 Dec 2024 21:02:01 +0000 (18:02 -0300)]
strub: accept indirection of volatile pointer types [PR118007]
We don't want to indirect pointers in strub wrappers, because it
generally isn't profitable, but if the argument is volatile, then we
must use indirection to preserve access patterns, so amend the
assertion check.
Alexandre Oliva [Fri, 20 Dec 2024 21:01:53 +0000 (18:01 -0300)]
avoid trying to set block in barriers [PR113506]
When we emit a sequence before a preexisting insn and naming a BB to
store in the insns, we will attempt to store the BB even in barriers
present in the sequence.
Barriers don't expect blocks, and rtl checking catches the problem.
When emitting after a preexisting insn, we skip the block setting in
barriers. Change the before emitter to do so as well.
for gcc/ChangeLog
PR middle-end/113506
* emit-rtl.cc (add_insn_before): Don't set the block of a
barrier.
testsuite: tree-ssa: Fix i686/-m32 fails for vector-*.c tests
FAILs have been reported for several tree-ssa vector-*.c tests
on i686-linux or on x86_64-linux with -m32.
This patch addresses these failures by setting the necessary -msse2 flags.
This patch also streamlines all tests to use dg-options instead
of dg-additional-options. This is in line with most other tests
in gcc.dg/tree-ssa.
Tested with the following board config in RUNTESTFLAGS:
--target_board=unix\{-m64,-m32,-m32/-mno-mmx/-mno-sse}
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/satd-hadamard.c: Rename dg-additional-options
to dg-options.
* gcc.dg/tree-ssa/vector-10.c: Rename dg-additional-options
to dg-options and add -msse2 to it.
* gcc.dg/tree-ssa/vector-11.c: Likewise.
* gcc.dg/tree-ssa/vector-8.c: Rename dg-additional-options
to dg-options.
* gcc.dg/tree-ssa/vector-9.c: Likewise.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
A recent bugfix (eee2891312) for PR117830 also addressed PR118149.
This patch adds two test cases for PR118149.
These tests are different than other tests in that one of the
vec-perm selectors contains indices in descending order (1, 1, 0, 0),
which is the root cause for the ICE observed in PR118149.
PR118149
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr118149-2.c: New test.
* gcc.dg/tree-ssa/pr118149.c: New test.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Harald Anlauf [Thu, 19 Dec 2024 21:22:52 +0000 (22:22 +0100)]
Fortran: potential aliasing of complex pointer inquiry references [PR118120]
PR fortran/118120
PR fortran/113928
gcc/fortran/ChangeLog:
* trans-array.cc (symbols_could_alias): If one symbol refers to a
complex type and the other to a real type of the same kind, do not
a priori exclude the possibility of aliasing.
gcc/testsuite/ChangeLog:
* gfortran.dg/aliasing_complex_pointer.f90: New test.
Uros Bizjak [Fri, 20 Dec 2024 15:16:15 +0000 (16:16 +0100)]
i386: Disable SImode/DImode moves from/to mask regs without avx512bw [PR118067]
SImode and DImode moves from/to mask registers are valid only with AVX512BW,
so mark relevant alternatives in *movsi_internal and *movdi_internal as such.
Even with the patch, the testcase still fails, but now with:
pr118067.c: In function ‘foo’:
pr118067.c:13:1: internal compiler error: maximum number of generated reload insns per insn achieved (90)
13 | }
| ^
0x2c3b581 internal_error(char const*, ...)
../../git/gcc/gcc/diagnostic-global-context.cc:517
0xb68938 lra_constraints(bool)
../../git/gcc/gcc/lra-constraints.cc:5411
0xb51a0d lra(_IO_FILE*, int)
../../git/gcc/gcc/lra.cc:2449
0xaf9f4d do_reload
../../git/gcc/gcc/ira.cc:5977
0xafa462 execute
../../git/gcc/gcc/ira.cc:6165
Tamar Christina [Fri, 20 Dec 2024 14:34:32 +0000 (14:34 +0000)]
AArch64: Implement vector concat of partial SVE vectors [PR96342]
This patch adds support for vector constructor from two partial SVE vectors into
a full SVE vector. It also implements support for the standard vec_init optab to
do this.
Tamar Christina [Fri, 20 Dec 2024 14:27:25 +0000 (14:27 +0000)]
AArch64: Add SVE support for simd clones [PR96342]
This patch finalizes adding support for the generation of SVE simd clones when
no simdlen is provided, following the ABI rules where the widest data type
determines the minimum amount of elements in a length agnostic vector.
gcc/ChangeLog:
PR target/96342
* config/aarch64/aarch64-protos.h (add_sve_type_attribute): Declare.
* config/aarch64/aarch64-sve-builtins.cc (add_sve_type_attribute): Make
visibility global and support use for non_acle types.
* config/aarch64/aarch64.cc
(aarch64_simd_clone_compute_vecsize_and_simdlen): Create VLA simd clone
when no simdlen is provided, according to ABI rules.
(simd_clone_adjust_sve_vector_type): New helper function.
(aarch64_simd_clone_adjust): Add '+sve' attribute to SVE simd clones
and modify types to use SVE types.
* omp-simd-clone.cc (simd_clone_mangle): Print 'x' for VLA simdlen.
(simd_clone_adjust): Adapt safelen check to be compatible with VLA
simdlen.
gcc/testsuite/ChangeLog:
PR target/96342
* gcc.target/aarch64/declare-simd-2.c: Add SVE clone scan.
* gcc.target/aarch64/vect-simd-clone-1.c: New test.
* g++.target/aarch64/vect-simd-clone-1.C: New test.
Co-authored-by: Victor Do Nascimento <victor.donascimento@arm.com> Co-authored-by: Tamar Christina <tamar.christina@arm.com>
Christophe Lyon [Thu, 19 Dec 2024 16:25:59 +0000 (16:25 +0000)]
arm: [MVE intrinsics] Fix moves of tuples (PR target/118131)
Commit r15-6245-g4f4e13dd235b introduced new modes for MVE tuples, but
missed adding support for them in a few places.
Adding them to the list in arm_attr_length_move_neon is not sufficient
since we later face another ICE where the compiler does not know how
to split move of such data.
The patch therefore enhances the define_splits for OI and XI moves in
neon.md, via the introduction of new iterators.
In addition, it seems consistent to update output_move_neon such that
VALID_NEON_*_MODE are used only when TARGET_NEON.
gcc/ChangeLog:
PR target/118131
* config/arm/arm.cc (output_move_neon): Check TARGET_NEON as
needed.
(arm_attr_length_move_neon): Add support for V2x and V4x MVE tuple
modes.
* config/arm/iterators.md (VSTRUCT2, VSTRUCT4): New.
* config/arm/neon.md: Use VSTRUCT2 instead of OI and VSTRUCT4
instead of XI in define_split.
Pan Li [Fri, 20 Dec 2024 01:11:20 +0000 (09:11 +0800)]
RISC-V: Refine strided load/store testcase dump check to tree optimized
As with the sat alu related testcases, the dump check for strided
load/store scans the RTL expand dump and counts the occurrences of the
standard name MASK_LEN_STRIDED_LOAD. But the output of the RTL expand
pass is liable to change with middle-end changes or debug information,
so the expected counts need to be adjusted time and again. This
patch switches the standard name check to the tree optimized dump,
which is more stable up to a point.
The following test suite passed for this patch.
* The rv64gcv full regression test.
It is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48 hours.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-f16.c: Take
tree-optimized pass for standard name check, and adjust the times.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-f32.c: Ditto
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-f64.c: Ditto
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-i16.c: Ditto
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-i32.c: Ditto
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-i64.c: Ditto
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-i8.c: Ditto
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-u16.c: Ditto
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-u32.c: Ditto
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-u64.c: Ditto
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-u8.c: Ditto
forwprop: Fix lane handling for VEC_PERM sequence blending
In PR117830 a miscompilation of 464.h264ref was reported.
An analysis showed that wrong code was generated because of
unsatisfied assumptions. This patch addresses these issues.
The first assumption was that we could independently analyze the two
vec-perms at the start of a vec-perm-simplify sequence and use the
information later for calculating a final vec-perm selector that
utilizes fewer lanes. However, this information does not help much,
because for changing the selector entry, we need to ensure that both
elements of the operand vectors v_1 and v_2 remain equal.
This is addressed by removing the function get_vect_selector_index_map
and checking for this equality in the loop where we create the new
selector.
The calculation of the selector vector for the blended sequence
assumed that the indices of the selector vector of the narrowed
sequences are increasing. This assumption does not hold in general.
This was fixed by allowing a wrap-around when searching for an empty
lane.
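The wrap-around search mentioned above can be sketched as follows. This
is only an illustration under stated assumptions, not the code in
tree-ssa-forwprop.cc: the helper name `find_empty_lane' and the
std::vector<bool> occupancy map are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper: return the first unoccupied lane at or after START,
// wrapping around to lane 0 once the last lane has been checked.
// Returns std::nullopt if every lane is already occupied.
std::optional<std::size_t>
find_empty_lane (const std::vector<bool> &occupied, std::size_t start)
{
  const std::size_t nelts = occupied.size ();
  assert (start < nelts);  // precondition: START names an existing lane
  for (std::size_t step = 0; step < nelts; ++step)
    {
      std::size_t lane = (start + step) % nelts;  // the wrap-around
      if (!occupied[lane])
        return lane;
    }
  return std::nullopt;
}
```

A search that only moved forward from START would miss a free lane at a
lower index, which is the situation a selector with non-increasing
indices creates.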
Further, there was an issue in the calculation of the selector vector
entries for the second sequence. The code did not consider that the
lanes of the second sequence could have been moved.
A relevant property of this patch is that it introduces a
couple of nested loops, where the outer loop iterates from
i=0..nelts and the inner loop iterates from j=0..i.
To avoid performance concerns, a check is introduced that
ensures nelts won't exceed 4 lanes.
The added test case is derived from h264ref (the other cases from the
benchmark have the same structure and don't provide additional coverage).
Bootstrapped and regression-tested on x86-64 and aarch64.
Further, tested on CPU 2006 h264ref and CPU 2017 x264.
PR117830
gcc/ChangeLog:
* tree-ssa-forwprop.cc (get_vect_selector_index_map): Removed.
(recognise_vec_perm_simplify_seq): Fix calculation of vec-perm
selectors of narrowed sequence.
(calc_perm_vec_perm_simplify_seqs): Fix calculation of
vec-perm selectors of the blended sequence.
(process_vec_perm_simplify_seq_list): Add whitespace to dump
string to avoid badly formatted dump output.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/vector-11.c: New test.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
This patch ensures that the list of valid -mtune options
does not contain entries more than once.
The -mtune option accepts CPU identifiers as well as
tuning identifiers and there are cases where a CPU and
its tuning have the same identifier.
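A minimal sketch of the deduplication idea, assuming the valid values
are collected into a simple list of strings. The helper names
`add_unique' and `collect_mtune_values' are hypothetical and do not
mirror the actual code in riscv-common.cc.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: append NAME to VALUES unless it is already present.
void
add_unique (std::vector<std::string> &values, const std::string &name)
{
  if (std::find (values.begin (), values.end (), name) == values.end ())
    values.push_back (name);
}

// Collect CPU identifiers first, then tuning identifiers, skipping any
// identifier that has already been added to the list.
std::vector<std::string>
collect_mtune_values (const std::vector<std::string> &cpus,
                      const std::vector<std::string> &tunes)
{
  std::vector<std::string> values;
  for (const auto &cpu : cpus)
    add_unique (values, cpu);
  for (const auto &tune : tunes)
    add_unique (values, tune);
  return values;
}
```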
PR116347
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc (riscv_get_valid_option_values):
Skip adding mtune entries that are already in the list.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Jakub Jelinek [Fri, 20 Dec 2024 09:17:56 +0000 (10:17 +0100)]
c++: Fix up maybe_unused attribute handling [PR110345]
When adding test coverage for maybe_unused attribute, I've run into
several things:
1) similarly to deprecated attribute, the attribute shouldn't pedantically
appertain to types other than class/enumeration definitions
2) similarly to deprecated attribute, the attribute shouldn't pedantically
appertain to unnamed bit-fields
3) the standard says that it can appertain to identifier labels, but
we handled it silently also on case and default labels
4) I've run into a weird spurious error on
int f [[maybe_unused]];
int & [[maybe_unused]] i = f;
int && [[maybe_unused]] j = 0;
The problem was that we create an attribute variant for the int &
type, then create an attribute variant for the int && type, and
the type_canon_hash hashing just thought those 2 are the same,
so used int & [[maybe_unused]] type for j rather than
int && [[maybe_unused]]. As TYPE_REF_IS_RVALUE is a flag in the
generic code, it was easily possible to hash that flag and compare
it.
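The effect of item 4 can be illustrated with a toy canonicalization
table. This is a sketch under stated assumptions, not GCC's tree
hashing: `RefType', `RefTypeHash', `RefTypeEqual' and `canonicalize'
are hypothetical stand-ins. Once both the hash and the equality
function look at the rvalue flag, T & and T && no longer collapse into
one entry.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>

// Toy stand-in for a reference type node; not GCC's tree representation.
struct RefType
{
  int referred_type_id;  // identifies the referred-to type, e.g. int
  bool is_rvalue;        // true for T &&, false for T &
};

// The hash mixes in the rvalue flag ...
struct RefTypeHash
{
  std::size_t operator() (const RefType &t) const
  {
    std::size_t h = std::hash<int> () (t.referred_type_id);
    return h * 31 + (t.is_rvalue ? 1 : 0);
  }
};

// ... and equality compares it, so T & and T && are distinct entries.
struct RefTypeEqual
{
  bool operator() (const RefType &a, const RefType &b) const
  {
    return a.referred_type_id == b.referred_type_id
           && a.is_rvalue == b.is_rvalue;
  }
};

using RefTypeTable = std::unordered_set<RefType, RefTypeHash, RefTypeEqual>;

// Return the canonical entry for T, inserting it if not yet present.
const RefType &
canonicalize (RefTypeTable &table, const RefType &t)
{
  return *table.insert (t).first;
}
```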
2024-12-19 Jakub Jelinek <jakub@redhat.com>
PR c++/110345
gcc/
* tree.cc (type_hash_canon_hash): Hash TYPE_REF_IS_RVALUE for
REFERENCE_TYPE.
(type_cache_hasher::equal): Compare TYPE_REF_IS_RVALUE for
REFERENCE_TYPE.
gcc/cp/
* tree.cc (handle_maybe_unused_attribute): New function.
(std_attributes): Use handle_maybe_unused_attribute instead
of handle_unused_attribute for maybe_unused attribute.
gcc/testsuite/
* g++.dg/cpp0x/attr-maybe_unused1.C: New test.
* g++.dg/cpp0x/alignas21.C: Add test for int && alignas (int).
Jakub Jelinek [Fri, 20 Dec 2024 09:12:08 +0000 (10:12 +0100)]
c++: Disallow [[deprecated]] on types other than class/enum definitions [PR110345]
For C++ 26 P2552R3 I went through all the spots (except modules) where
attribute-specifier-seq appears in the grammar and tried to construct
a testcase in all those spots, for now for [[deprecated]] attribute.
The patch below contains that testcase. One needed change for this
particular attribute was that currently we handle [[deprecated]]
exactly the same as [[gnu::deprecated]], but for the latter unlike C++14
or later we allow it also on almost all types, while the standard
is strict and allows it only on
https://eel.is/c++draft/dcl.attr#deprecated-2
The attribute may be applied to the declaration of a class, a typedef-name,
a variable, a non-static data member, a function, a namespace,
an enumeration, an enumerator, a concept, or a template specialization.
The following patch just adds a pedwarn for the cases that gnu::deprecated
allows but C++14 disallows, so integral/floating/boolean types,
pointers/references, array types, function types etc.
Basically, for TYPE_P, if the attribute is applied in place (which means
the struct/union/class/enum definition), it is allowed, otherwise pedwarned.
I've tried to compile it also with latest clang and there is agreement in
most of the diagnostics, just at block scope (inside of foo) it doesn't
diagnose
auto e = new int [n] [[deprecated]];
auto e2 = new int [n] [[deprecated]] [42];
[[deprecated]] lab:;
and at namespace scope
[[deprecated]];
I think that all feels like a clang++ bug.
Also this pedwarns on
[[deprecated]] int : 0;
at class scope, that isn't a non-static data member...
I guess to mark the paper as implemented (or what has been already voted
into C++23 earlier) we'll need to add similar testcase for all the other
standard attributes and make sure we check what the attributes can appertain
to and what they can't.
2024-12-19 Jakub Jelinek <jakub@redhat.com>
PR c++/110345
* parser.cc (cp_parser_std_attribute): Don't transform
[[deprecated]] into [[gnu::deprecated]].
* tree.cc (handle_std_deprecated_attribute): New function.
(std_attributes): Add deprecated entry.
Fortran: Fix caf_stop_numeric and reporting exceptions from caf [PR57598]
Caf_stop_numeric always exited with code 0, which is wrong. It shall
behave like regular stop. Add reporting exceptions to caf's stop
handlers. For this the existing library routine had to be exported.
libgfortran/ChangeLog:
PR fortran/57598
* caf/single.c (_gfortran_caf_stop_numeric): Report exceptions
on stop. And fix send_by_ref.
(_gfortran_caf_stop_str): Same.
(_gfortran_caf_error_stop_str): Same.
(_gfortran_caf_error_stop): Same.
* gfortran.map: Add report_exception for export.
* libgfortran.h (report_exception): Add to internal export.
* runtime/stop.c (report_exception): Same.
c++/modules: Validate external linkage definitions in header units [PR116401]
[module.import] p6 says "A header unit shall not contain a definition of
a non-inline function or variable whose name has external linkage."
This patch implements this requirement, and cleans up some issues in the
testsuite where this was already violated. To handle deduction guides
we mark them as inline, since although we give them a definition for
implementation reasons, by the standard they have no definition, and so
we should not error in this case.
PR c++/116401
gcc/cp/ChangeLog:
* decl.cc (grokfndecl): Mark deduction guides as 'inline'.
* module.cc (check_module_decl_linkage): Implement checks for
non-inline external linkage definitions in headers.
c++/modules: Check linkage for exported declarations
By [module.interface] p3, if an exported declaration is not within a
header unit, it shall not declare a name with internal linkage.
Unfortunately we cannot just do this within set_originating_module,
since at the locations it's called the linkage for declarations is not
always fully determined yet. We could move the calls but this causes
the checking assertion to fail as the originating module declaration may
have moved, and in general for some kinds of declarations it's not
always obvious where it should be moved to.
This patch instead introduces a new function to check that the linkage
of a declaration within a module is correct, to be called for all
declarations once their linkage is fully determined.
As a drive-by fix this patch also improves the source location of
namespace aliases to point at the identifier rather than the terminating
semicolon.
c++/modules: Support unnamed namespaces in header units
A header unit may contain unnamed namespaces, and those declarations
are exported (as with any declaration in a header unit). This patch
ensures that such declarations are correctly handled.
The change to 'make_namespace_finish' is required so that if an unnamed
namespace is first seen by an import it is correctly handled within
'add_imported_namespace'. I don't see any particular reason why
handling of unnamed namespaces here had to be handled separately outside
that function since these are the only two callers.
gcc/cp/ChangeLog:
* module.cc (depset::hash::add_binding_entity): Also walk
unnamed namespaces.
(module_state::write_namespaces): Adjust assertion.
* name-lookup.cc (push_namespace): Move anon using-directive
handling to...
(make_namespace_finish): ...here.
gcc/testsuite/ChangeLog:
* g++.dg/modules/internal-9_a.H: New test.
* g++.dg/modules/internal-9_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Jason Merrill <jason@redhat.com>
Nathaniel Shead [Fri, 11 Oct 2024 11:16:02 +0000 (22:16 +1100)]
c++/modules: Ignore TU-local entities where necessary
[basic.link] p14 lists a number of circumstances where a declaration
naming a TU-local entity is not an exposure, notably the bodies of
non-inline templates and friend declarations in classes. This patch
ensures that these references do not error when exporting the module.
We do need to still error on instantiation from a different module,
however, in case this refers to a TU-local entity. As such this patch
adds a new tree TU_LOCAL_ENTITY which is used purely as a placeholder to
poison any attempted template instantiations that refer to it.
This is also streamed for friend decls so that merging (based on the
index of an entity into the friend decl list) doesn't break and to
prevent complicating the logic; I imagine this shouldn't ever come up
though.
We also add a new warning, '-Wtemplate-names-tu-local', to handle the
case where someone accidentally refers to a TU-local value from within a
non-inline function template. This will compile without errors as-is,
but any attempt to instantiate the decl will fail; this warning can be
used to ensure that this doesn't happen. The warning is silenced for
any declarations with explicit instantiations, since uses of those
instantiations would not be exposures.
The main piece that this patch doesn't yet attempt to solve is ADL: as
specified, if ADL adds an overload set that includes a translation-unit
local entity when instantiating a template, that overload set is now
poisoned and counts as an exposure. Unfortunately, we don't currently
differentiate between decls that are hidden due to not being exported,
or decls that are hidden due to being hidden friends, so this patch
instead just keeps the current (wrong) behaviour of non-exported
entities not being visible to ADL at all.
Additionally, this patch doesn't attempt to ignore non-ODR uses of
constants in constexpr functions or templates. The obvious approach of
folding them early in 'mark_use' doesn't seem to work (for a variety of
reasons), so this leaves this to a later patch to implement, as it's at
least no worse than the current behaviour and easy enough to work around.
For completeness this patch adds a new xtreme-header testcase to ensure
that we have no regressions with regards to exposures of TU-local
declarations in the standard library header files. A more restrictive
test would be to do 'export extern "C++"' here, but unfortunately the
system headers on some targets declare TU-local entities, so we'll make
do with checking that at least the C++ standard library headers don't
refer to such entities.
gcc/c-family/ChangeLog:
* c.opt: New warning '-Wtemplate-names-tu-local'.
gcc/cp/ChangeLog:
* cp-objcp-common.cc (cp_tree_size): Add TU_LOCAL_ENTITY.
* cp-tree.def (TU_LOCAL_ENTITY): New tree code.
* cp-tree.h (DECL_TEMPLATE_INSTANTIATIONS): Update comment.
(struct tree_tu_local_entity): New type.
(TU_LOCAL_ENTITY_NAME): New accessor.
(TU_LOCAL_ENTITY_LOCATION): New accessor.
(enum cp_tree_node_structure_enum): Add TS_CP_TU_LOCAL_ENTITY.
(union GTY): Add tu_local_entity field.
* module.cc (enum tree_tag): New flag DB_REFS_TU_LOCAL_BIT.
(depset::has_defn): Override for TU-local entities.
(depset::refs_tu_local): New accessor.
(depset::hash::ignore_tu_local): New field.
(depset::hash::hash): Initialize it.
(trees_out::tree_tag::tt_tu_local): New flag.
(trees_out::writing_local_entities): New field.
(trees_out::is_initial_scan): New function.
(trees_out::tu_local_count): New counter.
(trees_out::trees_out): Initialize writing_local_entities.
(dumper::impl::nested_name): Handle TU_LOCAL_ENTITY.
(trees_out::instrument): Report TU-local entity counts.
(trees_out::decl_value): Early exit for TU-local entities.
(trees_in::decl_value): Handle typedefs of TU-local entities.
(trees_out::decl_node): Adjust assertion to cope with early exit
of TU-local deps. Always write TU-local entities by value.
(trees_out::type_node): Handle TU-local types.
(trees_out::has_tu_local_dep): New function.
(trees_out::find_tu_local_decl): New function.
(trees_out::tree_node): Intercept TU-local entities and write
placeholder values for them instead of normal streaming.
(trees_in::tree_node): Handle TU-local entities and TU-local
template results.
(trees_out::write_function_def): Ignore exposures in non-inline
function bodies.
(trees_out::write_var_def): Ignore exposures in initializers.
(trees_out::write_class_def): Ignore exposures in friend decls.
(trees_in::read_class_def): Skip TU-local friends.
(trees_out::write_definition): Record whether we're writing a
decl which refers to TU-local entities.
(depset::hash::add_dependency): Only mark as exposure if we're not
ignoring TU-local entities.
(depset::hash::find_dependencies): Use depset's own is_key_order
function rather than delegating via walker. Pass whether the
decl has ignored TU-local entities in its definition.
(template_has_explicit_inst): New function.
(depset::hash::finalize_dependencies): Implement new warning
Wtemplate-names-tu-local.
(module_state::intercluster_seed): Don't seed TU-local deps.
(module_state::write_cluster): Pass whether the decl has ignored
TU-local entities in its definition.
* pt.cc (register_specialization): Always register in a module.
(complain_about_tu_local_entity): New function.
(expr_contains_tu_local_entity): New function.
(function_contains_tu_local_entity): New function.
(instantiate_class_template): Skip TU-local friends.
(tsubst_decl): Handle typedefs of TU-local entities.
(tsubst): Complain about TU-local entities.
(dependent_operand_p): Early exit for TU-local entities so we
don't attempt to constant-evaluate them.
(tsubst_expr): Detect and complain about TU-local entities.
* g++.dg/modules/internal-5_a.C: New test.
* g++.dg/modules/internal-5_b.C: New test.
* g++.dg/modules/internal-6.C: New test.
* g++.dg/modules/internal-7_a.C: New test.
* g++.dg/modules/internal-7_b.C: New test.
* g++.dg/modules/internal-8_a.C: New test.
* g++.dg/modules/xtreme-header-8.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Jason Merrill <jason@redhat.com>
Nathaniel Shead [Tue, 8 Oct 2024 09:50:38 +0000 (20:50 +1100)]
c++/modules: Detect exposures of TU-local entities
Currently, the modules streaming code implements some checks for
declarations in the CMI that reference (some kinds of) internal-linkage
entities, and errors if so. This patch expands on that support to
implement the logic for exposures of TU-local entities as defined in
[basic.link] since P1815.
This will cause some code that previously errored in modules to start
compiling; for instance, template specialisations of internal linkage
functions.
However, some code that previously appeared valid will with this patch
no longer compile, notably some kinds of usages of internal linkage
functions included from the GMF. This appears to be related to P2808
and FR-025, however as yet there doesn't appear to be consensus for
changing these rules so I've implemented them as-is.
This patch leaves a couple of things out. In particular, a couple of
the rules for what is a TU-local entity currently seem to me to be
redundant; I've left them as FIXMEs to be handled once I can find
testcases that aren't adequately supported by the other logic here.
Additionally, there are some exceptions for when naming a TU-local
entity is not always an exposure; I've left support for this to a
follow-up patch for easier review, as it has broader implications for
streaming.
TU-local lambdas are also not yet properly implemented, due to other
bugs with regards to LAMBDA_TYPE_EXTRA_SCOPE not being set in all cases
that it probably should be (see also PR c++/116568). We can revisit
this once that issue has been fixed.
Finally, this patch makes a couple of small adjustments to the modules
streaming logic to prune any leftover TU-local deps (that aren't
erroneous exposures). This is required for this patch to ensure that
later stages don't get confused by any leftover TU-local entities
floating around.
gcc/cp/ChangeLog:
* tree.cc (decl_linkage): Treat DECL_SELF_REFERENCE_P like
DECL_IMPLICIT_TYPEDEF_P.
* name-lookup.cc (do_namespace_alias): Fix linkage.
* module.cc (DB_IS_INTERNAL_BIT): Rename to...
(DB_TU_LOCAL_BIT): ...this.
(DB_REFS_INTERNAL_BIT): Rename to...
(DB_EXPOSURE_BIT): ...this.
(depset::hash::is_internal): Rename to...
(depset::hash::is_tu_local): ...this.
(depset::hash::refs_internal): Rename to...
(depset::hash::is_exposure): ...this.
(depset::hash::is_tu_local_entity): New function.
(depset::hash::has_tu_local_tmpl_arg): New function.
(depset::hash::is_tu_local_value): New function.
(depset::hash::make_dependency): Check for TU-local entities.
(depset::hash::add_dependency): Make current an exposure
whenever it references a TU-local entity.
(depset::hash::add_binding_entity): Don't create bindings for
any TU-local entity.
(depset::hash::finalize_dependencies): Rename flags and adjust
diagnostic messages to report exposures of TU-local entities.
(depset::tarjan::connect): Don't include any TU-local depsets.
(depset::hash::connect): Likewise.
gcc/testsuite/ChangeLog:
* g++.dg/modules/block-decl-2.C: Adjust messages.
* g++.dg/modules/internal-1.C: Adjust messages, remove XFAILs.
* g++.dg/modules/linkage-2.C: Adjust messages, remove XFAILs.
* g++.dg/modules/internal-3.C: New test.
* g++.dg/modules/internal-4_a.H: New test.
* g++.dg/modules/internal-4_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Jason Merrill <jason@redhat.com>
François Dumont [Mon, 22 Jul 2024 19:54:36 +0000 (21:54 +0200)]
libstdc++: Add fancy pointer support to std::map and std::set [PR57272]
The fancy allocator pointer type support is added to std::map,
std::multimap, std::multiset and std::set through the underlying
std::_Rb_tree class.
To respect ABI a new parallel hierarchy of node types has been added.
This change introduces a new class template parameterized on the
allocator's void_pointer type, __rb_tree::_Node_base, and new class
templates parameterized on the allocator's pointer type, __rb_tree::_Node,
__rb_tree::_Iterator. The iterator class template is used for both
iterator and const_iterator. Whether std::_Rb_tree<K, V, KoV, C, A>
should use the old _Rb_tree_node<V> or new __rb_tree::_Node<A::pointer>
type family internally is controlled by a new __rb_tree::_Node_traits
traits template.
Because std::pointer_traits and std::__to_address are not defined for
C++98, there is no way to support fancy pointers in C++98. For C++98 the
_Node_traits traits always choose the old _Rb_tree_node family.
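The selection made by such a traits template can be sketched roughly as
below. This is not the libstdc++ implementation: `OldNode', `NewNode',
`NodeTraits', `FancyPtr' and `FancyAlloc' are hypothetical names, and
the sketch assumes that the old node family is kept whenever the
allocator's pointer type is a raw pointer.

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>

// Hypothetical stand-ins for the old and new node families.
template<typename Val> struct OldNode { Val value; };
template<typename Ptr> struct NewNode { using pointer = Ptr; };

// Pick the old family when the allocator uses raw pointers (assumed here
// to be what preserves the existing ABI), the new one for fancy pointers.
template<typename Val, typename Alloc>
struct NodeTraits
{
  using AllocPtr = typename std::allocator_traits<Alloc>::pointer;
  using Node =
    typename std::conditional<std::is_pointer<AllocPtr>::value,
                              OldNode<Val>, NewNode<AllocPtr>>::type;
};

// Minimal fancy pointer and allocator, only complete enough for the
// trait computation above; they cannot actually allocate anything.
template<typename T>
struct FancyPtr
{
  using element_type = T;
  using difference_type = std::ptrdiff_t;
  template<typename U> using rebind = FancyPtr<U>;
  T *raw;
};

template<typename T>
struct FancyAlloc
{
  using value_type = T;
  using pointer = FancyPtr<T>;
};
```

With std::allocator<int> the trait yields OldNode<int>, while with
FancyAlloc<int> it yields NewNode<FancyPtr<int>>.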
In case anybody is currently using std::_Rb_tree with an allocator that
has a fancy pointer, this change would be an ABI break, because their
std::_Rb_tree instantiations would start to (correctly) use the fancy
pointer type. If the fancy pointer just contains a single pointer and so
has the same size, layout, and object representation as a raw pointer,
the code might still work (despite being an ODR violation). But if their
fancy pointer has a different representation, they would need to
recompile all their code using that allocator with std::_Rb_tree. Because
std::_Rb_tree will never use fancy pointers in C++98 mode, recompiling
everything to use fancy pointers isn't even possible if mixing C++98 and
C++11 code that uses std::_Rb_tree. To alleviate this problem, compiling
with -D_GLIBCXX_USE_ALLOC_PTR_FOR_RB_TREE=0 will force std::_Rb_tree to
have the old, non-conforming behaviour and use raw pointers internally.
For testing purposes, compiling with -D_GLIBCXX_USE_ALLOC_PTR_FOR_RB_TREE=9001
will force std::_Rb_tree to always use the new node types. This macro is
currently undocumented, which needs to be fixed.
As _Rb_tree is using _Base_ptr to represent the tree this change also
simplifies the implementation by removing all the const pointer types
and associated methods.
libstdc++-v3/ChangeLog:
PR libstdc++/57272
* include/bits/stl_tree.h
[_GLIBCXX_USE_ALLOC_PTR_FOR_RB_TREE]: New macro to control usage of the
code required to support fancy allocator pointer type.
(_Rb_tree_node_base::_Const_Base_ptr): Remove.
(_Rb_tree_node_base::_S_minimum, _Rb_tree_node_base::_S_maximum): Remove
overloads for _Const_Base_ptr.
(_Rb_tree_node_base::_M_base_ptr()): New.
(_Rb_tree_node::_Link_type): Remove.
(_Rb_tree_node::_M_node_ptr()): New.
(__rb_tree::_Node_base<>): New.
(__rb_tree::_Header<>): New.
(__rb_tree::_Node<>): New.
(_Rb_tree_increment(const _Rb_tree_node_base*)): Remove declaration.
(_Rb_tree_decrement(const _Rb_tree_node_base*)): Remove declaration.
(_Rb_tree_iterator<>::_Self): Remove.
(_Rb_tree_iterator<>::_Link_type): Rename into...
(_Rb_tree_iterator<>::_Node_ptr): ...this.
(_Rb_tree_const_iterator<>::_Link_type): Rename into...
(_Rb_tree_const_iterator<>::_Node_ptr): ...this.
(_Rb_tree_const_iterator<>::_M_const_cast): Remove.
(_Rb_tree_const_iterator<>::_M_node): Change type into _Base_ptr.
(__rb_tree::_Iterator<>): New.
(__rb_tree::_Node_traits<>): New.
(_Rb_tree<>::_Node_base, _Rb_tree::_Node): New.
(_Rb_tree<>::_Link_type): Rename into...
(_Rb_tree<>::_Node_ptr): ...this.
(_Rb_tree<>::_Const_Base_ptr, _Rb_tree<>::_Const_Node_ptr): Remove.
(_Rb_tree<>::_M_mbegin): Remove.
(_Rb_tree<>::_M_begin_node()): New.
(_S_key(const _Node&)): New.
(_S_key(_Base_ptr)): New, call latter.
(_S_key(_Node_ptr)): Likewise.
(_Rb_tree<>::_S_left(_Const_Base_ptr)): Remove.
(_Rb_tree<>::_S_right(_Const_Base_ptr)): Remove.
(_Rb_tree<>::_S_maximum(_Const_Base_ptr)): Remove.
(_Rb_tree<>::_S_minimum(_Const_Base_ptr)): Remove.
* testsuite/23_containers/map/allocator/ext_ptr.cc: New test case.
* testsuite/23_containers/multimap/allocator/ext_ptr.cc: New test case.
* testsuite/23_containers/multiset/allocator/ext_ptr.cc: New test case.
* testsuite/23_containers/set/allocator/ext_ptr.cc: New test case.
* testsuite/23_containers/set/requirements/explicit_instantiation/alloc_ptr.cc:
New test case.
* testsuite/23_containers/set/requirements/explicit_instantiation/alloc_ptr_ignored.cc:
New test case.
Patrick Palka [Thu, 19 Dec 2024 17:00:31 +0000 (12:00 -0500)]
c++: optimize constraint subsumption [PR118069]
Since atomic constraints are interned the subsumption machinery can
safely use pointer instead of structural hashing for them. This speeds
up compilation of the testcase in the PR from ~3s to ~2s.
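Why interning makes pointer hashing safe can be shown with a toy
example. The `AtomTable' class and `AtomSet' alias below are
hypothetical and use strings in place of atomic constraints. Since
every distinct value is stored exactly once, two equal atoms always
have the same address, so hashing the address is enough.

```cpp
#include <string>
#include <unordered_set>

// Toy interning table: each distinct spelling is stored exactly once, so
// equal atoms always share a single address.
class AtomTable
{
  std::unordered_set<std::string> storage_;

public:
  const std::string *intern (const std::string &spelling)
  {
    // Pointers to unordered_set elements stay valid across later inserts.
    return &*storage_.insert (spelling).first;
  }
};

// A set keyed on the address alone: std::hash for pointers never looks
// at the pointed-to contents, so no structural hashing is performed.
using AtomSet = std::unordered_set<const std::string *>;
```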
PR c++/118069
gcc/cp/ChangeLog:
* constraint.cc (atom_hasher): Define here, instead of ...
* cp-tree.h (atom_hasher): ... here.
* logic.cc (clause::m_set): Use pointer instead of structural
hashing.
Patrick Palka [Thu, 19 Dec 2024 17:00:29 +0000 (12:00 -0500)]
c++: integer overflow during constraint subsumption [PR118069]
For the testcase in the PR we hang during constraint subsumption
ultimately because one of the constraints is complex enough that its
conjunctive normal form is calculated to have more than 2^31 clauses,
which causes the size calculation (through an int) to overflow and so
the optimization in subsumes_constraints_nonnull
if (dnf_size (lhs) <= cnf_size (rhs))
// iterate over DNF of LHS
else
// iterate over CNF of RHS
incorrectly decides to loop over the CNF (>> billions of clauses)
instead of the DNF (thousands of clauses).
I haven't verified that the result of cnf_size is correct for the
problematic constraint but integer overflow is definitely plausible
given that CNF/DNF can be exponentially larger than the original
constraint in the worst case.
This patch fixes this by using 64-bit saturating arithmetic during
these size calculations (via new add/mul_sat_hwi functions) so that
overflow is less likely and if it does occur we handle it gracefully.
It should be highly unlikely that both the DNF and CNF sizes overflow,
and if they do then it doesn't matter which form we select, subsumption
will take forever either way. The testcase now compiles in ~3 seconds
on my machine after this change.
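A minimal sketch of 64-bit saturating arithmetic, for illustration
only. The names `add_sat' and `mul_sat' are hypothetical and are not
the add_sat_hwi/mul_sat_hwi functions added to hwint.h. The sketch
relies on the __builtin_add_overflow and __builtin_mul_overflow
builtins provided by GCC and Clang.

```cpp
#include <cstdint>
#include <limits>

// Saturating signed addition: clamp to the representable range instead of
// wrapping around on overflow.
inline std::int64_t
add_sat (std::int64_t a, std::int64_t b)
{
  std::int64_t result;
  if (!__builtin_add_overflow (a, b, &result))
    return result;
  // Overflow implies both operands have the same sign, so the sign of A
  // tells in which direction the sum overflowed.
  return a < 0 ? std::numeric_limits<std::int64_t>::min ()
               : std::numeric_limits<std::int64_t>::max ();
}

// Saturating signed multiplication.
inline std::int64_t
mul_sat (std::int64_t a, std::int64_t b)
{
  std::int64_t result;
  if (!__builtin_mul_overflow (a, b, &result))
    return result;
  // Overflow implies neither operand is zero; the exact product would be
  // negative when the signs differ and positive otherwise.
  return (a < 0) != (b < 0) ? std::numeric_limits<std::int64_t>::min ()
                            : std::numeric_limits<std::int64_t>::max ();
}
```

With saturation, a size that exceeds the representable range compares
as "very large" rather than wrapping to a small or negative value, so a
comparison of the DNF and CNF sizes still picks the smaller form.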
PR c++/118069
gcc/ChangeLog:
* hwint.h (add_sat_hwi): New function.
(mul_sat_hwi): Likewise.
gcc/cp/ChangeLog:
* logic.cc (dnf_size_r): Use HOST_WIDE_INT instead of int, and
handle overflow gracefully via add_sat_hwi and mul_sat_hwi.
(cnf_size_r): Likewise.
(dnf_size): Use HOST_WIDE_INT instead of int.
(cnf_size): Likewise.
Tobias Burnus [Thu, 19 Dec 2024 16:27:41 +0000 (17:27 +0100)]
OpenMP: Add 'nec' to the 'vendor' context-selector list
For unknown vendors used in a context selector such as
match(implementation={vendor(...)})
GCC prints a warning like:
warning: unknown property 'nec' of 'vendor' selector
While all known vendors (including the vendor 'unknown') are silently
accepted, only "gnu" counts as matched by GCC.
The list of known vendors is published in OpenMP's additional
definition document (or, previously, the context definitions document).
While the initial list did not contain 'nec', it was added quite early
but GCC missed this addition, which this commit rectifies.
Some history:
* GCC added the list in r10-3744-g94e7f906ca5c73 (Oct 2019)
* At spec level, 'pgi' was replaced by 'nvidia' in Nov 2019, but
GCC (since r10-4639-gd0ec7c935f0c96, Nov 2019) and LLVM recognize
both vendor names.
* 'nec' was then added in Dec 2019 and is present in
"Context Definitions for the OpenMP API Specification Version 5.0
– Version 1.0", but only this commit adds it.
* 'hpe' (as alias for 'cray') was added to the spec in Nov 2020 but
to GCC only in r14-6720-gd0603dfe9d3bc7 (Dec 2023).
Patrick Palka [Thu, 19 Dec 2024 16:31:19 +0000 (11:31 -0500)]
libstdc++: Implement C++23 <flat_set> (P1222R4)
This implements the C++23 container adaptors std::flat_set and
std::flat_multiset from P1222R4. The implementation is essentially
a simpler and pared down version of std::flat_map.
libstdc++-v3/ChangeLog:
* include/Makefile.am: Add new header <flat_set>.
* include/Makefile.in: Regenerate.
* include/bits/version.def (__cpp_flat_set): Define.
* include/bits/version.h: Regenerate.
* include/precompiled/stdc++.h: Include <flat_set>.
* include/std/flat_set: New file.
* src/c++23/std.cc.in: Export <flat_set>.
* testsuite/23_containers/flat_multiset/1.cc: New test.
* testsuite/23_containers/flat_set/1.cc: New test.
Co-authored-by: Jonathan Wakely <jwakely@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Patrick Palka [Thu, 19 Dec 2024 16:31:09 +0000 (11:31 -0500)]
libstdc++: Implement C++23 <flat_map> (P0429R9)
This implements the C++23 container adaptors std::flat_map and
std::flat_multimap from P0429R9. The implementation is shared
as much as possible between the two adaptors via a common base
class that's parameterized according to key uniqueness.
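The idea of one base class parameterized on key uniqueness can be
sketched as below. This is only an illustration under simplifying
assumptions: flat_map_sketch, map_sketch and multimap_sketch are
hypothetical names, and a single sorted vector of pairs is used here,
whereas the real adaptors keep keys and values in separate containers.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical common base: a sorted vector of (key, value) pairs.
// The Unique flag selects map-like or multimap-like insertion.
template <typename Key, typename T, bool Unique>
class flat_map_sketch {
 public:
  // Insert while keeping the storage sorted by key. Returns false only
  // when Unique is set and an equivalent key is already present.
  bool insert(const Key& k, const T& v) {
    auto pos = std::lower_bound(
        data_.begin(), data_.end(), k,
        [](const std::pair<Key, T>& e, const Key& key) {
          return e.first < key;
        });
    if (Unique && pos != data_.end() && !(k < pos->first))
      return false;  // duplicate rejected in the unique flavour
    data_.insert(pos, std::make_pair(k, v));
    return true;
  }

  std::size_t size() const { return data_.size(); }

 private:
  std::vector<std::pair<Key, T>> data_;
};

// The two adaptors differ only in the uniqueness parameter.
template <typename Key, typename T>
using map_sketch = flat_map_sketch<Key, T, true>;
template <typename Key, typename T>
using multimap_sketch = flat_map_sketch<Key, T, false>;
```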
libstdc++-v3/ChangeLog:
* include/Makefile.am: Add new header <flat_map>.
* include/Makefile.in: Regenerate.
* include/bits/alloc_traits.h (__not_allocator_like): New concept.
* include/bits/stl_function.h (__transparent_comparator): Likewise.
* include/bits/stl_iterator_base_types.h (__has_input_iter_cat):
Likewise.
* include/bits/uses_allocator.h (__allocator_for): Likewise.
* include/bits/utility.h (sorted_unique_t): Define for C++23.
(sorted_unique): Likewise.
(sorted_equivalent_t): Likewise.
(sorted_equivalent): Likewise.
* include/bits/version.def (flat_map): Define.
* include/bits/version.h: Regenerate.
* include/precompiled/stdc++.h: Include <flat_map>.
* include/std/flat_map: New file.
* src/c++23/std.cc.in: Export <flat_map>.
* testsuite/23_containers/flat_map/1.cc: New test.
* testsuite/23_containers/flat_multimap/1.cc: New test.
Co-authored-by: Jonathan Wakely <jwakely@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
and then we pass the EXPR_STMT to maybe_constant_init, with D.2707 as
the object. But their types don't match anymore, so we crash. We'd
have to pass D.2707.it as the object for it to work.
This patch adjusts cxx_eval_outermost_constant_expr to take the object's
type if available.
constexpr-prvalue3.C is reduced from a large std::ranges libstdc++ test.
PR c++/117980
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_outermost_constant_expr): If there's
an object to initialize, take its type. Don't set the type
in the constexpr dtor case.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/constexpr-prvalue2.C: New test.
* g++.dg/cpp0x/constexpr-prvalue3.C: New test.
testsuite: arm: Use effective-target for memset-inline* tests
Split tests into 2 parts:
- The first part checks the assembler generated.
- The second part does the run test and this part now requires
effective-target arm_neon_hw.
Jakub Jelinek [Thu, 19 Dec 2024 10:36:29 +0000 (11:36 +0100)]
testsuite: Fix toplevel-asm-1.c failure for riscv
On Wed, Dec 18, 2024 at 01:19:43PM +0100, Andreas Schwab wrote:
> On Dez 12 2024, Jakub Jelinek wrote:
>
> > The intent was to test %cN because %N doesn't DTRT on various targets.
> > I have a patch to add %ccN support which should then work even on riscv
> > hopefully, but unfortunately it hasn't been fully reviewed yet.
>
> That didn't change toplevel-asm-1, so the failure remains.
Yes, I've only committed what was approved.
The following patch ought to fix this (and if there are other targets which
don't really support %cN for SYMBOL_REFs even with -fno-pic, they can be
added there too; I think it is useful to test %cN on the targets where it
works though).
2024-12-19 Jakub Jelinek <jakub@redhat.com>
* c-c++-common/toplevel-asm-1.c: Use %cc3 %cc4 instead of %c3 %c4
on riscv.
Pan Li [Thu, 19 Dec 2024 01:03:59 +0000 (09:03 +0800)]
RISC-V: Adjust the strided store testcases check times on options
The number of vsse* matches in the dump now depends on the optimization
option (O2, O3), after we add (mem:BLK (scratch)) to the define_insn of
strided load.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-f64.c: Adjust
the vsse check times based on optimization option.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/strided/strided_ld_st-1-u64.c: Ditto.
Pan Li [Thu, 19 Dec 2024 00:58:20 +0000 (08:58 +0800)]
RISC-V: Make vector strided store alias all other memories
Much like the RVV strided load, the vector strided store does not
include the (mem:BLK (scratch)) needed to alias all other memory.
As a result, alias analysis only considers the base address of the
strided store.
PR target/118075
gcc/ChangeLog:
* config/riscv/vector.md: Add the (mem:BLK (scratch)) as the
lhs of strided store define insn.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/pr118075-run-1.c: New test.
Alexandre Oliva [Thu, 19 Dec 2024 01:17:31 +0000 (22:17 -0300)]
ifcombine field merge: handle masks with sign extensions
When a loaded field is sign extended, masked and compared, we used to
drop from the mask the bits past the original field width, which is
not correct.
Take note of the fact that the mask covered copies of the sign bit,
before clipping it, and arrange to test the sign bit if we're
comparing with zero. Punt in other cases.
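A small worked example of the problem, assuming an 8-bit signed field
widened to 32 bits (the helper names and the constants below are
hypothetical and only illustrate the arithmetic, not the gimple-fold.cc
code):

```cpp
#include <cstdint>

const std::uint32_t kFieldBits = 0xFF;  // bits of the original 8-bit field
const std::uint32_t kSignBit = 0x80;    // sign bit of that field

// Reference semantics: mask applied to the sign-extended value.
bool wide_test(std::int8_t f, std::uint32_t mask) {
  return (static_cast<std::uint32_t>(static_cast<std::int32_t>(f)) & mask) != 0;
}

// Old behaviour: mask bits past the field width are simply dropped.
bool clipped_test(std::int8_t f, std::uint32_t mask) {
  return (static_cast<std::uint8_t>(f) & (mask & kFieldBits)) != 0;
}

// Sketch of the fix: if the mask covered copies of the sign bit,
// test the sign bit itself as part of the compare with zero.
bool sign_aware_test(std::int8_t f, std::uint32_t mask) {
  std::uint32_t narrow = mask & kFieldBits;
  if ((mask & ~kFieldBits) != 0)
    narrow |= kSignBit;
  return (static_cast<std::uint8_t>(f) & narrow) != 0;
}

// Checks the sign-aware form against the reference for every field value.
bool agrees_for_all_fields(std::uint32_t mask) {
  for (int v = -128; v <= 127; ++v) {
    const std::int8_t f = static_cast<std::int8_t>(v);
    if (wide_test(f, mask) != sign_aware_test(f, mask))
      return false;
  }
  return true;
}
```

For the field value -2 and mask 0x101, the sign-extended test is true
because bit 8 is a copy of the sign bit, while the clipped mask 0x01
makes the test false.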
If bits_test fails recoverably, try other ifcombine strategies.
for gcc/ChangeLog
* gimple-fold.cc (decode_field_reference): Add psignbit
parameter. Set it if the mask references sign-extending
bits.
(fold_truth_andor_for_ifcombine): Adjust calls with new
variables. Swap them along with other r?_* variables. Handle
extended sign bit compares with zero.
* tree-ssa-ifcombine.cc (ifcombine_ifandif): If bits_test
fails in a way that doesn't prevent other ifcombine strategies
from passing, give them a try.
Alexandre Oliva [Thu, 19 Dec 2024 01:17:18 +0000 (22:17 -0300)]
ifcombine field merge: handle bitfield zero tests in range tests
Some bitfield compares with zero are optimized to range tests, so
instead of X & ~(Bit - 1) != 0 what reaches ifcombine is X > (Bit - 1),
where Bit is a power of two and X is unsigned.
This patch recognizes this optimized form of masked compares, and
attempts to merge them like masked compares, which enables some more
field merging that a folder version of fold_truth_andor used to handle
without additional effort.
I haven't seen X & ~(Bit - 1) == 0 become X <= (Bit - 1), or X < Bit
for that matter, but it was easy enough to handle the former
symmetrically to the above.
The latter was also easy enough, and so was its symmetric, X >= Bit,
that is handled like X & ~(Bit - 1) != 0.
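The equivalences relied upon here can be checked with a short sketch,
assuming X is unsigned and Bit is a power of two (all function names are
hypothetical):

```cpp
// Masked compares with zero.
bool masked_nonzero(unsigned x, unsigned bit) { return (x & ~(bit - 1)) != 0; }
bool masked_zero(unsigned x, unsigned bit) { return (x & ~(bit - 1)) == 0; }

// Range-test forms that the masked compares may be optimized to.
bool range_gt(unsigned x, unsigned bit) { return x > bit - 1; }
bool range_ge(unsigned x, unsigned bit) { return x >= bit; }
bool range_le(unsigned x, unsigned bit) { return x <= bit - 1; }
bool range_lt(unsigned x, unsigned bit) { return x < bit; }

// Compares all forms for every x in [0, limit).
bool forms_agree(unsigned bit, unsigned limit) {
  for (unsigned x = 0; x < limit; ++x) {
    if (masked_nonzero(x, bit) != range_gt(x, bit)) return false;
    if (masked_nonzero(x, bit) != range_ge(x, bit)) return false;
    if (masked_zero(x, bit) != range_le(x, bit)) return false;
    if (masked_zero(x, bit) != range_lt(x, bit)) return false;
  }
  return true;
}
```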
for gcc/ChangeLog
* gimple-fold.cc (decode_field_reference): Accept incoming
mask.
(fold_truth_andor_for_ifcombine): Handle some compares with
powers of two, minus 1 or 0, like masked compares with zero.
Alexandre Oliva [Thu, 19 Dec 2024 01:17:13 +0000 (22:17 -0300)]
noncontiguous ifcombine: skip marking of non-SSA_NAMEs [PR117915]
When ifcombine_mark_ssa_name is called directly, rather than by
ifcombine_mark_ssa_name_walk, we need to check that name is an
SSA_NAME at the caller or in the function itself. For convenience and
safety, I'm moving the checks from _walk to the implementation proper.
Alexandre Oliva [Thu, 19 Dec 2024 01:17:02 +0000 (22:17 -0300)]
ifcombine field merge: do not follow a second conversion [PR118046]
The testcase shows that conversions that would impact negatively the
ifcombine field merging implementation won't always have been
optimized out by the time we reach ifcombine.
There's probably room to support multiple conversions with extra
logic, but this workaround should avoid codegen errors until that
logic is figured out.
for gcc/ChangeLog
PR tree-optimization/118046
* gimple-fold.cc (decode_field_reference): Don't follow more
than one conversion.
Alexandre Oliva [Thu, 19 Dec 2024 01:16:58 +0000 (22:16 -0300)]
ifcombine field merge: stricten loads tests, swap compare to match
ACATS-4 ca11d02 exposed an error in the logic for recognizing and
identifying the inner object in decode_field_reference: a view-converting
load, inserted in a previous successful field merging operation, was
recognized by gimple_convert_def_p within decode_field_reference, and
as a result we took its operand as the expression, and failed to take
note of the load location.
Without that load, we couldn't compare vuses, and then we ended up
inserting a wider load before relevant parts of the object were
initialized.
This patch makes gimple_convert_def_p recognize loads only when
requested, and requires that either both or neither parts of a
potentially merged operand have associated loads.
As a bonus, it enables additional optimizations by swapping the
operands of the second compare when that makes left-hand operands
of both compares match.
for gcc/ChangeLog
* gimple-fold.cc (gimple_convert_def_p): Reject load stmts
unless requested.
(decode_field_reference): Accept a converting load at the last
conversion matcher, subsuming the load identification.
(fold_truth_andor_for_ifcombine): Refuse to merge operands
when only one of them has an associated load stmt. Swap
operands of one of the compares if that helps them match.
Eric Botcazou [Wed, 18 Dec 2024 20:48:36 +0000 (21:48 +0100)]
Fix bootstrap failure on SPARC with -O3 -mvis3
This replaces the use of FAIL in the new vec_cmp[u] expanders by that of a
predicate for the operator, which is (apparently) required for the optabs
machinery to properly compute the set of supported vector comparisons.
gcc/
PR target/118096
* config/sparc/predicates.md (vec_cmp_operator): New predicate.
(vec_cmpu_operator): Likewise.
* config/sparc/sparc.md (vec_cmp<FPCMP:mode><P:mode>): Use the
vec_cmp_operator predicate instead of FAILing the expansion.
(vec_cmpu<FPCMP:mode><P:mode>): Likewise for vec_cmpu_operator.
Michal Jires [Wed, 18 Dec 2024 17:28:46 +0000 (18:28 +0100)]
ipcp don't propagate where not needed - fix uninit constructor
Removed the uninitialized empty constructor, as was objected to.
gcc/ChangeLog:
* lto-cgraph.cc (lto_symtab_encoder_delete_node):
Declare var later when initialized.
* lto-streamer.h (struct lto_encoder_entry):
Remove empty constructor.
Tamar Christina [Wed, 18 Dec 2024 16:39:25 +0000 (16:39 +0000)]
libstdc++: Adjust probabilities of hashmap loop conditions
We are currently generating a loop which has more comparisons than you'd
typically need as the probabilities on the small size loop are such that it
assumes the likely case is that an element is not found.
This again generates a pattern that's harder for branch predictors to follow,
but also just generates more instructions for what one could say is the
typical case: that your hashtable contains the entry you are looking for.
This patch adds a __builtin_expect in _M_find_before_node where at the moment
the loop is optimized for the case where we don't do any iterations.
A simple testcase is (compiled with -fno-split-paths to simulate the loop
in libstdc++):
#include <stdbool.h>
bool foo (int **a, int n, int val, int *tkn)
{
  for (int i = 0; i < n; i++)
    {
      if (!a[i] || a[i]==tkn)
        return false;
      if (*a[i] == val)
        return true;
    }
}
i.e. BB rotation makes it generate an unconditional branch to a conditional
branch. However this method is only called when the size is above a certain
threshold, and so it's likely that we have to do that first iteration.
Adding:
#include <stdbool.h>
bool foo (int **a, int n, int val, int *tkn)
{
  for (int i = 0; i < n; i++)
    {
      if (__builtin_expect(!a[i] || a[i]==tkn, 0))
        return false;
      if (*a[i] == val)
        return true;
    }
}
to indicate that we will likely do an iteration more generates:
Jonathan Wakely [Sat, 14 Dec 2024 01:17:27 +0000 (01:17 +0000)]
libstdc++: Clear std::priority_queue after moving from it [PR118088]
We don't know what state an arbitrary sequence container will be in
after moving from it, so a moved-from std::priority_queue needs to clear
the moved-from container to ensure it doesn't contain elements that are
in an invalid order for the queue. An alternative would be to call
std::make_heap again to re-establish the rvalue queue's invariant, but
that could potentially cause an exception to be thrown. Just clearing it
so the sequence is empty seems safer and more likely to match user
expectations.
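The approach can be sketched with a simplified adaptor. Both copying_seq
(a container whose move constructor leaves the source's elements in
place) and mini_queue are hypothetical and exist only to illustrate the
idea; they are not the stl_queue.h code.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sequence whose "move" actually copies, so the moved-from
// object still holds its elements (a valid but unspecified state).
struct copying_seq : std::vector<int> {
  copying_seq() = default;
  copying_seq(copying_seq&& other)
      : std::vector<int>(static_cast<const std::vector<int>&>(other)) {}
};

// Simplified max-heap adaptor in the spirit of std::priority_queue.
template <typename Seq>
class mini_queue {
 public:
  mini_queue() = default;

  // Move the container, then clear the source so that it cannot be
  // left holding elements in an order that violates the heap invariant.
  mini_queue(mini_queue&& q) : c(std::move(q.c)) { q.c.clear(); }

  void push(int v) {
    c.push_back(v);
    std::push_heap(c.begin(), c.end());
  }
  int top() const { return c.front(); }
  bool empty() const { return c.empty(); }
  std::size_t size() const { return c.size(); }

 private:
  Seq c;
};
```

Clearing cannot throw for the standard sequence containers, which is the
reason it is preferred here over re-running std::make_heap on the source.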
libstdc++-v3/ChangeLog:
PR libstdc++/118088
* include/bits/stl_queue.h (priority_queue(priority_queue&&)):
Clear the source object after moving from it.
(priority_queue(priority_queue&&, const Alloc&)): Likewise.
(operator=(priority_queue&&)): Likewise.
* testsuite/23_containers/priority_queue/118088.cc: New test.
Michal Jires [Thu, 24 Oct 2024 01:02:55 +0000 (03:02 +0200)]
lto: Remap node order for stability.
This patch adds remapping of node order for each LTO partition.
The resulting order preserves the relative order inside the partition, but
is independent of outside symbols. So if an LTO partition contains an
identical set of symbols, their remapped order will be stable
between compilations.
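One way to picture such a remapping, as a sketch only: replace each
symbol's global order by its rank within the partition. The function
remap_orders is hypothetical and does not reflect the actual LTO
streaming code; symbols with equal order keep their input position.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Maps each global order to its rank (0, 1, 2, ...) within the partition.
// Relative order is preserved, while the absolute values, which depend
// on symbols outside the partition, are discarded.
std::vector<int> remap_orders(const std::vector<int>& orders) {
  std::vector<std::size_t> idx(orders.size());
  std::iota(idx.begin(), idx.end(), std::size_t{0});
  std::stable_sort(idx.begin(), idx.end(),
                   [&orders](std::size_t a, std::size_t b) {
                     return orders[a] < orders[b];
                   });
  std::vector<int> remapped(orders.size());
  for (std::size_t rank = 0; rank < idx.size(); ++rank)
    remapped[idx[rank]] = static_cast<int>(rank);
  return remapped;
}
```

Two partitions holding the same symbols with global orders {10, 42, 17}
and {3, 900, 5} both remap to {0, 2, 1}.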
Michal Jires [Thu, 24 Oct 2024 00:21:00 +0000 (02:21 +0200)]
Node clones share order.
Symbol order corresponds to the order in source code.
For clones, the order is currently chosen arbitrarily as max order++.
But it would be more consistent with the original purpose for clones
to share the order of the original node.
This stabilizes clone order for Incremental LTO.
Order is thus no longer unique, but this property is not used outside
of the previous patch, where we can use uid.
If a total order is needed, sorting by order and then uid suffices.
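A minimal sketch of that total order, using a hypothetical node_key
struct as a stand-in for the relevant symbol table node fields:

```cpp
#include <algorithm>
#include <tuple>
#include <vector>

// Hypothetical stand-in: order may be shared by clones, uid is unique.
struct node_key {
  int order;
  int uid;
};

// Lexicographic comparison: by order first, then by uid to break ties.
bool total_order_less(const node_key& a, const node_key& b) {
  return std::tie(a.order, a.uid) < std::tie(b.order, b.uid);
}

void sort_total(std::vector<node_key>& nodes) {
  std::sort(nodes.begin(), nodes.end(), total_order_less);
}
```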
gcc/ChangeLog:
* cgraph.h (symbol_table::register_symbol):
Order can be already set.
* cgraphclones.cc (cgraph_node::create_clone):
Reuse order for clones.
Michal Jires [Thu, 24 Oct 2024 00:04:12 +0000 (02:04 +0200)]
ipa-strub: Replace cgraph_node order with uid.
ipa_strub_set_mode_for_new_functions uses node order as a unique,
ever-increasing identifier. This is better satisfied with uid.
Order loses uniqueness with the following patches.
gcc/ChangeLog:
* ipa-strub.cc (ipa_strub_set_mode_for_new_functions): Replace
order with uid.
(pass_ipa_strub_mode::execute): Likewise.