David Faust [Mon, 10 Jun 2024 17:54:53 +0000 (10:54 -0700)]
btf: add -gprune-btf option
This patch adds a new option, -gprune-btf, to control BTF debug info
generation.
As the name implies, this option enables a kind of "pruning" of the BTF
information before it is emitted. When enabled, rather than emitting
all type information translated from DWARF, only information for types
directly used in the source program is emitted.
The primary purpose of this pruning is to reduce the amount of
unnecessary BTF information emitted, especially for BPF programs. It is
very common for BPF programs to include Linux kernel internal headers in
order to have access to kernel data structures. However, doing so often
has the side effect of also adding type definitions for a large number
of types which are not actually used by nor relevant to the program.
In these cases, -gprune-btf commonly reduces the size of the resulting
BTF information by 10x or more, as seen on average when compiling Linux
kernel BPF selftests. This both slims down the size of the resulting
object and reduces the time required by the BPF loader to verify the
program and its BTF information.
Note that the pruning implemented in this patch follows the same rules
as the BTF pruning performed unconditionally by LLVM's BPF backend when
generating BTF. In particular, the main sources of pruning are:
1) Only generate BTF for types used by variables and functions at the
file scope.
Note that which variables are known to be "used" may differ
slightly between LTO and non-LTO builds due to optimizations. For
non-LTO builds (and always for the BPF target), variables which are
optimized away during compilation are considered to be unused, and
they (along with their types) are pruned. For LTO builds, such
variables are not known to be optimized away by the time pruning
occurs, so VAR records for them and information for their types may
be present in the emitted BTF information. This is a missed
optimization that may be fixed in the future.
2) Avoid emitting full BTF for struct and union types which are only
pointed-to by members of other struct/union types. In these cases,
the full BTF_KIND_STRUCT or BTF_KIND_UNION which would normally
be emitted is replaced with a BTF_KIND_FWD, as though the
underlying type was a forward-declared struct or union type.
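As an illustrative sketch of rule 2 (the type and variable names here
are hypothetical, not taken from the patch):

/* `struct inner' is fully defined, but only ever referenced through a
   pointer member.  With -gprune-btf its full BTF_KIND_STRUCT is
   replaced by a BTF_KIND_FWD, and its members produce no BTF.  */
struct inner { long a; long b; };
struct outer {
  struct inner *link;    /* pointed-to only: pruned to a forward */
  int count;             /* ordinary member: type emitted as usual */
};
struct outer global_obj; /* file-scope use keeps `outer' per rule 1 */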
gcc/
* btfout.cc (btf_used_types): New hash set.
(struct btf_fixup): New.
(fixups, forwards): New vecs.
(btf_output): Calculate num_types depending on debug_prune_btf.
(btf_early_finish): New initialization for debug_prune_btf.
(btf_add_used_type): New function.
(btf_used_type_list_cb): Likewise.
(btf_collect_pruned_types): Likewise.
(btf_add_vars): Handle special case for variables in ".maps" section
when generating BTF for BPF CO-RE target.
(btf_late_finish): Use btf_collect_pruned_types when debug_prune_btf
is in effect. Move some initialization to btf_early_finish.
(btf_finalize): Additional deallocation for debug_prune_btf.
* common.opt (gprune-btf): New flag.
* ctfc.cc (init_ctf_strtable): Make non-static.
* ctfc.h (init_ctf_strtable, ctfc_delete_strtab): Make extern.
* doc/invoke.texi (Debugging Options): Document -gprune-btf.
David Faust [Thu, 30 May 2024 21:06:27 +0000 (14:06 -0700)]
btf: refactor and simplify implementation
This patch heavily refactors btfout.cc to take advantage of the
structural changes in the prior commits.
Now that inter-type references are internally stored as simply pointers,
all the painful, brittle, confusing infrastructure that was used in the
process of converting CTF type IDs to BTF type IDs can be thrown out.
This greatly simplifies the entire process of converting from CTF to
BTF, making the code cleaner, easier to read, and easier to maintain.
In addition, we no longer need to worry about destructive changes in
internal data structures used commonly by CTF and BTF, which allows
deleting several ancillary data structures previously used in btfout.cc.
This is nearly transparent, but a few improvements have also been made:
1) BTF_KIND_FUNC records are now _always_ constructed at early_finish,
allowing us to construct records even for functions which are later
inlined by optimizations. DATASEC entries for functions are only
constructed at late_finish, to avoid incorrectly generating entries
for functions which get inlined.
2) BTF_KIND_VAR records and DATASEC entries for them are now always
constructed at (late) finish, which avoids cases where we could
incorrectly create records for variables which were completely
optimized away. This fixes PR debug/113566 for non-LTO builds.
In LTO builds, BTF must be emitted at early_finish, so some VAR
records may be emitted for variables which are later optimized away.
3) Some additional assembler comments have been added with more
information for debugging.
gcc/
* btfout.cc (struct btf_datasec_entry): New.
(struct btf_datasec): Add `id' member. Change `entries' to use
new struct btf_datasec_entry.
(func_map): New hash_map.
(max_translated_id): New.
(btf_var_ids, btf_id_map, holes, voids, num_vars_added)
(num_types_added, num_types_created): Delete.
(btf_absolute_var_id, btf_relative_var_id, btf_absolute_func_id)
(btf_relative_func_id, btf_absolute_datasec_id, init_btf_id_map)
(get_btf_id, set_btf_id, btf_emit_id_p): Delete.
(btf_removed_type_p): Delete.
(btf_dtd_kind, btf_emit_type_p): New helpers.
(btf_fwd_to_enum_p, btf_calc_num_vbytes): Use them.
(btf_collect_datasec): Delete.
(btf_dtd_postprocess_cb, btf_dvd_emit_preprocess_cb)
(btf_dtd_emit_preprocess_cb, btf_emit_preprocess): Delete.
(btf_dmd_representable_bitfield_p): Adapt to type reference changes
and delete now-unused ctfc argument.
(btf_asm_datasec_type_ref): Delete.
(btf_asm_type_ref): Adapt to type reference changes, simplify.
(btf_asm_type): Likewise. Mark struct/union types with bitfield
members.
(btf_asm_array): Adapt to data structure changes.
(btf_asm_varent): Likewise.
(btf_asm_sou_member): Likewise. Ensure non-bitfield members are
correctly re-encoded if struct or union contains any bitfield.
(btf_asm_func_arg, btf_asm_func_type, btf_asm_datasec_entry)
(btf_asm_datasec_type): Adapt to data structure changes.
(output_btf_header): Adapt to other changes, simplify type
length calculation, add info to assembler comments.
(output_btf_vars): Adapt to other changes.
(output_btf_strs): Fix overlong lines.
(output_asm_btf_sou_fields, output_asm_btf_enum_list)
(output_asm_btf_func_args_list, output_asm_btf_vlen_bytes)
(output_asm_btf_type, output_btf_types, output_btf_func_types)
(output_btf_datasec_types): Adapt to other changes.
(btf_init_postprocess): Delete.
(btf_output): Change to only perform output.
(btf_add_const_void, btf_add_func_records): New.
(btf_early_finish): Use them here. New.
(btf_datasec_push_entry): Adapt to data structure changes.
(btf_datasec_add_func, btf_datasec_add_var): New.
(btf_add_func_datasec_entries): New.
(btf_emit_variable_p): New helper.
(btf_add_vars): Use it here. New.
(btf_type_list_cb, btf_collect_translated_types): New.
(btf_assign_func_ids, btf_late_assign_var_ids)
(btf_assign_datasec_ids): New.
(btf_finish): Remove unused argument. Call new btf_late*
functions and btf_output.
(btf_finalize): Adapt to data structure changes.
* ctfc.h (struct ctf_dtdef): Convert existing boolean flags to
BOOL_BITFIELD and reorder.
(struct ctf_dvdef): Add dvd_id member.
(btf_finish): Remove argument from prototype.
(get_btf_id): Delete prototype.
(funcs_traverse_callback, traverse_btf_func_types): Add an
explanatory comment.
* dwarf2ctf.cc (ctf_debug_finish): Remove unused argument.
* dwarf2ctf.h: Analogous change.
* dwarf2out.cc: Likewise.
David Faust [Thu, 30 May 2024 21:06:27 +0000 (14:06 -0700)]
ctf: use pointers instead of IDs internally
This patch replaces all inter-type references in the ctfc internal data
structures with pointers, rather than the references-by-ID which were
used previously.
A couple of small updates in the BPF backend are included to make it
compatible with the change.
This change is only to the in-memory representation of various CTF
structures to make them easier to work with in various cases. It is
outwardly transparent; there is no change in emitted CTF.
gcc/
* btfout.cc (BTF_VOID_TYPEID, BTF_INIT_TYPEID): Move defines to
include/btf.h.
(btf_dvd_emit_preprocess_cb, btf_emit_preprocess)
(btf_dmd_representable_bitfield_p, btf_asm_array, btf_asm_varent)
(btf_asm_sou_member, btf_asm_func_arg, btf_init_postprocess):
Adapt to structural changes in ctf_* structs.
* ctfc.h (struct ctf_dtdef): Add forward declaration.
(ctf_dtdef_t, ctf_dtdef_ref): Move typedefs earlier.
(struct ctf_arinfo, struct ctf_funcinfo, struct ctf_sliceinfo)
(struct ctf_itype, struct ctf_dmdef, struct ctf_func_arg)
(struct ctf_dvdef): Use pointers instead of type IDs for
references to other types and use typedefs where appropriate.
(struct ctf_dtdef): Add ref_type member.
(ctf_type_exists): Use pointer instead of type ID.
(ctf_add_reftype, ctf_add_enum, ctf_add_slice, ctf_add_float)
(ctf_add_integer, ctf_add_unknown, ctf_add_pointer)
(ctf_add_array, ctf_add_forward, ctf_add_typedef)
(ctf_add_function, ctf_add_sou, ctf_add_enumerator)
(ctf_add_variable): Likewise. Return pointer instead of ID.
(ctf_lookup_tree_type): Return pointer to type instead of ID.
* ctfc.cc: Analogous changes.
* ctfout.cc (ctf_asm_type, ctf_asm_slice, ctf_asm_varent)
(ctf_asm_sou_lmember, ctf_asm_sou_member, ctf_asm_func_arg)
(output_ctf_objt_info): Adapt to changes.
* dwarf2ctf.cc (gen_ctf_type, gen_ctf_void_type)
(gen_ctf_unknown_type, gen_ctf_base_type, gen_ctf_pointer_type)
(gen_ctf_subrange_type, gen_ctf_array_type, gen_ctf_typedef)
(gen_ctf_modifier_type, gen_ctf_sou_type, gen_ctf_function_type)
(gen_ctf_enumeration_type, gen_ctf_variable, gen_ctf_function)
(gen_ctf_type, ctf_do_die): Likewise.
* config/bpf/btfext-out.cc (struct btf_ext_core_reloc): Use
pointer instead of type ID.
(bpf_core_reloc_add, bpf_core_get_sou_member_index)
(output_btfext_core_sections): Adapt to above changes.
* config/bpf/core-builtins.cc (process_type): Likewise.
include/
* btf.h (BTF_VOID_TYPEID, BTF_INIT_TYPEID): Move defines here,
from gcc/btfout.cc.
David Faust [Thu, 30 May 2024 21:06:27 +0000 (14:06 -0700)]
ctf, btf: restructure CTF/BTF emission
This commit makes some structural changes to the CTF/BTF debug info
emission. In particular:
a) CTF is now always fully generated and emitted before any
BTF-related procedures are run. This means that BTF-related
functions can change, even irreversibly, the shared in-memory
representation used by the two formats without issue.
b) BTF generation has fewer entry points, and is cleanly divided
into early_finish and finish.
c) BTF is now always emitted at finish (called from dwarf2out_finish),
for all targets in non-LTO builds, rather than being emitted at
early_finish for targets other than BPF CO-RE. In LTO builds,
BTF is emitted at early_finish as before.
Note that this change alone does not alter the contents of BTF at
all, regardless of whether it would have previously been emitted at
early_finish or finish, because the calculation of the BTF to be
emitted is not moved by this patch, only the write-out.
The changes are transparent to both CTF and BTF emission.
gcc/
* btfout.cc (btf_init_postprocess): Rename to...
(btf_early_finish): ...this.
(btf_output): Rename to...
(btf_finish): ...this.
* ctfc.h: Analogous changes.
* dwarf2ctf.cc (ctf_debug_early_finish): Conditionally call
btf_early_finish, or ctf_finalize as appropriate. Emit BTF
here for LTO builds.
(ctf_debug_finish): Always call btf_finish here if generating
BTF info in non-LTO builds.
(ctf_debug_finalize, ctf_debug_init_postprocess): Delete.
* dwarf2out.cc (dwarf2out_early_finish): Remove call to
ctf_debug_init_postprocess.
Arm: Fix disassembly error in Thumb-1 relaxed load/store [PR115188]
A Thumb-1 memory operand allows single-register LDMIA/STMIA. This doesn't get
printed as LDR/STR with writeback in unified syntax, resulting in strange
assembler errors if writeback is selected. To work around this, use the 'Uw'
constraint that blocks writeback. Also use a new 'mem_and_no_t1_wback_op'
which is a general memory operand that disallows writeback in Thumb-1.
A few other patterns were using 'm' for Thumb-1 in a similar way, update these
to also use 'mem_and_no_t1_wback_op' and 'Uw'.
gcc:
PR target/115188
* config/arm/arm.md (unaligned_loadsi): Use 'Uw' constraint and
'mem_and_no_t1_wback_op'.
(unaligned_loadhiu): Likewise.
(unaligned_storesi): Likewise.
(unaligned_storehi): Likewise.
* config/arm/predicates.md (mem_and_no_t1_wback_op): Add new predicate.
* config/arm/sync.md (arm_atomic_load<mode>): Use 'Uw' constraint.
(arm_atomic_store<mode>): Likewise.
gcc/testsuite:
PR target/115188
* gcc.target/arm/pr115188.c: Add new test.
Lewis Hyatt [Thu, 27 Jun 2024 20:11:27 +0000 (16:11 -0400)]
build: Fix "make install" for MinGW
Since r8-4925, the "make install" recipe generates a path which can start
with "//", causing problems for some Windows environments. Fix by removing
the redundant slash.
The `function_attribute_inlinable_p` hook documentation described it as
returning whether it is OK to inline the provided fndecl into "the
current function". AFAICS this hook is only called when
`current_function_decl` is the same as the `fndecl` argument that the
hook is given, hence asking whether `fndecl` can be inlined into "the
current function" doesn't seem relevant. Moreover, from what I see, no
existing implementation of `function_attribute_inlinable_p` uses "the
current function" in any way.
Update the documentation to match this understanding.
The `unspec_may_trap_p` documentation mentioned applying to either
`unspec` or `unspec_volatile`. AFAICS this hook is only used for
`unspec` codes since c84a808e493a, so I removed the mention of
`unspec_volatile`.
Eric Botcazou [Wed, 19 Jun 2024 20:45:29 +0000 (22:45 +0200)]
ada: Use static allocation for small dynamic string concatenations in more cases
This lifts the limitation of the original implementation whereby the first
operand of the concatenation needs to have a length known at compile time
in order for the static allocation to be used.
gcc/ada/
* exp_ch4.adb (Expand_Concatenate): In the case where an operand
does not have both bounds known at compile time, use nevertheless
the low bound directly if it is known at compile time.
Fold the conditional expression giving the low bound of the result
in the general case if the low bound of all the operands are equal.
Steve Baird [Thu, 13 Jun 2024 22:28:29 +0000 (15:28 -0700)]
ada: Use clause (or use type clause) in a protected operation sometimes ignored.
In some cases, a use clause (or a use type clause) occurring within a
protected operation is incorrectly ignored.
gcc/ada/
* exp_ch9.adb
(Expand_N_Protected_Body): Declare new procedure
Unanalyze_Use_Clauses and call it before analyzing the newly
constructed subprogram body.
Steve Baird [Thu, 13 Jun 2024 22:39:37 +0000 (15:39 -0700)]
ada: Put_Image aspect spec ignored for null extension.
If type T1 is a tagged null record with a Put_Image aspect specification
and type T2 is a null extension of T1 (with no aspect specifications), then
evaluation of a T2'Image call should include a call to the specified procedure
(as opposed to yielding "(NULL RECORD)").
gcc/ada/
* exp_put_image.adb
(Build_Record_Put_Image_Procedure): Declare new Boolean-valued
function Null_Record_Default_Implementation_OK; call it as part of
deciding whether to generate "(NULL RECORD)" text.
Justin Squirek [Tue, 18 Jun 2024 08:38:18 +0000 (08:38 +0000)]
ada: Allow mutably tagged types to work with qualified expressions
This patch modifies the experimental 'Size'Class feature such that objects of
mutably tagged types can be assigned qualified expressions featuring a
definite type (e.g. Mutable_Obj := Root_Child_T'(Root_T with others => <>)).
gcc/ada/
* sem_ch5.adb:
(Analyze_Assignment): Add special expansion for qualified expressions
in certain cases dealing with mutably tagged types.
Bob Duff [Tue, 18 Jun 2024 16:53:46 +0000 (12:53 -0400)]
ada: Bug box for expression function with list comprehension
GNAT crashes on an iterator with a filter inside an expression function
that is the completion of an earlier spec.
gcc/ada/
* freeze.adb (Freeze_Type_Refs): If Node is in N_Has_Etype,
check that it has had its Etype set, because this can be
called early for expression functions that are completions.
Eric Botcazou [Mon, 17 Jun 2024 07:54:47 +0000 (09:54 +0200)]
ada: Call memcmp instead of Compare_Array_Unsigned_8 and...
... implement support for ordering comparisons of discrete array types.
This extends the Support_Composite_Compare_On_Target feature to ordering
comparisons of discrete array types as specified by RM 4.5.2(26/3), when
the component type is a byte (unsigned).
Implement support for ordering comparisons of discrete array types
with a two-pronged approach: for types with a size known at compile time,
this lets the gimplifier generate the call to memcmp (or else an optimized
version of it); otherwise, this directly generates the call to memcmp.
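For the supported byte case, the mapping onto memcmp is direct, since
memcmp already performs a lexicographic comparison of unsigned bytes.
A minimal C sketch of the equivalence, assuming equal-length operands
(the function name is illustrative):

#include <string.h>

/* A < B for two unsigned-byte arrays of the same length n is exactly
   memcmp (A, B, n) < 0; likewise for the other orderings.  */
int
less_than (const unsigned char *a, const unsigned char *b, size_t n)
{
  return memcmp (a, b, n) < 0;
}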
gcc/ada/
* exp_ch4.adb (Expand_Array_Comparison): Remove the obsolete byte
addressability test. If Support_Composite_Compare_On_Target is true,
immediately return for a component size of 8, an unsigned component
type and aligned operands. Disable when Unnest_Subprogram_Mode is
true (for LLVM).
(Expand_N_Op_Eq): Adjust comment.
* targparm.ads (Support_Composite_Compare_On_Target): Replace bit by
byte in description and document support for ordering comparisons.
* gcc-interface/utils2.cc (compare_arrays): Rename into...
(compare_arrays_for_equality): ...this. Remove redundant lines.
(compare_arrays_for_ordering): New function.
(build_binary_op) <comparisons>: Call compare_arrays_for_ordering
to implement ordering comparisons for arrays.
Yannick Moy [Mon, 17 Jun 2024 09:57:55 +0000 (11:57 +0200)]
ada: Fix analysis of Extensions_Visible
Pragma/aspect Extensions_Visible should be analyzed before any
pre/post contracts on a subprogram, as the legality of conversions
of formal parameters to classwide type depends on the value of
Extensions_Visible. Now fixed.
gcc/ada/
* contracts.adb (Analyze_Pragmas_In_Declarations): Analyze
pragmas in two iterations over the list of declarations in
order to analyze some pragmas before others.
* einfo-utils.ads (Get_Pragma): Fix comment.
* sem_prag.ads (Pragma_Significant_To_Subprograms): Fix.
(Pragma_Significant_To_Subprograms_Analyzed_First): Add new
global array to identify these pragmas which should be analyzed
first, which concerns only Extensions_Visible for now.
Eric Botcazou [Mon, 17 Jun 2024 19:22:06 +0000 (21:22 +0200)]
ada: Fix bogus error on allocator in instantiation with private derived types
The problem is that the call to Convert_View made from Make_Init_Call does
nothing because the Etype is not set on the second argument.
gcc/ada/
* exp_ch7.adb (Convert_View): Add third parameter Typ and use it if
the second parameter does not have an Etype.
(Make_Adjust_Call): Remove obsolete setting of Etype and pass Typ in
call to Convert_View.
(Make_Final_Call): Likewise.
(Make_Init_Call): Pass Typ in call to Convert_View.
Javier Miranda [Sun, 16 Jun 2024 18:41:57 +0000 (18:41 +0000)]
ada: Miscomputed bounds for inner null array aggregates
When an array has several dimensions, and inner dimensions are
initialized using Ada 2022 null array aggregates, the compiler
crashes or reports spurious errors computing the bounds of the
null array aggregates. This patch fixes the problem and adds
new warnings reported when the index of null array aggregates is
an enumeration type or a modular type and it is known at compile
time that the program will raise Constraint_Error computing the
bounds of the aggregate.
gcc/ada/
* sem_aggr.adb (Cannot_Compute_High_Bound): New subprogram.
(Report_Null_Array_Constraint_Error): New subprogram.
(Collect_Aggr_Bounds): For null aggregates, build the bounds
of the inner dimensions.
(Has_Null_Aggregate_Raising_Constraint_Error): New subprogram.
(Subtract): New subprogram.
(Resolve_Array_Aggregate): Report a warning when the index of
null array aggregates is an enumeration type or a modular type
at we can statically determine that the program will raise CE
at runtime computing its high bound.
(Resolve_Null_Array_Aggregate): Ditto.
Eric Botcazou [Tue, 11 Jun 2024 21:06:22 +0000 (23:06 +0200)]
ada: Fix crash on box-initialized component with No_Default_Initialization
The problem is that the implementation of the No_Default_Initialization
restriction assumes that no type initialization routines are needed and,
therefore, builds a dummy version of them, which goes against their use
for box-initialized components in aggregates.
Therefore this use needs to be flagged as violating the restriction too.
gcc/ada/
* doc/gnat_rm/standard_and_implementation_defined_restrictions.rst
(No_Default_Initialization): Mention components alongside variables.
* exp_aggr.adb (Build_Array_Aggr_Code.Gen_Assign): Check that the
restriction No_Default_Initialization is not in effect for default
initialized components.
(Build_Record_Aggr_Code): Likewise.
* gnat_rm.texi: Regenerate.
Andrew Stubbs [Fri, 28 Jun 2024 15:13:59 +0000 (15:13 +0000)]
amdgcn: invent target feature flags
This is a first step towards having a device table so we can add new devices
more easily. It'll also make it easier to remove the deprecated GCN3 bits.
The patch should not change the behaviour of anything.
Kewen Lin [Tue, 2 Jul 2024 08:58:06 +0000 (03:58 -0500)]
sparc: define SPARC_LONG_DOUBLE_TYPE_SIZE for vxworks [PR115739]
Commit r15-1594 removed define of LONG_DOUBLE_TYPE_SIZE in
sparc.cc, it's based on the assumption that each OS has its
own define (see the comments in sparc.h), but it exposes an
issue on vxworks which lacks the define.
We can bring back the default SPARC_LONG_DOUBLE_TYPE_SIZE to
sparc.cc, but according to the comments in sparc.h, I think
it's better to define this in vxworks.h. btw, I also went
through all the sparc supported triples, vxworks is the only
one that misses this define.
PR target/115739
gcc/ChangeLog:
* config/sparc/vxworks.h (SPARC_LONG_DOUBLE_TYPE_SIZE): New define.
After r15-1579, ADD and LD/ST pairs will be merged into LDX/STX,
causing these two tests to fail. To guarantee that these two tests pass,
add the compilation option '-fno-late-combine-instructions'.
Kewen Lin [Tue, 2 Jul 2024 07:13:35 +0000 (02:13 -0500)]
isel: Fold more in gimple_expand_vec_cond_expr [PR115659]
As PR115659 shows, assuming c = x CMP y, there are some
folding chances for patterns r = c ? -1/z : z/0.
For r = c ? -1 : z, it can be folded into:
- r = c | z (with ior_optab supported)
- or r = c ? c : z
while for r = c ? z : 0, it can be folded into:
- r = c & z (with and_optab supported)
- or r = c ? z : c
This patch is to teach ISEL to take care of them and also
remove the redundant gsi_replace as the caller of function
gimple_expand_vec_cond_expr will handle it.
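A C-level illustration of the equivalences being exploited, using GCC
vector extensions (the function names are illustrative):

typedef int v4si __attribute__ ((vector_size (16)));

/* A vector compare yields -1 (all ones) or 0 in each lane, so the
   selects collapse to plain bitwise operations.  */
v4si
fold_to_ior (v4si x, v4si y, v4si z)
{
  v4si c = x > y;   /* each lane is -1 or 0 */
  return c | z;     /* same as c ? -1 : z */
}

v4si
fold_to_and (v4si x, v4si y, v4si z)
{
  v4si c = x > y;
  return c & z;     /* same as c ? z : 0 */
}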
PR tree-optimization/115659
gcc/ChangeLog:
* gimple-isel.cc (gimple_expand_vec_cond_expr): Add more foldings for
patterns x CMP y ? -1 : z and x CMP y ? z : 0.
Marek Polacek [Wed, 26 Jun 2024 21:55:21 +0000 (17:55 -0400)]
c++: ICE with computed gotos [PR115469]
This is a low-prio crash on invalid code where we ICE on a VAR_DECL
with erroneous type. I thought I'd try to avoid putting such decls
into ->names and ->names_in_scope but that sounds riskier than the
following cleanup.
PR c++/115469
gcc/cp/ChangeLog:
* decl.cc (automatic_var_with_nontrivial_dtor_p): New.
(poplevel_named_label_1): Use it.
(check_goto_1): Likewise.
Marek Polacek [Tue, 25 Jun 2024 21:42:01 +0000 (17:42 -0400)]
c++: unresolved overload with comma op [PR115430]
This works:
template<typename T>
int Func(T);
typedef int (*funcptrtype)(int);
funcptrtype fp0 = &Func<int>;
but this doesn't:
funcptrtype fp2 = (0, &Func<int>);
because we only call resolve_nondeduced_context on the LHS (via
convert_to_void) but not on the RHS, so cp_build_compound_expr's
type_unknown_p check issues an error.
PR c++/115430
gcc/cp/ChangeLog:
* typeck.cc (cp_build_compound_expr): Call resolve_nondeduced_context
on RHS.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/noexcept41.C: Remove dg-error.
* g++.dg/overload/addr3.C: New test.
Marek Polacek [Fri, 28 Jun 2024 21:51:19 +0000 (17:51 -0400)]
c++: DR2627, Bit-fields and narrowing conversions [PR94058]
This DR (https://cplusplus.github.io/CWG/issues/2627.html) says that
even if we are converting from an integer type or unscoped enumeration type
to an integer type that cannot represent all the values of the original
type, it's not narrowing if "the source is a bit-field whose width w is
less than that of its type (or, for an enumeration type, its underlying
type) and the target type can represent all the values of a hypothetical
extended integer type with width w and with the same signedness as the
original type".
DR 2627
PR c++/94058
PR c++/104392
gcc/cp/ChangeLog:
* typeck2.cc (check_narrowing): Don't warn if the conversion isn't
narrowing as per DR 2627.
gcc/testsuite/ChangeLog:
* g++.dg/DRs/dr2627.C: New test.
* g++.dg/cpp0x/Wnarrowing22.C: New test.
* g++.dg/cpp2a/spaceship-narrowing1.C: New test.
* g++.dg/cpp2a/spaceship-narrowing2.C: New test.
Richard Biener [Sun, 30 Jun 2024 09:37:12 +0000 (11:37 +0200)]
Preserve SSA info for more propagated copy
Besides VN and copy-prop also CCP and VRP as well as forwprop
propagate out copies and thus it's worthwhile to try to preserve
range and points-to info there when possible.
Note that this also fixes the testcase from PR115701 but that's
because we do not actually intersect info but only copy info when
there was no info present.
Pan Li [Sun, 30 Jun 2024 08:48:19 +0000 (16:48 +0800)]
RISC-V: Add testcases for unsigned scalar .SAT_ADD IMM form 4
This patch would like to add test cases for the unsigned scalar
.SAT_ADD IMM form 4. Aka:
Form 4:
#define DEF_SAT_U_ADD_IMM_FMT_4(T) \
T __attribute__((noinline)) \
sat_u_add_imm_##T##_fmt_4 (T x) \
{ \
T ret; \
return __builtin_add_overflow (x, 9, &ret) == 0 ? ret : -1; \
}
DEF_SAT_U_ADD_IMM_FMT_4(uint64_t)
The below test is passed for this patch.
* The rv64gcv regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add helper test macro.
* gcc.target/riscv/sat_u_add_imm-13.c: New test.
* gcc.target/riscv/sat_u_add_imm-14.c: New test.
* gcc.target/riscv/sat_u_add_imm-15.c: New test.
* gcc.target/riscv/sat_u_add_imm-16.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-13.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-14.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-15.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-16.c: New test.
Pan Li [Sun, 30 Jun 2024 08:41:16 +0000 (16:41 +0800)]
RISC-V: Add testcases for unsigned scalar .SAT_ADD IMM form 3
This patch would like to add test cases for the unsigned scalar
.SAT_ADD IMM form 3. Aka:
Form 3:
#define DEF_SAT_U_ADD_IMM_FMT_3(T) \
T __attribute__((noinline)) \
sat_u_add_imm_##T##_fmt_3 (T x) \
{ \
T ret; \
return __builtin_add_overflow (x, 8, &ret) ? -1 : ret; \
}
DEF_SAT_U_ADD_IMM_FMT_3(uint64_t)
The below test is passed for this patch.
* The rv64gcv regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add helper test macro.
* gcc.target/riscv/sat_u_add_imm-10.c: New test.
* gcc.target/riscv/sat_u_add_imm-11.c: New test.
* gcc.target/riscv/sat_u_add_imm-12.c: New test.
* gcc.target/riscv/sat_u_add_imm-9.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-10.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-11.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-12.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-9.c: New test.
Pan Li [Sun, 30 Jun 2024 08:14:38 +0000 (16:14 +0800)]
RISC-V: Add testcases for unsigned scalar .SAT_ADD IMM form 2
This patch would like to add test cases for the unsigned scalar
.SAT_ADD IMM form 2. Aka:
Form 2:
#define DEF_SAT_U_ADD_IMM_FMT_2(T) \
T __attribute__((noinline)) \
sat_u_add_imm_##T##_fmt_2 (T x) \
{ \
return (T)(x + 9) < x ? -1 : (x + 9); \
}
DEF_SAT_U_ADD_IMM_FMT_2(uint64_t)
The below test is passed for this patch.
* The rv64gcv regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add helper test macro.
* gcc.target/riscv/sat_u_add_imm-5.c: New test.
* gcc.target/riscv/sat_u_add_imm-6.c: New test.
* gcc.target/riscv/sat_u_add_imm-7.c: New test.
* gcc.target/riscv/sat_u_add_imm-8.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-5.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-6.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-7.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-8.c: New test.
Pan Li [Sun, 30 Jun 2024 08:03:41 +0000 (16:03 +0800)]
RISC-V: Add testcases for unsigned scalar .SAT_ADD IMM form 1
This patch would like to add test cases for the unsigned scalar
.SAT_ADD IMM form 1. Aka:
Form 1:
#define DEF_SAT_U_ADD_IMM_FMT_1(T) \
T __attribute__((noinline)) \
sat_u_add_imm_##T##_fmt_1 (T x) \
{ \
return (T)(x + 9) >= x ? (x + 9) : -1; \
}
DEF_SAT_U_ADD_IMM_FMT_1(uint64_t)
The below test is passed for this patch.
* The rv64gcv regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add helper test macro.
* gcc.target/riscv/sat_u_add_imm-1.c: New test.
* gcc.target/riscv/sat_u_add_imm-2.c: New test.
* gcc.target/riscv/sat_u_add_imm-3.c: New test.
* gcc.target/riscv/sat_u_add_imm-4.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-1.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-2.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-3.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-4.c: New test.
Roger Sayle [Mon, 1 Jul 2024 11:21:20 +0000 (12:21 +0100)]
testsuite: Fix -m32 gcc.target/i386/pr102464-vrndscaleph.c on RedHat.
This patch fixes the 4 FAILs of gcc.target/i386/pr102464-vrndscaleph.c
with --target_board='unix{-m32}' on RedHat 7.x. The issue is that this
AVX512 test includes the system math.h, and on older systems this provides
inline versions of floor, ceil and rint (for the 387). The work around
is to define __NO_MATH_INLINES before #include <math.h> (or alternatively
use __builtin_floor, __builtin_ceil, etc.).
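A minimal sketch of the workaround (the test body is illustrative):

/* Suppress the legacy 387 inline definitions of floor/ceil/rint
   provided by older glibc <math.h>, so the compiler's handling of
   these calls is exercised instead.  */
#define __NO_MATH_INLINES
#include <math.h>

double round_down (double x) { return floor (x); }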
2024-07-01 Roger Sayle <roger@nextmovesoftware.com>
gcc/testsuite/ChangeLog
PR middle-end/102464
* gcc.target/i386/pr102464-vrndscaleph.c: Define __NO_MATH_INLINES
to resolve FAILs with -m32 on older RedHat systems.
Roger Sayle [Mon, 1 Jul 2024 11:18:26 +0000 (12:18 +0100)]
i386: Additional peephole2 to use lea in round-up integer division.
A common idiom for implementing an integer division that rounds upwards is
to write (x + y - 1) / y. Conveniently on x86, the two additions to form
the numerator can be performed by a single lea instruction, and indeed gcc
currently generates a lea when both x and y are registers.
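For reference, a minimal instance of the idiom (the function name is
illustrative):

/* Round-up division: with x and y in registers, the numerator
   x + y - 1 can be formed by a single 3-component lea on x86.  */
unsigned int
div_round_up (unsigned int x, unsigned int y)
{
  return (x + y - 1) / y;
}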
This discrepancy is caused by the late decision (in peephole2) to split
an addition with a memory operand, into a load followed by a reg-reg
addition. This patch improves this situation by adding a peephole2
to recognize consecutive additions and transform them into lea if
profitable.
My first attempt at fixing this was to use a define_insn_and_split:
using combine to combine instructions. Unfortunately, this approach
interferes with (reload's) subtle balance of deciding when to use/avoid lea,
which can be observed as a code size regression in CSiBE. The peephole2
approach (proposed here) uniformly improves CSiBE results.
2024-07-01 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386.md (peephole2): Transform two consecutive
additions into a 3-component lea if !TARGET_AVOID_LEA_FOR_ADDR.
gcc/testsuite/ChangeLog
* gcc.target/i386/lea-3.c: New test case.
PR target/88236
PR target/115726
gcc/
* config/avr/avr.md (mov<mode>) [avr_mem_memx_p]: Expand in such a
way that the destination does not overlap with any hard register
clobbered / used by xload8qi_A resp. xload<mode>_A.
* config/avr/avr.cc (avr_out_xload): Avoid early-clobber
situation for Z by executing just one load when the output register
overlaps with Z.
gcc/testsuite/
* gcc.target/avr/torture/pr88236-pr115726.c: New test.
Andrew Stubbs [Wed, 12 Jun 2024 11:09:33 +0000 (11:09 +0000)]
libgomp, openmp: Add ompx_gnu_pinned_mem_alloc
This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP. This is not in the OpenMP standard so it uses the "ompx"
namespace and an independent enum baseline of 200 (selected to not clash with
other known implementations).
The allocator is equivalent to using a custom allocator with the pinned
trait and the null fallback trait. One motivation for having this feature is
for use by the (planned) -foffload-memory=pinned feature.
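A hedged sketch of using the new allocator from C, based on the
declarations this patch adds to omp.h (with the null fallback trait,
allocation failure yields a null pointer rather than an abort):

#include <omp.h>
#include <stdlib.h>

int
main (void)
{
  /* Request pinned (page-locked) host memory.  */
  double *buf = omp_alloc (1024 * sizeof (double),
                           ompx_gnu_pinned_mem_alloc);
  if (buf == NULL)
    return EXIT_FAILURE;  /* null fallback: no abort, just NULL */
  omp_free (buf, ompx_gnu_pinned_mem_alloc);
  return EXIT_SUCCESS;
}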
gcc/fortran/ChangeLog:
* openmp.cc (is_predefined_allocator): Update valid ranges to
incorporate ompx_gnu_pinned_mem_alloc.
libgomp/ChangeLog:
* allocator.c (ompx_gnu_min_predefined_alloc): New.
(ompx_gnu_max_predefined_alloc): New.
(predefined_alloc_mapping): Rename to ...
(predefined_omp_alloc_mapping): ... this.
(predefined_ompx_gnu_alloc_mapping): New.
(_Static_assert): Adjust for the new name, and add a new assert for the
new table.
(predefined_allocator_p): New.
(predefined_alloc_mapping): New.
(omp_aligned_alloc): Support ompx_gnu_pinned_mem_alloc.
Use predefined_allocator_p and predefined_alloc_mapping.
(omp_free): Likewise.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
* env.c (parse_allocator): Add ompx_gnu_pinned_mem_alloc.
* libgomp.texi: Document ompx_gnu_pinned_mem_alloc.
* omp.h.in (omp_allocator_handle_t): Add ompx_gnu_pinned_mem_alloc.
* omp_lib.f90.in: Add ompx_gnu_pinned_mem_alloc.
* omp_lib.h.in: Add ompx_gnu_pinned_mem_alloc.
* testsuite/libgomp.c/alloc-pinned-5.c: New test.
* testsuite/libgomp.c/alloc-pinned-6.c: New test.
* testsuite/libgomp.fortran/alloc-pinned-1.f90: New test.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/allocate-pinned-1.f90: New test.
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
Andrew Stubbs [Wed, 12 Jun 2024 08:43:53 +0000 (08:43 +0000)]
libgomp: change alloc-pinned tests failure mode
The feature doesn't work on non-Linux hosts, at present, so skip the tests
entirely.
On Linux systems that have insufficient lockable memory configured we still
need to fail or else the feature won't be getting tested when we think it is,
but now there's a message to explain why.
libgomp/ChangeLog:
* testsuite/libgomp.c/alloc-pinned-1.c: Change dg-xfail-run-if to
dg-skip-if.
Correct spelling mistake.
Abort on insufficient lockable memory.
Use #error on non-linux hosts.
* testsuite/libgomp.c/alloc-pinned-2.c: Likewise.
Richard Biener [Mon, 1 Jul 2024 08:06:55 +0000 (10:06 +0200)]
tree-optimization/115723 - ICE with .COND_ADD reduction
The following fixes an ICE with a .COND_ADD discovered as reduction
even though its else value isn't the reduction chain link but a
constant. This would be wrong-code with --disable-checking I think.
PR tree-optimization/115723
* tree-vect-loop.cc (check_reduction_path): For a .COND_ADD
verify the else value also refers to the reduction chain op.
liuhongt [Thu, 20 Jun 2024 04:41:13 +0000 (12:41 +0800)]
Optimize a < 0 ? -1 : 0 to (signed)a >> 31.
Try to optimize x < 0 ? -1 : 0 into (signed) x >> 31
and x < 0 ? 1 : 0 into (unsigned) x >> 31.
Add define_insn_and_split patterns for the optimization done in
ix86_expand_int_vcond.
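The scalar analogue of the transform for a 32-bit int (the vector
patterns in the patch apply the same idea lane-wise):

int all_or_none (int a) { return a < 0 ? -1 : 0; } /* (signed) a >> 31   */
int zero_or_one (int a) { return a < 0 ?  1 : 0; } /* (unsigned) a >> 31 */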
gcc/ChangeLog:
PR target/115517
* config/i386/sse.md ("*ashr<mode>3_1"): New
define_insn_and_split.
(*avx512_ashr<mode>3_1): Ditto.
(*avx2_lshr<mode>3_1): Ditto.
(*avx2_lshr<mode>3_2): Ditto and add 2 combine splitter after
it.
* config/i386/mmx.md (mmxscalarsize): New mode attribute.
(*mmx_ashr<mode>3_1): New define_insn_and_split.
("mmx_<insn><mode>3"): Add a combine splitter after it.
(*mmx_ashrv2hi3_1): New define_insn_and_split, also add a
combine splitter after it.
liuhongt [Wed, 19 Jun 2024 08:05:58 +0000 (16:05 +0800)]
Adjust testcases for the regressions after obsoleting vcond{,u,eq}.
> Richard suggests that we implement the "obvious" transforms like
> inversion in the middle-end but if for example unsigned compares
> are not supported the us_minus + eq + negative trick isn't on
> that list.
>
> The main reason to restrict vec_cmp would be to avoid
> a <= b ? c : d going with an unsupported vec_cmp but instead
> do a > b ? d : c - the alternative is trying to fix this
> on the RTL side via combine. I understand the non-native
Yes, I have a patch which can fix most regressions via pattern match
in combine.
Still there is a situation that is difficult to deal with, mainly the
optimization w/o sse4.1. Because pblendvb/blendvps/blendvpd only
exist under sse4.1, w/o sse4.1 it takes 3
instructions (pand, pandn, por) to simulate the vcond_mask, and
combine matches up to 4 instructions, which makes it currently
impossible to use combine to recover those optimizations in
vcond{,u,eq}, i.e. min/max.
In the case of SSE 4.1 and above, there is basically no regression anymore.
liuhongt [Wed, 26 Jun 2024 05:52:24 +0000 (13:52 +0800)]
Enable late-combine.
Move pass_stv2 and pass_rpad after pre_reload pass_late_combine, also
define target_insn_cost to prevent post_reload pass_late_combine from
reverting the optimization done in pass_rpad.
Adjust testcases since pass_late_combine generates better code but
breaks scan-assembly checks.
E.g., under a 32-bit target, gcc used to generate a broadcast from the
stack and then do the real operation.
After late_combine, they're combined into embedded broadcast
operations.
gcc/ChangeLog:
* config/i386/i386-features.cc (ix86_rpad_gate): New function.
* config/i386/i386-options.cc (ix86_override_options_after_change):
Don't disable late_combine.
* config/i386/i386-passes.def: Move pass_stv2 and pass_rpad
after pre_reload pass_late_combine.
* config/i386/i386-protos.h (ix86_rpad_gate): New declare.
* config/i386/i386.cc (ix86_insn_cost): New function.
(TARGET_INSN_COST): Define.
liuhongt [Wed, 26 Jun 2024 05:07:31 +0000 (13:07 +0800)]
Extend lshifrtsi3_1_zext to ?k alternative.
late_combine will combine lshift + zero_extend into *lshifrtsi3_1_zext,
which causes an extra mov between gpr and kmask; add ?k to the pattern.
gcc/ChangeLog:
PR target/115610
* config/i386/i386.md (<*insnsi3_zext): Add alternative ?k,
enable it only for lshiftrt and under avx512bw.
* config/i386/sse.md (*klshrsi3_1_zext): New define_insn, and
add corresponding define_split after it.
liuhongt [Wed, 26 Jun 2024 03:17:46 +0000 (11:17 +0800)]
Define mask as extern instead of uninitialized local variables.
The testcases are supposed to scan for vpopcnt{b,w,d,q} operations
with k mask, but mask is defined as an uninitialized local variable which
will be set to 0 at the RTL expansion phase.
It is then further simplified away by late_combine, which caused
scan-assembly failures.
Move the definition of mask outside to make the testcases more stable.
gcc/testsuite/ChangeLog:
PR target/115610
* gcc.target/i386/avx512bitalg-vpopcntb.c: Define mask as
extern instead of uninitialized local variables.
* gcc.target/i386/avx512bitalg-vpopcntbvl.c: Ditto.
* gcc.target/i386/avx512bitalg-vpopcntw.c: Ditto.
* gcc.target/i386/avx512bitalg-vpopcntwvl.c: Ditto.
* gcc.target/i386/avx512vpopcntdq-vpopcntd.c: Ditto.
* gcc.target/i386/avx512vpopcntdq-vpopcntq.c: Ditto.
Richard Biener [Thu, 27 Jun 2024 09:36:07 +0000 (11:36 +0200)]
Harden SLP reduction support wrt STMT_VINFO_REDUC_IDX
The following makes sure that for SLP reductions all lanes have
the same STMT_VINFO_REDUC_IDX. Once we move that info and can adjust
it we can implement swapping. It also makes the existing protection
against operand swapping trigger for all stmts participating in a
reduction, not just the final one marked as reduction-def.
* tree-vect-slp.cc (vect_build_slp_tree_1): Compare
STMT_VINFO_REDUC_IDX.
(vect_build_slp_tree_2): Prevent operand swapping for
all stmts participating in a reduction.
Feng Xue [Sun, 16 Jun 2024 05:00:32 +0000 (13:00 +0800)]
vect: Determine input vectype for multiple lane-reducing operations
The input vectype of a reduction PHI statement must be determined before
vect cost computation for the reduction. Since a lane-reducing operation
has a different input vectype from a normal one, we need to traverse all
reduction statements to find the input vectype with the fewest lanes, and
set that on the PHI statement.
2024-06-16 Feng Xue <fxue@os.amperecomputing.com>
gcc/
* tree-vect-loop.cc (vectorizable_reduction): Determine input vectype
during traversal of reduction statements.
[PR115565] cse: Don't use a valid regno for non-register in comparison_qty
Use INT_MIN rather than -1 in `comparison_qty' where a comparison is not
with a register, because the value of -1 is actually a valid reference
to register 0 in the case where it has not been assigned a quantity.
Using -1 makes `REG_QTY (REGNO (folded_arg1)) == ent->comparison_qty'
comparison in `fold_rtx' to incorrectly trigger in rare circumstances
and return true for a memory reference, making CSE consider a comparison
operation to evaluate to a constant expression, consequently making the
resulting code incorrectly execute or fail to execute conditional
blocks.
This has caused a miscompilation of rwlock.c from LinuxThreads for the
`alpha-linux-gnu' target, where `rwlock->__rw_writer != thread_self ()'
expression (where `thread_self' returns the thread pointer via a PALcode
call) has been decided to be always true (with `ent->comparison_qty'
using -1 for a reference to `rwlock->__rw_writer', while register 0
holding the thread pointer retrieved by `thread_self') and code for the
false case has been optimized away where it mustn't have, causing
program lockups.
The issue has been observed as a regression from commit 08a692679fb8
("Undefined cse.c behaviour causes 3.4 regression on HPUX"),
<https://gcc.gnu.org/ml/gcc-patches/2004-10/msg02027.html>, and up to
commit 932ad4d9b550 ("Make CSE path following use the CFG"),
<https://gcc.gnu.org/ml/gcc-patches/2006-12/msg00431.html>, where CSE
has been restructured sufficiently for the issue not to trigger with the
original reproducer anymore. However the original bug remains and can
trigger, because `comparison_qty' will still be assigned -1 for a memory
reference and the `reg_qty' member of a `cse_reg_info_table' entry will
still be assigned -1 for register 0 where the entry has not been
assigned a quantity, e.g. at initialization.
Use INT_MIN then as noted above, so that the value remains negative, for
consistency with the REGNO_QTY_VALID_P macro (even though not used on
`comparison_qty'), and then so that it should not ever match a valid
negated register number, fixing the regression with commit 08a692679fb8.
gcc/
PR rtl-optimization/115565
* cse.cc (record_jump_cond): Use INT_MIN rather than -1 for
`comparison_qty' if !REG_P.
Sergei Lewis [Sat, 29 Jun 2024 20:34:31 +0000 (14:34 -0600)]
[to-be-committed,RISC-V,V4] movmem for RISCV with V extension
I hadn't updated my repo on the host where I handle email, so it picked
up the older version of this patch without the testsuite fix. So, V4
with the testsuite option for lmul fixed.
--
And Sergei's movmem patch. Just trivial testsuite adjustment for an
option name change and a whitespace fix from me.
I've spun this in my tester for rv32 and rv64. I'll wait for pre-commit
CI before taking further action.
Just a reminder, this patch is designed to handle the case where we can
issue a single vector load/store which avoids all the complexities of
determining which direction to copy.
--
gcc/ChangeLog
* config/riscv/riscv.md (movmem<mode>): New expander.
gcc/testsuite/ChangeLog
PR target/112109
* gcc.target/riscv/rvv/base/movmem-1.c: New test.
The below test suites are passed for this patch:
1. The rv64gcv fully regression test with newlib.
2. The x86 bootstrap test.
3. The x86 fully regression test.
gcc/ChangeLog:
* match.pd: Add imm form for .SAT_ADD matching.
* tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children):
Add .SAT_ADD matching under PLUS_EXPR.
Iain Sandoe [Sat, 29 Jun 2024 02:10:59 +0000 (03:10 +0100)]
jit: Fix Darwin bootstrap after r15-1699.
r15-1699-g445c62ee492 contains changes that trigger two maybe-uninitialized
warnings on Darwin, which result in a bootstrap failure.
Note that the warnings are false positives, in fact the variables should be
initialized in the cases of a switch (all values of the switch condition are
covered).
Fixed here by providing default initializations for the relevant variables.
gcc/jit/ChangeLog:
* jit-recording.cc
(recording::memento_of_typeinfo::make_debug_string): Default the value
of ident.
(recording::memento_of_typeinfo::write_reproducer): Default the value
of type.
Jeff Law [Sat, 29 Jun 2024 00:36:50 +0000 (18:36 -0600)]
[committed] Fix mcore-elf regression after recent IRA change
So the recent IRA change exposed a bug in the mcore backend.
The mcore has a special instruction (xtrb3) which can zero extend a GPR into
R1. It's useful because zextb requires a matching source/destination.
Unfortunately xtrb3 modifies CC.
The IRA changes twiddle register allocation such that we want to use xtrb3.
Unfortunately CC is live at the point where we want to use xtrb3 and clobbering
CC causes the test to fail.
Exposing the clobber in the expander and insn seems like the best path forward.
We could also drop the xtrb3 alternative, but that seems like it would hurt
codegen more than exposing the clobber.
The bitfield extraction patterns using xtrb look problematic as well, but I
didn't try to fix those.
This fixes the builtin-arith-overflow regressions and appears to fix 20010122-1.c as a side effect.
gcc/
* config/mcore/mcore.md (zero_extendqihi2): Clobber CC in expander
and matching insn.
(zero_extendqisi2): Likewise.
Patrick Palka [Fri, 28 Jun 2024 23:45:21 +0000 (19:45 -0400)]
c++: bad 'this' conversion for nullary memfn [PR106760]
Here we notice the 'this' conversion for the call f<void>() is bad, so
we correctly defer deduction for the template candidate, but we end up
never adding it to 'bad_cands' since missing_conversion_p for it returns
false (its only argument is 'this' which has already been determined to
be bad). This is not a huge deal, but it causes us to no longer accept the
call with -fpermissive in release builds, and a tree check ICE in checking
builds.
So if we have a non-strictly viable template candidate that has not been
instantiated, then we need to add it to 'bad_cands' even if no argument
conversion is missing.
PR c++/106760
gcc/cp/ChangeLog:
* call.cc (add_candidates): Relax test for adding a candidate
to 'bad_cands' to also accept an uninstantiated template candidate
that has no missing conversions.
Jonathan Wakely [Fri, 28 Jun 2024 14:14:15 +0000 (15:14 +0100)]
libstdc++: Define __glibcxx_assert_fail for non-verbose build [PR115585]
When the library is configured with --disable-libstdcxx-verbose the
assertions just abort instead of calling __glibcxx_assert_fail, and so I
didn't export that function for the non-verbose build. However, that
option is documented to not change the library ABI, so we still need to
export the symbol from the library. It could be needed by programs
compiled against the headers from a verbose build.
The non-verbose definition can just call abort so that it doesn't pull
in I/O symbols, which are unwanted in a non-verbose build.
libstdc++-v3/ChangeLog:
PR libstdc++/115585
* src/c++11/assert_fail.cc (__glibcxx_assert_fail): Add
definition for non-verbose builds.
Jonathan Wakely [Fri, 28 Jun 2024 10:14:39 +0000 (11:14 +0100)]
libstdc++: Extend std::equal memcmp optimization to std::byte [PR101485]
We optimize std::equal to memcmp for integers and pointers, which means
that std::byte comparisons generate bigger code than char comparisons.
We can't use memcmp for arbitrary enum types, because they could have an
overloaded operator== that has custom semantics, but we know that
std::byte doesn't do that.
libstdc++-v3/ChangeLog:
PR libstdc++/101485
* include/bits/stl_algobase.h (__equal_aux1): Check for
std::byte as well.
* testsuite/25_algorithms/equal/101485.cc: New test.
Jonathan Wakely [Wed, 26 Jun 2024 13:09:07 +0000 (14:09 +0100)]
libstdc++: Do not use C++11 alignof in C++98 mode [PR104395]
When -faligned-new (or Clang's -faligned-allocation) is used our
allocators try to support extended alignments, gated on the
__cpp_aligned_new macro. However, because they use alignof(_Tp) which is
not a keyword in C++98 mode, using -std=c++98 -faligned-new results in
errors from <memory> and other headers.
We could change them to use __alignof__ instead of alignof, but that
would potentially alter the result of the conditions, because e.g.
alignof(long long) != __alignof__(long long) on some targets. That's
probably not an issue for any types with extended alignment, so maybe it
would be a safe change.
For now, it seems acceptable to just disable the extended alignment
support in C++98 mode, so that -faligned-new enables std::align_val_t
and the corresponding operator new overloads, but doesn't affect
std::allocator, __gnu_cxx::__bitmap_allocator etc.
libstdc++-v3/ChangeLog:
PR libstdc++/104395
* include/bits/new_allocator.h: Disable extended alignment
support in C++98 mode.
* include/bits/stl_tempbuf.h: Likewise.
* include/ext/bitmap_allocator.h: Likewise.
* include/ext/malloc_allocator.h: Likewise.
* include/ext/mt_allocator.h: Likewise.
* include/ext/pool_allocator.h: Likewise.
* testsuite/ext/104395.cc: New test.
Jonathan Wakely [Wed, 26 Jun 2024 11:40:51 +0000 (12:40 +0100)]
libstdc++: Simplify <ext/aligned_buffer.h> class templates
As noted in a comment, the __gnu_cxx::__aligned_membuf class template
can be simplified, because alignof(T) and alignas(T) use the correct
alignment for a data member. That's true since GCC 8 and Clang 8. The
EDG front end (as used by Intel icc, aka "Intel C++ Compiler Classic")
does not implement the PR c++/69560 change, so keep using the old
implementation when __EDG__ is defined, to avoid an ABI change for icc.
For __gnu_cxx::__aligned_buffer<T> all supported compilers agree on the
value of __alignof__(T), but we can still simplify it by removing the
dependency on std::aligned_storage<sizeof(T), __alignof__(T)>.
Add a test that checks that the aligned buffer types have the expected
alignment, so that we can tell if changes like this affect their ABI
properties.
libstdc++-v3/ChangeLog:
* include/ext/aligned_buffer.h (__aligned_membuf): Use
alignas(T) directly instead of defining a struct and using its
alignment.
(__aligned_buffer): Remove use of std::aligned_storage.
* testsuite/abi/aligned_buffers.cc: New test.
Uros Bizjak [Fri, 28 Jun 2024 15:49:43 +0000 (17:49 +0200)]
i386: Cleanup tmp variable usage in ix86_expand_move
Remove extra assignment, extra temp variable and variable shadowing.
No functional changes intended.
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_move): Remove extra
assignment to tmp variable, reuse tmp variable instead of
declaring new temporary variable and remove tmp variable shadowing.
Jørgen Kvalsvik [Fri, 28 Jun 2024 06:35:31 +0000 (08:35 +0200)]
Use move-aware auto_vec in map
Using auto_vec rather than vec means the vectors are released
automatically upon return, stopping the leak. The problem is that
auto_vec<T, N> is not really move-aware, only the <T, 0> specialization
is.
gcc/ChangeLog:
* tree-profile.cc (find_conditions): Use auto_vec without
embedded storage.
Richard Biener [Fri, 28 Jun 2024 11:29:21 +0000 (13:29 +0200)]
tree-optimization/115652 - more fixing of the fix
The following addresses the corner case of an outer loop with an empty
header, where we would end up asking for the BB of a NULL stmt, by
special-casing this situation.
PR tree-optimization/115652
* tree-vect-slp.cc (vect_schedule_slp_node): Handle the case
where the outer loop header block is empty.
Evgeny Karpov [Fri, 28 Jun 2024 12:37:12 +0000 (12:37 +0000)]
i386: Fix regression after refactoring legitimize_pe_coff_symbol, ix86_GOT_alias_set and PE_COFF_LEGITIMIZE_EXTERN_DECL [PR115635]
This patch fixes 3 bugs reported after merging the "Add DLL
import/export implementation to AArch64" series.
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653955.html The
series refactors the i386 codebase to reuse it in AArch64, which
triggers some bugs.
Bug 115661 - [15 Regression] wrong code at -O{2,3} on x86_64-linux-gnu
since r15-1599-g63512c72df09b4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115661
Bug 115635 - [15 regression] Bootstrap fails with failed self-test
with the rust fe (diagnostic-path.cc:1153: test_empty_path: FAIL:
ASSERT_FALSE ((path.interprocedural_p ()))) since r15-1599-g63512c72df09b4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115635
Issue 1. In some code, i386 has been relying on calling
legitimize_pe_coff_symbol on all platforms, expecting it to return
NULL_RTX if PE/COFF is not supported.
Fix: NULL_RTX handling has been added when the target does not support
PECOFF.
Issue 2. ix86_GOT_alias_set is used on all platforms and cannot be
extracted to mingw.
Fix: ix86_GOT_alias_set has been restored as it was and is used on all
platforms for i386.
Bug 115643 - [15 regression] aarch64-w64-mingw32 support today breaks
x86_64-w64-mingw32 build cannot represent relocation type BFD_RELOC_64
since r15-1602-ged20feebd9ea31
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115643
Issue 3. PE_COFF_EXTERN_DECL_SHOULD_BE_LEGITIMIZED has been added and
used with a negative operator for a complex expression without braces.
Fix: Braces have been added, and
PE_COFF_EXTERN_DECL_SHOULD_BE_LEGITIMIZED has been renamed to
PE_COFF_LEGITIMIZE_EXTERN_DECL.
Richard Biener [Wed, 26 Jun 2024 12:07:51 +0000 (14:07 +0200)]
tree-optimization/115640 - outer loop vect with inner SLP permute
The following fixes wrong-code when using outer loop vectorization
and an inner loop SLP access with permutation. A wrong adjustment
to the IV increment is then applied on GCN.
PR tree-optimization/115640
* tree-vect-stmts.cc (vectorizable_load): With an inner
loop SLP access, do not apply a gap adjustment.
First step towards adding a general routine to assign all of a class
type's data members. Having a general routine prevents forgetting to tackle the
edge cases, e.g. setting _len.
gcc/fortran/ChangeLog:
* trans-expr.cc (gfc_class_set_vptr): Add setting of _vptr
member.
* trans-intrinsic.cc (conv_intrinsic_move_alloc): First use
of gfc_class_set_vptr and refactor very similar code.
* trans.h (gfc_class_set_vptr): Declare the new function.
gcc/testsuite/ChangeLog:
* gfortran.dg/unlimited_polymorphic_11.f90: Remove unnecessary
casts in dg-final expression.
Roger Sayle [Fri, 28 Jun 2024 06:16:07 +0000 (07:16 +0100)]
i386: Handle sign_extend like zero_extend in *concatditi3_[346]
This patch generalizes some of the patterns in i386.md that recognize
double word concatenation, so they handle sign_extend the same way that
they handle zero_extend in appropriate contexts.
As a motivating example consider the following function:
__int128 foo(long long x, unsigned long long y)
{
return ((__int128)x<<64) | y;
}
when compiled with -O2, x86_64 currently generates:
with this patch we now generate (the same as if x is unsigned):
foo: movq %rsi, %rax
movq %rdi, %rdx
ret
Treating both extensions the same way using any_extend is valid as
the top (extended) bits are "unused" after the shift by 64 (or more).
In theory, the RTL optimizers might consider canonicalizing the form
of extension used in these cases, but zero_extend is faster on some
machines, whereas sign extension is supported via addressing modes on
others, so handling both in the machine description is probably best.
2024-06-28 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386.md (*concat<mode><dwi>3_3): Change zero_extend
to any_extend in first operand to left shift by mode precision.
(*concat<mode><dwi>3_4): Likewise.
(*concat<mode><dwi>3_6): Likewise.
gcc/testsuite/ChangeLog
* gcc.target/i386/concatditi-1.c: New test case.
Roger Sayle [Fri, 28 Jun 2024 06:12:53 +0000 (07:12 +0100)]
i386: Some additional AVX512 ternlog refinements.
This patch is another round of refinements to fine tune the new ternlog
infrastructure in i386's sse.md. This patch tweaks ix86_ternlog_idx
to allow multiple MEM/CONST_VECTOR/VEC_DUPLICATE operands prior to
splitting (before reload), when force_register is called on all but
one of these operands. Conceptually during the dynamic programming,
registers fill the args slots in the order 0, 1, 2, and mem-like
operands fill the slots in the order 2, 0, 1 [preferring the memory
operand to come last].
This patch allows us to remove some of the legacy ternlog patterns
in sse.md without regressions [which is left to the next and final
patch in this series]. An indication that these patterns are no
longer required is shown by the necessary testsuite tweaks below,
where the output assembler for the legacy instructions used hexadecimal,
but the new ternlog infrastructure now consistently uses decimal.
2024-06-28 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_ternlog_idx) <case VEC_DUPLICATE>:
Add a "goto do_mem_operand" as this need not match memory_operand.
<case CONST_VECTOR>: Only args[2] may be volatile memory operand.
Allow MEM/VEC_DUPLICATE/CONST_VECTOR as args[0] and args[1].
gcc/testsuite/ChangeLog
* gcc.target/i386/avx512f-andn-di-zmm-2.c: Match decimal instead
of hexadecimal immediate operand to ternlog.
* gcc.target/i386/avx512f-andn-si-zmm-2.c: Likewise.
* gcc.target/i386/avx512f-orn-si-zmm-1.c: Likewise.
* gcc.target/i386/avx512f-orn-si-zmm-2.c: Likewise.
* gcc.target/i386/pr100711-3.c: Likewise.
* gcc.target/i386/pr100711-4.c: Likewise.
* gcc.target/i386/pr100711-5.c: Likewise.
late-combine relies on df, which for -O0 is only initialised late
(pass_df_initialize_no_opt, after split1). Other df-based passes
cope with this by requiring optimize > 0, so this patch does the
same for late-combine.
gcc/
PR rtl-optimization/115677
* late-combine.cc (pass_late_combine::gate): New function.
s390: Check for ADDR_REGS in s390_decompose_addrstyle_without_index
An explicit check for address registers was not required so far since
during register allocation the processing of address constraints was
sufficient. However, address constraints themself do not check for
REGNO_OK_FOR_{BASE,INDEX}_P. Thus, with the newly introduced
late-combine pass in r15-1579-g792f97b44ffc5e we generate new insns with
invalid address registers which aren't fixed up afterwards.
Fixed by explicitly checking for address registers in
s390_decompose_addrstyle_without_index such that those new insns are
rejected.
gcc/ChangeLog:
PR target/115634
* config/s390/s390.cc (s390_decompose_addrstyle_without_index):
Check for ADDR_REGS in s390_decompose_addrstyle_without_index.
Richard Biener [Thu, 27 Jun 2024 09:26:08 +0000 (11:26 +0200)]
tree-optimization/115669 - fix SLP reduction association
The following avoids associating a reduction path as that might
get STMT_VINFO_REDUC_IDX out-of-sync with the SLP operand order.
This is a latent issue with SLP reductions but now easily exposed
as we're doing single-lane SLP reductions.
Once we achieve SLP-only, we can move and update this metadata.
PR tree-optimization/115669
* tree-vect-slp.cc (vect_build_slp_tree_2): Do not reassociate
chains that participate in a reduction.
Jonathan Wakely [Tue, 11 Jun 2024 15:45:43 +0000 (16:45 +0100)]
libstdc++: Fix std::codecvt<wchar_t, char, mbstate_t> for empty dest [PR37475]
For the GNU locale model, codecvt::do_out and codecvt::do_in incorrectly
return 'ok' when the destination range is empty. That happens because
detecting incomplete output is done in the loop body, and the loop is
never even entered if to == to_end.
By restructuring the loop condition so that we check the output range
separately, we can ensure that for a non-empty source range, we always
enter the loop at least once, and detect if the destination range is too
small.
The loops also seem easier to reason about if we return immediately on
any error, instead of checking the result twice on every iteration. We
can use an RAII type to restore the locale before returning, which also
simplifies all the other member functions.
libstdc++-v3/ChangeLog:
PR libstdc++/37475
* config/locale/gnu/codecvt_members.cc (Guard): New RAII type.
(do_out, do_in): Return partial if the destination is empty but
the source is not. Use Guard to restore locale on scope exit.
Return immediately on any conversion error.
(do_encoding, do_max_length, do_length): Use Guard.
* testsuite/22_locale/codecvt/in/char/37475.cc: New test.
* testsuite/22_locale/codecvt/in/wchar_t/37475.cc: New test.
* testsuite/22_locale/codecvt/out/char/37475.cc: New test.
* testsuite/22_locale/codecvt/out/wchar_t/37475.cc: New test.
Alexandre Oliva [Thu, 27 Jun 2024 10:22:48 +0000 (07:22 -0300)]
[libstdc++] [testsuite] defer to check_vect_support* [PR115454]
The newly-added testcase overrides the default dg-do action set by
check_vect_support_and_set_flags (in libstdc++-dg/conformance.exp), so
it attempts to run the test even if runtime vector support is not
available.
Remove the explicit dg-do directive, so that the default is honored,
and the test is run if vector support is found, and only compiled
otherwise.
for libstdc++-v3/ChangeLog
PR libstdc++/115454
* testsuite/experimental/simd/pr115454_find_last_set.cc: Defer
to check_vect_support_and_set_flags's default dg-do action.