Andrew MacLeod [Thu, 28 Sep 2023 13:19:32 +0000 (09:19 -0400)]
Remove pass counting in VRP.
Rather than using a pass count to decide which parameters are passed to
VRP, make it explicit.
* passes.def (pass_vrp): Pass "final pass" flag as parameter.
* tree-vrp.cc (vrp_pass_num): Remove.
(pass_vrp::my_pass): Remove.
(pass_vrp::pass_vrp): Add warn_p as a parameter.
(pass_vrp::final_p): New.
(pass_vrp::set_pass_param): Set final_p param.
(pass_vrp::execute): Call execute_range_vrp with no conditions.
(make_pass_vrp): Pass additional parameter.
(make_pass_early_vrp): Ditto.
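As a rough illustration of the idea only (this is not the tree-vrp.cc code), the sketch below shows a pass that is told explicitly whether it is the final instance, rather than working that out from a counter of instances created so far. The names explicit_pass_sketch and does_final_cleanup are hypothetical.

```cpp
// Hypothetical sketch: the scheduler states which instance is final.
struct explicit_pass_sketch
{
  bool final_p;  // supplied by whoever schedules the pass

  constexpr explicit explicit_pass_sketch (bool final_pass)
    : final_p (final_pass) {}

  // Behaviour depends only on the explicit flag, not on creation order.
  constexpr bool does_final_cleanup () const { return final_p; }
};

// Reordering or adding instances cannot silently change either of these.
constexpr explicit_pass_sketch early_instance (false);
constexpr explicit_pass_sketch last_instance (true);
```

With a counter, inserting one more instance into the pipeline would shift which one counts as "last"; with a parameter, the intent is visible at the point where the pass is scheduled.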
Andrew MacLeod [Wed, 27 Sep 2023 16:34:16 +0000 (12:34 -0400)]
Return TRUE only when a global value is updated.
set_range_info should return TRUE only when it sets a new value. VRP no
longer overwrites global ranges DOM has set. Check for ranges in the
final listing.
gcc/
* tree-ssanames.cc (set_range_info): Return true only if the
current value changes.
gcc/testsuite/
* gcc.dg/pr93917.c: Check for ranges in final optimized listing.
* gcc.dg/tree-ssa/vrp-unreachable.c: Ditto.
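A minimal sketch of the "return true only on change" contract, assuming a range is just a closed interval. The types and function names are hypothetical, and whether the real set_range_info also intersects with the existing range is not modelled here.

```cpp
// Hypothetical stand-in for a stored global range.
struct range_sketch
{
  long lo;
  long hi;
};

constexpr bool same_range (const range_sketch &a, const range_sketch &b)
{
  return a.lo == b.lo && a.hi == b.hi;
}

// Store R and report whether the stored value actually changed.
constexpr bool set_range_sketch (range_sketch &stored, const range_sketch &r)
{
  if (same_range (stored, r))
    return false;  // nothing new, so callers need not react
  stored = r;
  return true;
}

// Usage: the second, identical store reports "no change".
constexpr bool second_store_is_noop ()
{
  range_sketch global = { 0, 100 };
  bool first = set_range_sketch (global, range_sketch { 1, 10 });
  bool second = set_range_sketch (global, range_sketch { 1, 10 });
  return first && !second;
}
```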
Roger Sayle [Tue, 3 Oct 2023 11:52:04 +0000 (12:52 +0100)]
ARC: Use rlc r0,0 to implement scc_ltu (i.e. carry_flag ? 1 : 0)
This patch teaches the ARC backend that the contents of the carry flag
can be placed in an integer register conveniently using the "rlc rX,0"
instruction, which is a rotate-left-through-carry using zero as a source.
This is a convenient special case for the LTU form of the scc pattern.
unsigned int foo(unsigned int x, unsigned int y)
{
return (x+y) < x;
}
[which after an addition to set the carry flag, sets r0 to 1,
followed by a conditional assignment of r0 to zero if the
carry flag is clear]. With the new define_insn/optimization
in this patch, this becomes:
foo: add.f 0,r0,r1
j_s.d [blink]
rlc r0,0
This define_insn is also a useful building block for implementing
shifts and rotates.
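The test function relies on unsigned wraparound: the sum is smaller than an operand exactly when the addition carried out of bit 31. A small sketch that cross-checks this against a 64-bit addition (the helper names are made up for the example):

```cpp
#include <cstdint>

// Carry out of a 32-bit addition, written the way the test case writes it.
constexpr std::uint32_t carry_by_compare (std::uint32_t x, std::uint32_t y)
{
  return static_cast<std::uint32_t> (x + y) < x ? 1u : 0u;
}

// The same value computed with a wider type, for cross-checking.
constexpr std::uint32_t carry_by_widening (std::uint32_t x, std::uint32_t y)
{
  return static_cast<std::uint32_t> ((std::uint64_t {x} + y) >> 32);
}
```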
2023-10-03 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/arc/arc.md (CC_ltu): New mode iterator for CC and CC_C.
(scc_ltu_<mode>): New define_insn to handle LTU form of scc_insn.
(*scc_insn): Don't split to a conditional move sequence for LTU.
gcc/testsuite/ChangeLog
* gcc.target/arc/scc-ltu.c: New test case.
Andrea Corallo [Tue, 19 Sep 2023 13:12:08 +0000 (15:12 +0200)]
aarch64: Convert aarch64 multi choice patterns to new syntax
Hi all,
this patch converts a number of multi choice patterns within the
aarch64 backend to the new syntax.
The list of the converted patterns is in the Changelog.
For completeness here follows the list of multi choice patterns that
were rejected for conversion by my parser, they typically have some C
as asm output and require some manual intervention:
aarch64_simd_vec_set<mode>, aarch64_get_lane<mode>,
aarch64_cm<optab>di, aarch64_cm<optab>di, aarch64_cmtstdi,
*aarch64_movv8di, *aarch64_be_mov<mode>, *aarch64_be_movci,
*aarch64_be_mov<mode>, *aarch64_be_movxi, *aarch64_sve_mov<mode>_le,
*aarch64_sve_mov<mode>_be, @aarch64_pred_mov<mode>,
@aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx4SI_ONLY:mode>,
@aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx2DI_ONLY:mode>,
*aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx2DI_ONLY:mode>_sxtw,
*aarch64_sve_gather_prefetch<SVE_FULL_I:mode><VNx2DI_ONLY:mode>_uxtw,
@aarch64_vec_duplicate_vq<mode>_le, *vec_extract<mode><Vel>_0,
*vec_extract<mode><Vel>_v128, *cmp<cmp_op><mode>_and,
*fcm<cmp_op><mode>_and_combine, @aarch64_sve_ext<mode>,
@aarch64_sve2_<su>aba<mode>, *sibcall_insn, *sibcall_value_insn,
*xor_one_cmpl<mode>3, *insv_reg<mode>_<SUBDI_BITS>,
*aarch64_bfi<GPI:mode><ALLX:mode>_<SUBDI_BITS>,
*aarch64_bfidi<ALLX:mode>_subreg_<SUBDI_BITS>, *aarch64_bfxil<mode>,
*aarch64_bfxilsi_uxtw,
*aarch64_<su_optab>cvtf<fcvt_target><GPF:mode>2_mult,
atomic_store<mode>.
Bootstrapped and reg tested on aarch64-unknown-linux-gnu, also I
analysed tmp-mddump.md (from 'make mddump') and could not find
effective differences, okay for trunk?
When I first implemented COPYSIGN support in the power7 days, we did not have a
copysign RTL insn, so I had to use UNSPEC to represent the copysign
instruction. This patch removes those UNSPECs, and it uses the native RTL
copysign insn.
2023-10-02 Michael Meissner <meissner@linux.ibm.com>
gcc/
* config/rs6000/rs6000.md (UNSPEC_COPYSIGN): Delete.
(copysign<mode>3_fcpsg): Use copysign RTL instead of UNSPEC.
(copysign<mode>3_hard): Likewise.
(copysign<mode>3_soft): Likewise.
* config/rs6000/vector.md (vector_copysign<mode>3): Use copysign RTL
instead of UNSPEC.
* config/rs6000/vsx.md (vsx_copysign<mode>3): Use copysign RTL instead
of UNSPEC.
David Malcolm [Mon, 2 Oct 2023 16:16:55 +0000 (12:16 -0400)]
diagnostics: add diagnostic_output_format class
Eliminate various global variables in the json/sarif output code by
bundling together callbacks and state into a new diagnostic_output_format
class, with per-output-format subclasses.
No functional change intended.
gcc/ChangeLog:
* diagnostic-format-json.cc (toplevel_array): Remove global in
favor of json_output_format::m_top_level_array.
(cur_group): Likewise, for json_output_format::m_cur_group.
(cur_children_array): Likewise, for
json_output_format::m_cur_children_array.
(class json_output_format): New.
(json_begin_diagnostic): Remove, in favor of
json_output_format::on_begin_diagnostic.
(json_end_diagnostic): Convert to...
(json_output_format::on_end_diagnostic): ...this.
(json_begin_group): Remove, in favor of
json_output_format::on_begin_group.
(json_end_group): Remove, in favor of
json_output_format::on_end_group.
(json_flush_to_file): Remove, in favor of
json_output_format::flush_to_file.
(json_stderr_final_cb): Remove, in favor of json_output_format
dtor.
(json_output_base_file_name): Remove global.
(class json_stderr_output_format): New.
(json_file_final_cb): Remove.
(class json_file_output_format): New.
(json_emit_diagram): Remove.
(diagnostic_output_format_init_json): Update.
(diagnostic_output_format_init_json_file): Update.
* diagnostic-format-sarif.cc (the_builder): Remove this global,
moving to a field of the sarif_output_format.
(sarif_builder::maybe_make_artifact_content_object): Use the
context's m_file_cache.
(get_source_lines): Convert to...
(sarif_builder::get_source_lines): ...this, using context's
m_file_cache.
(sarif_begin_diagnostic): Remove, in favor of
sarif_output_format::on_begin_diagnostic.
(sarif_end_diagnostic): Remove, in favor of
sarif_output_format::on_end_diagnostic.
(sarif_begin_group): Remove, in favor of
sarif_output_format::on_begin_group.
(sarif_end_group): Remove, in favor of
sarif_output_format::on_end_group.
(sarif_flush_to_file): Delete.
(sarif_stderr_final_cb): Delete.
(sarif_output_base_file_name): Delete.
(sarif_file_final_cb): Delete.
(class sarif_output_format): New.
(sarif_emit_diagram): Delete.
(class sarif_stream_output_format): New.
(class sarif_file_output_format): New.
(diagnostic_output_format_init_sarif): Update.
(diagnostic_output_format_init_sarif_stderr): Update.
(diagnostic_output_format_init_sarif_file): Update.
(diagnostic_output_format_init_sarif_stream): Update.
* diagnostic-show-locus.cc (diagnostic_show_locus): Update.
* diagnostic.cc (default_diagnostic_final_cb): Delete, moving to
diagnostic_text_output_format's dtor.
(diagnostic_initialize): Update, making a new instance of
diagnostic_text_output_format.
(diagnostic_finish): Delete m_output_format, rather than calling
final_cb.
(diagnostic_report_diagnostic): Assert that m_output_format is
non-NULL. Replace call to begin_group_cb with call to
m_output_format->on_begin_group. Replace call to
diagnostic_starter with call to
m_output_format->on_begin_diagnostic. Replace call to
diagnostic_finalizer with call to
m_output_format->on_end_diagnostic.
(diagnostic_emit_diagram): Replace both optional call to
m_diagrams.m_emission_cb and default implementation with call to
m_output_format->on_diagram. Move default implementation to
diagnostic_text_output_format::on_diagram.
(auto_diagnostic_group::~auto_diagnostic_group): Replace call to
end_group_cb with call to m_output_format->on_end_group.
(diagnostic_text_output_format::~diagnostic_text_output_format):
New, based on default_diagnostic_final_cb.
(diagnostic_text_output_format::on_begin_diagnostic): New, based
on code from diagnostic_report_diagnostic.
(diagnostic_text_output_format::on_end_diagnostic): Likewise.
(diagnostic_text_output_format::on_diagram): New, based on code
from diagnostic_emit_diagram.
* diagnostic.h (class diagnostic_output_format): New.
(class diagnostic_text_output_format): New.
(diagnostic_context::begin_diagnostic): Move to...
(diagnostic_context::m_text_callbacks::begin_diagnostic): ...here.
(diagnostic_context::start_span): Move to...
(diagnostic_context::m_text_callbacks::start_span): ...here.
(diagnostic_context::end_diagnostic): Move to...
(diagnostic_context::m_text_callbacks::end_diagnostic): ...here.
(diagnostic_context::begin_group_cb): Remove, in favor of
m_output_format->on_begin_group.
(diagnostic_context::end_group_cb): Remove, in favor of
m_output_format->on_end_group.
(diagnostic_context::final_cb): Remove, in favor of
m_output_format's dtor.
(diagnostic_context::m_output_format): New field.
(diagnostic_context::m_diagrams.m_emission_cb): Remove, in favor
of m_output_format->on_diagram.
(diagnostic_starter): Update.
(diagnostic_finalizer): Update.
(diagnostic_output_format_init_sarif_stream): New.
* input.cc (location_get_source_line): Move implementation apart from
call to diagnostic_file_cache_init to...
(file_cache::get_source_line): ...this new function...
(location_get_source_line): ...and reintroduce, rewritten in terms of
file_cache::get_source_line.
(get_source_file_content): Likewise, refactor into...
(file_cache::get_source_file_content): ...this new function.
* input.h (file_cache::get_source_line): New decl.
(file_cache::get_source_file_content): New decl.
* selftest-diagnostic.cc
(test_diagnostic_context::test_diagnostic_context): Update.
* tree-diagnostic-path.cc (event_range::print): Update for
change to diagnostic_context's start_span callback.
gcc/fortran/ChangeLog:
* error.cc (gfc_diagnostics_init): Update for change to start_span.
gcc/jit/ChangeLog:
* dummy-frontend.cc (jit_langhook_init): Update for change to
diagnostic_context callbacks.
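The shape of this refactoring can be sketched as follows. The hook names on_begin_group, on_end_group and on_end_diagnostic come from the ChangeLog above, but every signature, class name and field below is an assumption made for illustration, not the GCC declaration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical base class: one virtual hook per event.  The destructor
// takes over the role that a "final" callback would otherwise play.
class output_format_sketch
{
public:
  virtual ~output_format_sketch () = default;
  virtual void on_begin_group () {}
  virtual void on_end_group () {}
  virtual void on_end_diagnostic (const std::string &message) = 0;
};

// A subclass owns what would otherwise be file-scope state.
class collecting_format_sketch : public output_format_sketch
{
public:
  void on_begin_group () override { ++m_depth; }
  void on_end_group () override
  {
    assert (m_depth > 0);
    --m_depth;
  }
  void on_end_diagnostic (const std::string &message) override
  {
    m_messages.push_back (std::string (m_depth, ' ') + message);
  }
  const std::vector<std::string> &messages () const { return m_messages; }

private:
  std::size_t m_depth = 0;
  std::vector<std::string> m_messages;
};

// The context holds one format object instead of a set of callbacks.
struct context_sketch
{
  std::unique_ptr<output_format_sketch> m_output_format;
};

void report_sketch (context_sketch &ctx, const std::string &message)
{
  assert (ctx.m_output_format != nullptr);
  ctx.m_output_format->on_begin_group ();
  ctx.m_output_format->on_end_diagnostic (message);
  ctx.m_output_format->on_end_group ();
}
```

Because the state lives in the object, two contexts with two different formats can coexist, which is not possible when the state is held in globals.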
David Malcolm [Mon, 2 Oct 2023 16:16:55 +0000 (12:16 -0400)]
diagnostics: group together source printing fields of diagnostic_context
struct diagnostic_context has > 60 fields.
Try to tame some of the complexity by grouping together the 8
source-printing fields into a struct, the "m_source_printing" field.
No functional change intended.
gcc/ada/ChangeLog:
* gcc-interface/misc.cc (gnat_post_options): Update for renaming
of diagnostic_context's show_caret to m_source_printing.enabled.
gcc/analyzer/ChangeLog:
* program-point.cc: Update for grouping of source printing fields
within diagnostic_context.
gcc/c-family/ChangeLog:
* c-common.cc (maybe_add_include_fixit): Update for renaming of
diagnostic_context's show_caret to m_source_printing.enabled.
* c-opts.cc (c_common_init_options): Update for renaming of
diagnostic_context's colorize_source_p to
m_source_printing.colorize_source_p.
The v7 memory ordering model allows reordering of conditional atomic
instructions. To avoid this, make all atomic patterns unconditional.
Expand atomic loads and stores for all architectures so the memory access
can be wrapped into an UNSPEC.
combine started introducing useless moves on hard registers: when one of the
arguments to our scalar xorsign is a hardreg, an additional move is inserted.
This leads to combine forming an AND with the immediate inside and using the
superfluous move to do the r->w move, instead of what we wanted before, which was
for the `and` to be a vector AND and to have reload pick the right alternative.
To fix this, the patch forces the use of the vector version directly, so
combine has no chance to mess it up.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (xorsign<mode>3): Renamed to...
(@xorsign<mode>3): ...This.
* config/aarch64/aarch64.md (xorsign<mode>3): Renamed to...
(@xorsign<mode>3): ...This and emit vectors directly.
* config/aarch64/iterators.md (VCONQ): Add SF and DF.
Tamar Christina [Mon, 2 Oct 2023 10:50:24 +0000 (11:50 +0100)]
rtl: relax validate_subreg to allow paradoxical subregs that change mode
This patch relaxes the subreg invariant that you can only change modes
or make it paradoxical in one conversion, i.e. it now allows (subreg:V2DI (reg:DF ..)).
This is well defined in the generic sense and allowing it would enable
you to write RTL without the extra moves which can be interfered with by
combine.
Patch has been pre-approved[1], but giving people chance to object
There are two problems here: first, the emitted asm for
-fdebug-types-section is ELF-specific, leading to assembler errors for
Mach-O. If we fix this, we get a secondary fail since the debug linker
does not recognise DW_FORM_ref_sig8. Disable this test until we get
DWARF-5 support in the external Darwin toolchain components.
rtl-tests.cc and simplify-rtx.cc used partial specialisation
to try to restrict the NUM_POLY_INT_COEFFS>1 tests without
resorting to preprocessor tests. That now triggers an error
in some configurations, since the NUM_POLY_INT_COEFFS>1 tests
used the global poly_int64, whose definition does not depend
on the template parameter.
This patch uses local types that do depend on the template parameter.
gcc/
PR bootstrap/111642
* rtl-tests.cc (const_poly_int_tests<N>::run): Use a local
poly_int64 typedef.
* simplify-rtx.cc (simplify_const_poly_int_tests<N>::run): Likewise.
rtl-optimization/110939 Really fix narrow comparison of memory and constant
In the former fix in commit 41ef5a34161356817807be3a2e51fbdbe575ae85 I
completely missed the fact that the normal form of a CONST_INT for a
mode with fewer bits than in HOST_WIDE_INT is a sign extended version of
the actual constant. This even holds true for unsigned constants.
Fixed by masking out the upper bits for the incoming constant and sign
extending the resulting unsigned constant.
gcc/ChangeLog:
* combine.cc (simplify_compare_const): Properly handle unsigned
constants while narrowing comparison of memory and constants.
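The normal form described above can be sketched in isolation: keep the low bits of the constant for the narrow mode, then sign-extend, whether or not the constant is meant to be unsigned. This is an illustrative helper with a made-up name, not the code in combine.cc.

```cpp
#include <cstdint>

// Mask VALUE to its low BITS bits and sign-extend from bit BITS - 1.
// BITS is assumed to be in the range [1, 64].
constexpr std::int64_t canonical_const_sketch (std::uint64_t value,
                                               unsigned bits)
{
  // Mask out the upper bits of the incoming constant.
  std::uint64_t mask = bits == 64 ? ~std::uint64_t {0}
                                  : (std::uint64_t {1} << bits) - 1;
  std::uint64_t low = value & mask;
  // Sign-extend the resulting unsigned constant.
  std::uint64_t sign = std::uint64_t {1} << (bits - 1);
  return static_cast<std::int64_t> ((low ^ sign) - sign);
}
```

For an 8-bit mode, the unsigned constant 255 therefore has the canonical value -1, and 200 has the canonical value -56, while 127 is unchanged.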
Feng Wang [Tue, 12 Sep 2023 09:18:05 +0000 (09:18 +0000)]
RISC-V:Optimize the MASK opt generation
The corresponding MASK and TARGET will be automatically generated.
According to Kito's advice, use "MASK(name) Var(other_flag_name)"
to generate the MASK and TARGET macros automatically.
This patch improves the macro generation of MASK_* and TARGET_*.
As more and more RISC-V extensions are added, the default target_flag
is full.
Before this patch, if you wanted to add a new macro, you had to define the
macro in riscv-opts.h manually.
After this patch, you just need two steps:
1. Define the new TargetVariable.
2. Define "MASK(name) Var(new_target_flag)".
This patch fixes 2 issues. One is when we want to get address of
an uninitialized large/huge bitint SSA_NAME for multiplication/division/modulo
or conversion to floating point (binary or decimal), the code just creates
an uninitialized limb sized variable and passes address of that, but I forgot
to initialize *prec in that case, so it invoked UB at compile time rather
than at runtime. As it is UB, we could use anything valid as precision there,
say 2 bits for signed, 1 bit for unsigned as smallest possible set of values,
or full bitint precision as full random value. Though, because we only pass
address to a single limb, I think it is best to pass the bitsize of the limb.
And the other issue is that when ranger in range_to_prec finds some range
is undefined_p (), it will assert {lower,upper}_bound () method isn't called
on it, but we were. So, the patch adjusts range_to_prec to treat it like
the !optimized case, full bitint precision.
2023-09-30 Jakub Jelinek <jakub@redhat.com>
PR middle-end/111625
PR middle-end/111637
* gimple-lower-bitint.cc (range_to_prec): Use prec or -prec if
r.undefined_p ().
(bitint_large_huge::handle_operand_addr): For uninitialized operands
use limb_prec or -limb_prec precision.
Jakub Jelinek [Sat, 30 Sep 2023 09:26:14 +0000 (11:26 +0200)]
vec.h: Uncomment static_assert
Now that poly_int_pod has been removed and other changes mostly to use
{quick,safe}_grow_cleared instead of {quick,safe}_grow for types with
non-trivial default construction, we can enable even the last static
assertion. From now on, {quick,safe}_grow can only be used with
trivially default constructible types.
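The restriction can be pictured with a toy container. This is a hypothetical fixed-capacity class written for the example, not vec.h: the "grow without clearing" operation carries the static assertion, and the "cleared" variant is the one to use for types with a non-trivial default constructor.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical fixed-capacity vector, modelled on the idea only.
template <typename T, std::size_t Capacity>
class small_vec_sketch
{
public:
  // Extends the length without touching the new slots, so it is only
  // offered for trivially default constructible element types.
  void quick_grow (std::size_t len)
  {
    static_assert (std::is_trivially_default_constructible<T>::value,
                   "use quick_grow_cleared for this element type");
    m_len = len;
  }

  // Extends the length and value-initialises every new slot.
  void quick_grow_cleared (std::size_t len)
  {
    for (std::size_t i = m_len; i < len; ++i)
      m_data[i] = T ();
    m_len = len;
  }

  std::size_t length () const { return m_len; }
  const T &operator[] (std::size_t i) const { return m_data[i]; }

private:
  T m_data[Capacity];
  std::size_t m_len = 0;
};

// An element type for which quick_grow would be rejected at compile time.
struct with_ctor_sketch
{
  int v;
  with_ctor_sketch () : v (7) {}
};
```

Calling quick_grow on a small_vec_sketch<with_ctor_sketch, 2> fails to compile, while quick_grow_cleared is accepted for both element types.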
So this ends up looking a lot like the bits that I had to revert several weeks
ago :-)
The core issue we have is given an INSN the generic code will cost the SET_SRC
and SET_DEST and sum them. But that's far from ideal on a RISC target.
For a register destination, the cost can be determined by looking at just the
SET_SRC. Which is precisely what this patch does. When the outer code is an
INSN and we're presented with a SET we take one of two paths.
If the destination is a register, then we recurse just on the SET_SRC and we're
done. Otherwise we fall back to the existing code which sums the cost of the
SET_SRC and SET_DEST. That fallback path isn't great and probably could be
further improved (just costing SET_DEST in that case is probably quite
reasonable).
The difference between this version and the bits that slipped through by
accident several weeks ago is that old version mis-used the API due to a thinko
on my part.
This tightens up various zicond tests to avoid undesirable matching.
This has been tested on rv64gc -- the only difference it makes on the testsuite
is the new tests (included in this patch) flip from failing to passing.
Pushed to the trunk.
gcc/
* config/riscv/riscv.cc (riscv_rtx_costs): Better handle costing
SETs when the outer code is INSN.
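The two costing paths described in the message can be sketched with a toy model. The struct below is a hypothetical miniature of a SET carrying only what the example needs; it is not the riscv_rtx_costs interface.

```cpp
// Hypothetical miniature of an RTL SET: the two sub-costs and whether
// the destination is a register.
struct set_sketch
{
  bool dest_is_reg;
  int src_cost;
  int dest_cost;
};

// Generic behaviour described above: always sum both sides.
constexpr int generic_set_cost (const set_sketch &s)
{
  return s.src_cost + s.dest_cost;
}

// Behaviour described for the patch: a register destination adds
// nothing, anything else falls back to the sum.
constexpr int tuned_set_cost (const set_sketch &s)
{
  return s.dest_is_reg ? s.src_cost : generic_set_cost (s);
}
```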
Patrick O'Neill [Fri, 29 Sep 2023 21:20:07 +0000 (14:20 -0700)]
RISC-V: Specify -mabi=lp64d in wredsum_vlmax.c testcase
Resolves this error on rv32gcv:
cc1: error: ABI requires '-march=rv32'
compiler exited with status 1
FAIL: gcc.target/riscv/rvv/vsetvl/wredsum_vlmax.c -O0 (test for excess errors)
Tested for regressions using glibc rv32gcv/rv64gcv multilib on r14-4339-geaa41a6dc12.
RISC-V: Replace not + bitwise_imm with li + bitwise_not
In the case when we have C code like this
int foo (int a) {
return 100 & ~a;
}
GCC generates the following instruction sequence
foo:
not a0,a0
andi a0,a0,100
ret
This patch replaces that with this sequence
foo:
li a5,100
andn a0,a5,a0
ret
The profitability comes from an out-of-order processor being able to
issue the "li a5, 100" at any time after it's fetched while "not a0, a0" has
to wait until any prior setter of a0 has reached completion.
gcc/ChangeLog:
* config/riscv/bitmanip.md (*<optab>_not_const<mode>): New split
pattern.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/zbb-andn-orn-01.c: New test.
* gcc.target/riscv/zbb-andn-orn-02.c: Likewise.
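The two instruction sequences compute the same value, which can be checked with a small model. Here andn_sketch is a stand-in for the andn instruction (rs1 & ~rs2) written for this example.

```cpp
// Original form: invert first, then AND with the immediate (not + andi).
constexpr int and_after_not (int a) { return 100 & ~a; }

// Stand-in for the andn instruction: rs1 & ~rs2.
constexpr int andn_sketch (int rs1, int rs2) { return rs1 & ~rs2; }

// Rewritten form: materialise the constant (li), then a single andn.
constexpr int andn_with_constant (int a) { return andn_sketch (100, a); }
```

The rewritten form has the same instruction count; the difference is only that the constant load does not depend on a0.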
poly_int was written before the switch to C++11 and so couldn't
use explicit default constructors. This led to an awkward split
between poly_int_pod and poly_int. poly_int simply inherited from
poly_int_pod and added constructors, with the argumentless constructor
having an empty body. But inheritance meant that poly_int had to
repeat the assignment operators from poly_int_pod (again, no C++11,
so no "using" to inherit base-class implementations).
All that goes away if we switch to using default constructors.
The main complication is ensuring that braced initialisation still
gives a constexpr, so that static variables can be initialised without
runtime code. The two problems here are:
(1) When initialising a poly_int<N, wide_int> with fewer than N
coefficients, the other coefficients need to be a zero of
the same precision as the explicit coefficients. This was
previously done in a for loop using wi::ints_for<...>::zero,
but C++11 constexpr constructors can't have function bodies.
The patch instead uses a series of delegated initialisers to
fill in the implicit coefficients.
(2) The initialisation in:
void f(int x) {
unsigned int foo {x};
}
produces the warning:
warning: narrowing conversion of 'x' from 'int' to 'unsigned int' [-Wnarrowing]
whereas:
void f(int x) {
unsigned int foo = x;
}
does not. So switching to direct initialisation of the coeffs array
would mean that:
poly_uint64_t x = 0;
would trigger a warning for using 0 rather than 0u. That seemed
overly pedantic, so the patch adds explicit casts to the constructor.
The complication is to do that without adding extra code to
wide-int versions. The patch uses a new init_cast type for that.
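Both points can be illustrated with a much-reduced class. The sketch below is an assumption-laden miniature, not poly-int.h: the tag types and class name are invented, a plain static_cast stands in for init_cast, and the zero padding ignores the wide_int precision issue mentioned in (1).

```cpp
#include <type_traits>

template <unsigned N, typename C>
class poly_sketch
{
  struct full_tag {};    // all N coefficients supplied
  struct hungry_tag {};  // fewer than N supplied so far

public:
  // Public entry point: accepts between 1 and N coefficients.
  template <typename... Cs>
  constexpr poly_sketch (const Cs &... cs)
    : poly_sketch (typename std::conditional<sizeof... (Cs) >= N,
                                             full_tag, hungry_tag>::type (),
                   cs...) {}

  C coeffs[N];

private:
  // Enough coefficients: initialise the array, casting each one so that
  // initialising an unsigned coefficient from 0 does not trip -Wnarrowing.
  template <typename... Cs>
  constexpr poly_sketch (full_tag, const Cs &... cs)
    : coeffs { static_cast<C> (cs)... } {}

  // Too few coefficients: append one zero and delegate again.
  template <typename C0, typename... Cs>
  constexpr poly_sketch (hungry_tag, const C0 &c0, const Cs &... cs)
    : poly_sketch (c0, cs..., C (0)) {}
};

// Braced initialisation stays a constant expression, with no loop and
// no function body in any constructor.
constexpr poly_sketch<3, unsigned> one_coeff { 5 };
constexpr poly_sketch<3, unsigned> two_coeffs { 5, 7 };
```

Each delegation step adds exactly one implicit zero, so a call with one coefficient and N == 3 goes through the "hungry" constructor twice before reaching the "full" one.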
gcc/
* poly-int.h (poly_int_pod): Delete.
(poly_coeff_traits::init_cast): New type.
(poly_int_full, poly_int_hungry, poly_int_fullness): New structures.
(poly_int): Replace constructors that take 1 and 2 coefficients with
a general one that takes an arbitrary number of coefficients.
Delegate initialization to two new private constructors, one of
which uses the coefficients as-is and one of which adds an extra
zero of the appropriate type (and precision, where applicable).
(gt_ggc_mx, gt_pch_nx): Operate on poly_ints rather than poly_int_pods.
* poly-int-types.h (poly_uint16_pod, poly_int64_pod, poly_uint64_pod)
(poly_offset_int_pod, poly_wide_int_pod, poly_widest_int_pod): Delete.
* gengtype.cc (main): Don't register poly_int64_pod.
* calls.cc (initialize_argument_information): Use poly_int rather
than poly_int_pod.
(combine_pending_stack_adjustment_and_call): Likewise.
* config/aarch64/aarch64.cc (pure_scalable_type_info): Likewise.
* data-streamer.h (bp_unpack_poly_value): Likewise.
* dwarf2cfi.cc (struct dw_trace_info): Likewise.
(struct queued_reg_save): Likewise.
* dwarf2out.h (struct dw_cfa_location): Likewise.
* emit-rtl.h (struct incoming_args): Likewise.
(struct rtl_data): Likewise.
* expr.cc (get_bit_range): Likewise.
(get_inner_reference): Likewise.
* expr.h (get_bit_range): Likewise.
* fold-const.cc (split_address_to_core_and_offset): Likewise.
(ptr_difference_const): Likewise.
* fold-const.h (ptr_difference_const): Likewise.
* function.cc (try_fit_stack_local): Likewise.
(instantiate_new_reg): Likewise.
* function.h (struct expr_status): Likewise.
(struct args_size): Likewise.
* genmodes.cc (ZERO_COEFFS): Likewise.
(mode_size_inline): Likewise.
(mode_nunits_inline): Likewise.
(emit_mode_precision): Likewise.
(emit_mode_size): Likewise.
(emit_mode_nunits): Likewise.
* gimple-fold.cc (get_base_constructor): Likewise.
* gimple-ssa-store-merging.cc (struct symbolic_number): Likewise.
* inchash.h (class hash): Likewise.
* ipa-modref-tree.cc (modref_access_node::dump): Likewise.
* ipa-modref.cc (modref_access_analysis::merge_call_side_effects):
Likewise.
* ira-int.h (ira_spilled_reg_stack_slot): Likewise.
* lra-eliminations.cc (self_elim_offsets): Likewise.
* machmode.h (mode_size, mode_precision, mode_nunits): Likewise.
* omp-low.cc (omplow_simd_context): Likewise.
* pretty-print.cc (pp_wide_integer): Likewise.
* pretty-print.h (pp_wide_integer): Likewise.
* reload.cc (struct decomposition): Likewise.
* reload.h (struct reload): Likewise.
* reload1.cc (spill_stack_slot_width): Likewise.
(struct elim_table): Likewise.
(offsets_at): Likewise.
(init_eliminable_invariants): Likewise.
* rtl.h (union rtunion): Likewise.
(poly_int_rtx_p): Likewise.
(strip_offset): Likewise.
(strip_offset_and_add): Likewise.
* rtlanal.cc (strip_offset): Likewise.
* tree-dfa.cc (get_ref_base_and_extent): Likewise.
(get_addr_base_and_unit_offset_1): Likewise.
(get_addr_base_and_unit_offset): Likewise.
* tree-dfa.h (get_ref_base_and_extent): Likewise.
(get_addr_base_and_unit_offset_1): Likewise.
(get_addr_base_and_unit_offset): Likewise.
* tree-ssa-loop-ivopts.cc (struct iv_use): Likewise.
(strip_offset): Likewise.
* tree-ssa-sccvn.h (struct vn_reference_op_struct): Likewise.
* tree.cc (ptrdiff_tree_p): Likewise.
* tree.h (poly_int_tree_p): Likewise.
(ptrdiff_tree_p): Likewise.
(get_inner_reference): Likewise.
gcc/testsuite/
* gcc.dg/plugin/poly-int-tests.h (test_num_coeffs_extra): Use
poly_int rather than poly_int_pod.
Gaius Mulley [Fri, 29 Sep 2023 16:18:16 +0000 (17:18 +0100)]
modula2: iso library SysClock.mod and wrapclock.cc fixes.
This patch corrects the C equivalent of m2 LONGINT parameters
in wrapclock.cc and corrects the SysClock.mod module.
wrapclock.cc uses a typedef long long int longint_t to match
m2 LONGINT (rather than unsigned long). These fixes
prevent calls to SysClock hanging spinning on an (incorrect)
large day count from the epoch.
gcc/m2/ChangeLog:
* gm2-compiler/M2Quads.mod (EndBuildFor): Improve
block comments.
* gm2-libs-iso/SysClock.mod (ExtractDate): Replace
testDays with yearOfDays. New local variable monthOfDays.
libgm2/ChangeLog:
* libm2iso/wrapclock.cc (longint_t): New declaration.
(GetTimespec): Replace types for sec and nano with
longint_t.
(SetTimespec): Replace types for sec and nano with
longint_t.
Fix memory barrier patterns for pre PA8800 processors
2023-09-29 John David Anglin <danglin@gcc.gnu.org>
* config/pa/pa.md (memory_barrier): Revise comment.
(memory_barrier_64, memory_barrier_32): Use ldcw,co on PA 2.0.
* config/pa/pa.opt (coherent-ldcw): Change default to disabled.
libstdc++: Fix handling of surrogate CP in codecvt [PR108976]
This patch fixes the handling of surrogate code points in all standard
facets for transcoding Unicode that are based on std::codecvt. Surrogate
code points should always be treated as error. On the other hand
surrogate code units can only appear in UTF-16 and only when they come
in a proper pair.
Additionally, it fixes a bug in std::codecvt_utf16::in(): when an odd number
of bytes was given in the range [from, from_end), error was always
returned. The last byte in such a range does not form a full UTF-16 code
unit, so no decision about an error can be made; partial should
be returned instead.
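The rules above can be sketched as a standalone validator. This is not the libstdc++ implementation; the enum, the function names and the big-endian byte order are choices made for the example, and treating a high surrogate at the very end of the input as "partial" is an assumption.

```cpp
#include <cstddef>

enum class result_sketch { ok, partial, error };

// Surrogate code points U+D800..U+DFFF are never valid on their own.
constexpr bool is_surrogate (char32_t c)
{
  return c >= 0xD800 && c <= 0xDFFF;
}

// Classify a UCS-4 code point before encoding it.
constexpr result_sketch check_code_point (char32_t c)
{
  return (is_surrogate (c) || c > 0x10FFFF) ? result_sketch::error
                                            : result_sketch::ok;
}

// Validate big-endian UTF-16 held in BYTES[0..N).
constexpr result_sketch validate_utf16be (const unsigned char *bytes,
                                          std::size_t n)
{
  std::size_t i = 0;
  while (n - i >= 2)
    {
      char32_t unit = (char32_t (bytes[i]) << 8) | bytes[i + 1];
      if (unit >= 0xDC00 && unit <= 0xDFFF)
        return result_sketch::error;    // low surrogate with no high one
      if (unit >= 0xD800 && unit <= 0xDBFF)
        {
          if (n - i < 4)
            return result_sketch::partial;  // assumed: pair may follow later
          char32_t next = (char32_t (bytes[i + 2]) << 8) | bytes[i + 3];
          if (next < 0xDC00 || next > 0xDFFF)
            return result_sketch::error;    // high surrogate not paired
          i += 2;
        }
      i += 2;
    }
  // A single leftover byte is an incomplete code unit, not an error.
  return i == n ? result_sketch::ok : result_sketch::partial;
}

constexpr unsigned char sample_odd[] = { 0x00, 0x41, 0x00 };        // 'A' plus a stray byte
constexpr unsigned char sample_pair[] = { 0xD8, 0x3D, 0xDE, 0x00 }; // U+1F600
constexpr unsigned char sample_lone_low[] = { 0xDE, 0x00 };
```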
The testsuite for testing these facets was updated in the following
order:
1. All functions that test codecvts that work with UTF-8 were refactored
and made more generic so they accept codecvt that works with the char
type char8_t.
2. The same functions were updated with new test cases for transcoding
errors and now additionally test for surrogates, overlong UTF-8
sequences, code points out of the Unicode range, and more tests for
missing leading and trailing code units.
3. New tests were added to test codecvt_utf16 in both of its variants,
UTF-16 <-> UTF-32/UCS-4 and UTF-16 <-> UCS-2.
libstdc++-v3/ChangeLog:
PR libstdc++/108976
* src/c++11/codecvt.cc (read_utf8_code_point): Fix handling of
surrogates in UTF-8.
(ucs4_out): Fix handling of surrogates in UCS-4 -> UTF-8.
(ucs4_in): Fix handling of range with odd number of bytes.
(ucs4_out): Fix handling of surrogates in UCS-4 -> UTF-16.
(ucs2_out): Fix handling of surrogates in UCS-2 -> UTF-16.
(ucs2_in): Fix handling of range with odd number of bytes.
(__codecvt_utf16_base<char16_t>::do_in): Likewise.
(__codecvt_utf16_base<char32_t>::do_in): Likewise.
(__codecvt_utf16_base<wchar_t>::do_in): Likewise.
* testsuite/22_locale/codecvt/codecvt_unicode.cc: Renames, add
tests for codecvt_utf16<char16_t> and codecvt_utf16<char32_t>.
* testsuite/22_locale/codecvt/codecvt_unicode.h: Refactor UTF-8
testing functions for char8_t, add more test cases for errors,
add testing functions for codecvt_utf16.
* testsuite/22_locale/codecvt/codecvt_unicode_wchar_t.cc:
Renames, add tests for codecvt_utf16<wchar_t>.
* testsuite/22_locale/codecvt/codecvt_utf16/79980.cc (test06):
Fix test.
* testsuite/22_locale/codecvt/codecvt_unicode_char8_t.cc: New
test.
Paul Iannetta [Fri, 29 Sep 2023 14:50:28 +0000 (08:50 -0600)]
Harmonize headers between both dg-extract-results scripts
The header of the python version looked like:
Target is ...
Host is ...
The header of the bash version looked like:
Test run by ... on ...
Target is ...
After this change both headers look like:
Test run by ... on ...
Target is ...
Host is ...
The order of the tests is not the same but since dg-cmp-results.sh it
does not matter much.
contrib/ChangeLog:
* dg-extract-results.py: Print the "Test run" line.
* dg-extract-results.sh: Print the "Host" line.
Jakub Jelinek [Fri, 29 Sep 2023 13:14:52 +0000 (15:14 +0200)]
vec.h: Guard most of static assertions for GCC >= 5
As reported by Jonathan on IRC, my vec.h patch broke build with GCC 4.8.x
or 4.9.x as system compiler, e.g. on CFarm.
The problem is that while all of
std::is_trivially_{destructible,copyable,default_constructible} traits
are in C++11, only std::is_trivially_destructible has been implemented in GCC
4.8, the latter two were added only in GCC 5.
Only std::is_trivially_destructible is the really important one though,
which is used to decide what pop returns and whether to invoke the
destructors or not. The rest are solely used in static_asserts and as such
I think it is acceptable if we don't assert those when built with GCC 4.8
or 4.9, anybody doing bootstrap from those system compilers or doing builds
with newer GCC will catch that.
So, the following patch guards those for 5+.
If we switch to C++14 later on and start requiring newer version of system
GCC as well (do we require GCC >= 5 which claims the last C++14 language
features, or what provides all C++14 library features, or GCC >= 6 which
uses -std=c++14 by default?), this patch then can be reverted.
2023-09-29 Jakub Jelinek <jakub@redhat.com>
* vec.h (quick_insert, ordered_remove, unordered_remove,
block_remove, qsort, sort, stablesort, quick_grow): Guard
std::is_trivially_{copyable,default_constructible} and
vec_detail::is_trivially_copyable_or_pair static assertions
with GCC_VERSION >= 5000.
(vec_detail::is_trivially_copyable_or_pair): Guard definition
with GCC_VERSION >= 5000.
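A sketch of the guarding pattern, using a locally defined macro rather than GCC's internal GCC_VERSION. The macro name and the slot_sketch type are hypothetical; the point is that the trait needed for real decisions stays unconditional while the assertion-only trait is skipped on compilers that lack it.

```cpp
#include <type_traits>

// Hypothetical local macro in the spirit of a compiler version check.
// Compilers that do not define __GNUC__ get 0 and skip the guarded part.
#if defined (__GNUC__)
#define SKETCH_GCC_VERSION (__GNUC__ * 1000 + __GNUC_MINOR__)
#else
#define SKETCH_GCC_VERSION 0
#endif

template <typename T>
struct slot_sketch
{
  // Needed for correctness decisions, and available even in GCC 4.8.
  static_assert (std::is_trivially_destructible<T>::value,
                 "slot_sketch does not run destructors");

#if SKETCH_GCC_VERSION >= 5000
  // Purely a sanity check, so it can be skipped on older compilers.
  static_assert (std::is_trivially_copyable<T>::value,
                 "slot_sketch copies elements bytewise");
#endif

  T value;
};
```

Builds with a newer compiler still evaluate the guarded assertion, so a violation introduced by someone using an old system compiler is caught by anyone building with a recent one.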
This improves the code structure of the ldp-stp policies
patch introduced in 834fc2bf
Bootstrapped and regtested on aarch64-linux.
gcc/ChangeLog:
* config/aarch64/aarch64-opts.h (enum aarch64_ldp_policy): Removed.
(enum aarch64_ldp_stp_policy): Merged enums aarch64_ldp_policy
and aarch64_stp_policy to aarch64_ldp_stp_policy.
(enum aarch64_stp_policy): Removed.
* config/aarch64/aarch64-protos.h (struct tune_params): Removed
aarch64_ldp_policy_model and aarch64_stp_policy_model enum types
and left only the definitions to the aarch64-opts one.
* config/aarch64/aarch64.cc (aarch64_parse_ldp_policy): Removed.
(aarch64_parse_stp_policy): Removed.
(aarch64_override_options_internal): Removed calls to parsing
functions and added obvious direct assignments.
(aarch64_mem_ok_with_ldpstp_policy_model): Improved
code quality based on the new changes.
* config/aarch64/aarch64.opt: Use single enum type
aarch64_ldp_stp_policy for both ldp and stp options.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/ldp_aligned.c: Split into this and
ldp_unaligned.
* gcc.target/aarch64/stp_aligned.c: Split into this and
stp_unaligned.
* gcc.target/aarch64/ldp_unaligned.c: New test.
* gcc.target/aarch64/stp_unaligned.c: New test.
Signed-off-by: Manos Anagnostakis <manos.anagnostakis@vrull.eu>
Suggested-by: Richard Sandiford <richard.sandiford@arm.com>
Richard Biener [Fri, 29 Sep 2023 09:08:18 +0000 (11:08 +0200)]
tree-optimization/111583 - loop distribution issue
The following conservatively fixes loop distribution to only
recognize memset/memcpy and friends when at least one element
is going to be processed. This avoids having an unconditional
builtin call in the IL that might imply the source and destination
pointers are non-NULL when originally pointers were not always
dereferenced.
With -Os loop header copying is less likely to ensure this.
PR tree-optimization/111583
* tree-loop-distribution.cc (find_single_drs): Ensure the
load/store are always executed.
* gcc.dg/tree-ssa/pr111583-1.c: New testcase.
* gcc.dg/tree-ssa/pr111583-2.c: Likewise.
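The hazard can be shown at the source level. In the sketch below (function names invented for the example), the loop is well defined for a null pointer when n is zero, whereas an unconditional memset call would require a valid pointer even for a zero length, so the library call is only safe behind an "at least one element" condition.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Original loop: with n == 0 the pointer is never dereferenced, so a
// caller may legitimately pass a null pointer.
void clear_loop (char *p, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    p[i] = 0;
}

// Library-call form: the call is kept behind the same condition under
// which the loop would have touched memory at all.
void clear_guarded_memset (char *p, std::size_t n)
{
  if (n > 0)
    {
      assert (p != nullptr);
      std::memset (p, 0, n);
    }
}
```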
Jakub Jelinek [Fri, 29 Sep 2023 09:23:16 +0000 (11:23 +0200)]
use *_grow_cleared rather than *_grow on vect_unpromoted_value
vect_recog_over_widening_pattern is another spot which triggers the
right now commented out static assertion in vec.h which asserts
{quick,safe}_grow vec operations are only used with trivially default
constructible types.
I had a look at this and I think using quick_grow_cleared is the best choice
here. The nops is 2 or 1 most of the time, worst case 3, so the price of
extra initialization of 4 pointer-sized-or-less members times 1, 2 or 3
doesn't seem worth bothering, it is similar to the bitmap_head case where
we already pay the price for just one structure anytime we do
vect_unpromoted_value unprom_diff;
(and later set_op on it) or even
vect_unpromoted_value unprom0[2];
With this patch and Richard S's poly_int_pod removal the static_assert can
be enabled as well and gcc builds.
2023-09-29 Jakub Jelinek <jakub@redhat.com>
* tree-vect-patterns.cc (vect_recog_over_widening_pattern): Use
quick_grow_cleared method on unprom rather than quick_grow.
`deletable` and `scalar` tables are both simple: each element always
contains a pointer to the beginning of the object and its size is the
full object.
`rtab` is different: its `base` is a pointer in the middle of the
struct and `stride` points to the next GC pointer in the array.
Before the change there were 2 problems:
1. We memset()ed not just pointers but data around them.
2. We went out of bounds of the last object described by gt_ggc_rtab
and triggered bootstrap failures in profile and asan bootstraps.
After the change we handle only pointers themselves like the rest of
ggc-common.cc code.
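The base/stride layout described above can be sketched as follows. This is an illustration only: root_entry, item and zero_root_pointers are hypothetical names, not the ggc-common.cc definitions. It shows clearing just the pointers, one per stride, while leaving the data around them untouched.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a root-table entry: `base` points at the
// first GC pointer, `stride` is the byte distance to the next one.
struct root_entry
{
  void *base;
  std::size_t nelt;
  std::size_t stride;
};

// Hypothetical object with payload data on both sides of the pointer.
struct item
{
  int before;
  void *ptr;
  int after;
};

// Clear only the pointers themselves, leaving surrounding data intact.
void zero_root_pointers(const root_entry &e)
{
  assert(e.stride >= sizeof(void *));
  for (std::size_t i = 0; i < e.nelt; ++i)
    {
      char *slot = static_cast<char *>(e.base) + e.stride * i;
      *reinterpret_cast<void **>(slot) = nullptr;
    }
}
```

A memset over nelt * stride bytes starting at base would instead wipe the neighbouring fields and run past the end of the last object.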
gcc/
PR middle-end/111505
* ggc-common.cc (ggc_zero_out_root_pointers, ggc_common_finalize):
Add new helper. Use helper instead of memset() to wipe out pointers.
c_readstr only operated on integer modes. It worked by reading
the source string into an array of HOST_WIDE_INTs, converting
that array into a wide_int, and from there to an rtx.
It's simpler to do this by building a target memory image and
using native_decode_rtx to convert that memory image into an rtx.
It avoids all the endianness shenanigans because both the string and
native_decode_rtx follow target memory order. It also means that the
function can handle all fixed-size modes, which simplifies callers
and allows vector modes to be used more widely.
gcc/
* builtins.h (c_readstr): Take a fixed_size_mode rather than a
scalar_int_mode.
* builtins.cc (c_readstr): Likewise. Build a local array of
bytes and use native_decode_rtx to get the rtx image.
(builtin_memcpy_read_str): Simplify accordingly.
(builtin_strncpy_read_str): Likewise.
(builtin_memset_read_str): Likewise.
(builtin_memset_gen_str): Likewise.
* expr.cc (string_cst_read_str): Likewise.
Jakub Jelinek [Fri, 29 Sep 2023 07:35:01 +0000 (09:35 +0200)]
use *_grow_cleared rather than *_grow on vec<bitmap_head>
The assert checking which is commented out in vec.h grow method requires
trivially default constructible types to be used with this method, but
bitmap_head has since the PR88317 r9-4642 workaround non-trivial default
constructor to catch bugs and we pay the minimum price of initializing
everything in bitmap_head twice on the common
bitmap_head var;
bitmap_initialize (&var, obstack);
sequence. This patch makes us pay the same price times number of elements
on
vec<bitmap_head> v;
v.create (n);
v.safe_grow_cleared (n); // previous v.safe_grow (n);
for (int i = 0; i < n; ++i)
bitmap_initialize (&v[i], obstack);
2023-09-29 Jakub Jelinek <jakub@redhat.com>
* tree-ssa-loop-im.cc (tree_ssa_lim_initialize): Use quick_grow_cleared
instead of quick_grow on vec<bitmap_head> members.
* cfganal.cc (control_dependences::control_dependences): Likewise.
* rtl-ssa/blocks.cc (function_info::build_info::build_info): Likewise.
(function_info::place_phis): Use safe_grow_cleared instead of safe_grow
on auto_vec<bitmap_head> vars.
* tree-ssa-live.cc (compute_live_vars): Use quick_grow_cleared instead
of quick_grow on vec<bitmap_head> var.
Tom Tromey [Tue, 26 Sep 2023 20:04:26 +0000 (14:04 -0600)]
libstdc++: Remove std_ratio_t_tuple
This removes the std_ratio_t_tuple function from the Python
pretty-printer code. It is not used. Apparently the relevant parts
were moved to StdChronoDurationPrinter._ratio at some point in the
past.
Tom Tromey [Tue, 26 Sep 2023 19:38:42 +0000 (13:38 -0600)]
libstdc++: Use gdb.ValuePrinter base class
GDB 14 will add a new ValuePrinter tag class that will be used to
signal that pretty-printers will agree to the "extension protocol" --
essentially that they will follow some simple namespace rules, so that
GDB can add new methods over time.
A couple new methods have already been added to GDB, to support DAP.
While I haven't implemented these for any libstdc++ printers yet, this
patch makes the basic conversion: printers derive from
gdb.ValuePrinter if it is available, and all "non-standard" (that is,
not specified by GDB) members of the various value-printing classes
are renamed to have a leading underscore.
libstdc++-v3/ChangeLog:
* python/libstdcxx/v6/printers.py: Use gdb.ValuePrinter
everywhere. Rename members to start with "_".
Tom Tromey [Wed, 27 Sep 2023 19:49:59 +0000 (13:49 -0600)]
libstdc++: Show full Python stack on error
This changes the libstdc++ test suite to arrange for gdb to show the
full Python stack if any sort of Python exception occurs. This makes
debugging the printers a little simpler.
libstdc++-v3/ChangeLog:
* testsuite/lib/gdb-test.exp (gdb-test): Enable Python
stack traces from gdb.
Jonathan Wakely [Thu, 28 Sep 2023 19:52:01 +0000 (20:52 +0100)]
libstdc++: Refactor Python Xmethods to use is_specialization_of
This copies the is_specialization_of function from printers.py (with
slight modification for versioned namespace handling) and reuses it in
xmethods.py to replace repetitive re.match calls in every class.
This fixes the problem that the regular expressions used \d without
escaping the backslash properly.
libstdc++-v3/ChangeLog:
* python/libstdcxx/v6/xmethods.py (is_specialization_of): Define
new function.
(ArrayMethodsMatcher, DequeMethodsMatcher)
(ForwardListMethodsMatcher, ListMethodsMatcher)
(VectorMethodsMatcher, AssociativeContainerMethodsMatcher)
(UniquePtrGetWorker, UniquePtrMethodsMatcher)
(SharedPtrSubscriptWorker, SharedPtrMethodsMatcher): Use
is_specialization_of instead of re.match.
Jonathan Wakely [Thu, 28 Sep 2023 13:54:59 +0000 (14:54 +0100)]
libstdc++: Reformat Python code
Some of these changes were suggested by autopep8's --aggressive
option, others are for readability.
Break long lines by splitting strings across multiple lines, or
introducing local variables to hold results.
Use raw strings for regular expressions, so that backslashes don't need
to be escaped.
libstdc++-v3/ChangeLog:
* python/libstdcxx/v6/printers.py: Break long lines. Use raw
strings for regular expressions. Add whitespace around
operators.
(is_member_of_namespace): Use isinstance to check type.
(is_specialization_of): Likewise. Adjust template_name
for versioned namespace instead of duplicating the re.match
call.
(StdExpAnyPrinter._string_types): New static method.
(StdExpAnyPrinter.to_string): Use _string_types.
Gaius Mulley [Thu, 28 Sep 2023 18:07:04 +0000 (19:07 +0100)]
modula2: Increase linking test timeouts for slower targets
This patch introduces missing timeout handling for
pimlib-base-run-pass.exp and increases the timeout value
for larger projects which link (necessary for slower targets).
gcc/testsuite/ChangeLog:
* gm2/coroutines/pim/run/pass/coroutines-pim-run-pass.exp:
Add load_lib timeout-dg.exp and increase timeout to 60
seconds.
* gm2/pimlib/base/run/pass/pimlib-base-run-pass.exp: Add
load_lib timeout-dg.exp and increase timeout to 60 seconds.
* gm2/projects/iso/run/pass/halma/projects-iso-run-pass-halma.exp:
Increase timeout to 45 seconds.
* gm2/switches/whole-program/pass/run/switches-whole-program-pass-run.exp:
Add load_lib timeout-dg.exp and increase timeout to 120 seconds.
Remove unnecessary compile of mystrlib.mod.
* gm2/iso/run/pass/iso-run-pass.exp: Add load_lib
timeout-dg.exp and set timeout to 60 seconds.
Tim Song [Wed, 6 Sep 2023 17:31:55 +0000 (19:31 +0200)]
libstdc++: Force _Hash_node_value_base methods inline to fix abi (PR111050)
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=1b6f0476837205932613ddb2b3429a55c26c409d
changed _Hash_node_value_base to no longer derive from _Hash_node_base, which means
that its member functions expect _M_storage to be at a different offset. So explosions
result if an out-of-line definition is emitted for any of the member functions (say,
in a non-optimized build) and the resulting object file is then linked with code built
using an older version of GCC/libstdc++.
Although the patch improves x86-64 specfp2007, it also results in
performance and code size regression on different targets and
new GCC testsuite failures on tests expecting a specific output.
A MOPS memmove may corrupt registers since there is no copy of the input
operands to temporary registers. Fix this by calling
aarch64_expand_cpymem_mops.
Reviewed-by: Richard Sandiford <richard.sandiford@arm.com>
gcc/ChangeLog:
PR target/111121
* config/aarch64/aarch64.md (aarch64_movmemdi): Add new expander.
(movmemdi): Call aarch64_expand_cpymem_mops for correct expansion.
* config/aarch64/aarch64.cc (aarch64_expand_cpymem_mops): Add support
for memmove.
* config/aarch64/aarch64-protos.h (aarch64_expand_cpymem_mops): Add new
function.
Pan Li [Thu, 28 Sep 2023 05:51:07 +0000 (13:51 +0800)]
RISC-V: Support {U}INT64 to FP16 auto-vectorization
Update in v2:
* Add math trap check.
* Adjust some test cases.
Original logs:
This patch would like to support the auto-vectorization from
the INT64 to FP16. We take below steps for the conversion.
* INT64 to FP32.
* FP32 to FP16.
Given sample code as below:
void
test_func (int64_t * __restrict a, _Float16 *b, unsigned n)
{
for (unsigned i = 0; i < n; i++)
b[i] = (_Float16) (a[i]);
}
After this patch:
vsetvli a5,a2,e8,mf8,ta,ma
vle64.v v1,0(a0)
vsetvli a4,zero,e32,mf2,ta,ma
vfncvt.f.x.w v1,v1
vsetvli zero,zero,e16,mf4,ta,ma
vfncvt.f.f.w v1,v1
vsetvli zero,a2,e16,mf4,ta,ma
vse16.v v1,0(a1)
Please note VLS mode is also involved in this patch and covered by the
test cases.
PR target/111506
gcc/ChangeLog:
* config/riscv/autovec.md (<float_cvt><mode><vnnconvert>2):
New pattern.
* config/riscv/vector-iterators.md: New iterator.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/unop/cvt-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/cvt-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cvt-0.c: New test.
RISC-V target developers need a flag to prevent creating
insns in IRA which cannot be split after RA because they would need a
temporary reg. The patch introduces such a flag.
gcc/ChangeLog:
* rtl.h (lra_in_progress): Change type to bool.
(ira_in_progress): Add new extern.
* ira.cc (ira_in_progress): New global.
(pass_ira::execute): Set up ira_in_progress.
* lra.cc (lra_in_progress): Change type to bool and initialize.
(lra): Use bool values for lra_in_progress.
* lra-eliminations.cc (init_elim_table): Ditto.
Richard Biener [Thu, 28 Sep 2023 09:51:30 +0000 (11:51 +0200)]
target/111600 - avoid deep recursion in access diagnostics
pass_waccess::check_dangling_stores uses recursion to traverse the CFG.
The following changes this to use a heap allocated worklist to avoid
blowing the stack.
Instead of using a better iteration order it tries hard to preserve
the current iteration order to avoid new false positives popping up,
since the set of stores we keep track of isn't properly modeling flow,
so what is diagnosed and what not is quite random. We are also
lacking the ideal RPO compute on the inverted graph that would just
ignore reverse unreachable code (as the current iteration scheme does).
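The general technique of replacing recursion with a heap-allocated worklist can be sketched like this. It is not the pass_waccess code; graph and visit_blocks are hypothetical names. Successors are pushed in reverse so that blocks are visited in the same order a recursive walk would visit them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical graph: succs[b] lists the successors of block b.
using graph = std::vector<std::vector<int>>;

// Depth-first visit using a heap-allocated worklist instead of recursion.
// Returns the blocks in the order they are first visited.
std::vector<int> visit_blocks(const graph &succs, int entry)
{
  assert(entry >= 0 && static_cast<std::size_t>(entry) < succs.size());
  std::vector<int> order;
  std::vector<bool> seen(succs.size(), false);
  std::vector<int> worklist{entry};
  while (!worklist.empty())
    {
      int b = worklist.back();
      worklist.pop_back();
      if (seen[b])
        continue;
      seen[b] = true;
      order.push_back(b);
      // Push in reverse so the first successor is popped first.
      for (auto it = succs[b].rbegin(); it != succs[b].rend(); ++it)
        if (!seen[*it])
          worklist.push_back(*it);
    }
  return order;
}
```

The worklist grows on the heap, so a long chain of blocks no longer translates into deep call-stack usage.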
PR target/111600
* gimple-ssa-warn-access.cc (pass_waccess::check_dangling_stores):
Use a heap allocated worklist for CFG traversal instead of
recursion.
libgfortran: Use __builtin_unreachable() not -Wno-stringop-overflow to silence warning
The only caller of write_z is formatted_transfer_scalar_write that passes
kind to 'len'; in turn, write_z is the only caller of xtoa_big, passing on
its 'len'. The kind is passed as is, except for GFC_REAL_17 for which
len = 16 is used.
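A minimal sketch of stating such a bound as an assumption rather than suppressing the warning. The helper copy_small is hypothetical and is not the xtoa_big code; __builtin_unreachable is the GCC/Clang builtin named in the subject line. Reaching the builtin is undefined behaviour, so this is only valid when every caller really respects the bound.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: copies `len` bytes into a fixed 16-byte buffer.
// Callers guarantee len <= 16; saying so with __builtin_unreachable
// lets the compiler rely on that bound when analysing the loop.
void copy_small(char (&dst)[16], const char *src, std::size_t len)
{
  assert(src != nullptr);
  if (len > 16)
    __builtin_unreachable();
  for (std::size_t i = 0; i < len; ++i)
    dst[i] = src[i];
}
```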
libgfortran/
* io/write.c (xtoa_big): Change a 'GCC diagnostic ignored
"-Wstringop-overflow"' to an assumption (via __builtin_unreachable).
Jakub Jelinek [Thu, 28 Sep 2023 09:59:10 +0000 (11:59 +0200)]
vec.h: Make some ops work with non-trivially copy constructible and/or destructible types
We have some very limited support for non-POD types in vec.h
(in particular grow_cleared will invoke default ctors on the
cleared elements and vector copying invokes copy ctors).
My pending work on wide_int/widest_int which makes those two
non-trivially default constructible, copyable and destructible shows this
isn't enough though.
In particular the uses of it in irange shows that quick_push
still uses just assignment operator rather than copy construction
and we never invoke destructors on anything.
The following patch does that for quick_push (copy construction
using placement new rather than assignment, for trivially copy
constructible types I think it should be the same) and invokes
destructors (only if non-trivially destructible) in pop, release
and truncate. Now as discussed last night on IRC, the pop case
is problematic, because our pop actually does two things,
it decreases length (so the previous last element should be destructed)
but also returns a reference to it. We have some 300+ uses of this
and the reference rather than returning it by value is useful at least
for the elements which are (larger) POD structures, so I'm not
prepared to change that. Though obviously for types with non-trivial
destructors returning a reference to just destructed element is not
a good idea. So, this patch for that case only makes pop return void
instead and any users wishing to get the last element need to use last ()
and pop () separately (currently there are none).
Note, a lot of vec.h operations are still not friendly for non-POD types,
and the patch tries to enforce that through static asserts. Some
operations are now only allowed on trivially copyable types, sorting
operations as an extension on trivially copyable types or std::pair
of 2 trivially copyable types, quick_grow/safe_grow (but not _cleared
variants) for now have a commented out assert on trivially default
constructible types - this needs some further work before the assert
can be enabled - and finally all va_gc/va_gc_atomic vectors require
trivially destructible types.
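The quick_push and pop behaviour described above can be sketched with a small stand-alone container. tiny_vec below is a hypothetical fixed-capacity vector written for illustration; it is not GCC's vec and omits almost everything vec.h does. It shows copy construction via placement new and a pop whose return type depends on whether T is trivially destructible.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <type_traits>

// Hypothetical fixed-capacity vector, loosely modelled on the
// behaviour described above.
template <typename T, std::size_t N>
class tiny_vec
{
  alignas(T) unsigned char m_storage[N * sizeof(T)];
  std::size_t m_len = 0;

  T *data() { return reinterpret_cast<T *>(m_storage); }

public:
  // T & for trivially destructible T, void otherwise.
  using pop_ret_type = typename std::conditional<
      std::is_trivially_destructible<T>::value, T &, void>::type;

  tiny_vec() = default;
  tiny_vec(const tiny_vec &) = delete;
  tiny_vec &operator=(const tiny_vec &) = delete;
  ~tiny_vec()
  {
    while (m_len > 0)
      data()[--m_len].~T();
  }

  std::size_t length() const { return m_len; }
  T &last() { assert(m_len > 0); return data()[m_len - 1]; }

  // Copy-construct into raw storage instead of assigning to it.
  T *quick_push(const T &obj)
  {
    assert(m_len < N);
    return ::new (static_cast<void *>(data() + m_len++)) T(obj);
  }

  // Shrink by one. When T needs destruction the element is destroyed
  // and nothing is returned, because a reference would dangle.
  pop_ret_type pop()
  {
    assert(m_len > 0);
    T &elt = data()[--m_len];
    if (!std::is_trivially_destructible<T>::value)
      elt.~T();
    return static_cast<pop_ret_type>(elt);
  }
};
```

With tiny_vec<int, 4> the result of pop () can still be bound to an int &, while with tiny_vec<std::string, 4> callers have to read last () before calling pop ().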
2023-09-28 Jakub Jelinek <jakub@redhat.com>
Jonathan Wakely <jwakely@redhat.com>
* vec.h: Mention in file comment limited support for non-POD types
in some operations.
(vec_destruct): New function template.
(release): Use it for non-trivially destructible T.
(truncate): Likewise.
(quick_push): Perform a placement new into slot
instead of assignment.
(pop): For non-trivially destructible T return void
rather than T & and destruct the popped element.
(quick_insert, ordered_remove): Note that they aren't suitable
for non-trivially copyable types. Add static_asserts for that.
(block_remove): Assert T is trivially copyable.
(vec_detail::is_trivially_copyable_or_pair): New trait.
(qsort, sort, stablesort): Assert T is trivially copyable or
std::pair with both trivially copyable types.
(quick_grow): Add assert T is trivially default constructible,
for now commented out.
(quick_grow_cleared): Don't call quick_grow, instead inline it
by hand except for the new static_assert.
(gt_ggc_mx): Assert T is trivially destructible.
(auto_vec::operator=): Formatting fixes.
(auto_vec::auto_vec): Likewise.
(vec_safe_grow_cleared): Don't call vec_safe_grow, instead inline
it manually and call quick_grow_cleared method rather than quick_grow.
(safe_grow_cleared): Likewise.
* edit-context.cc (class line_event): Move definition earlier.
* tree-ssa-loop-im.cc (seq_entry::seq_entry): Make default ctor
defaulted.
* ipa-fnsummary.cc (evaluate_properties_for_edge): Use
safe_grow_cleared instead of safe_grow followed by placement new
constructing the elements.
vmv.s.x has a vl operand, so the following code will get
avl (const_int) from RVV Insn 1.
rtx avl = has_vl_op (insn->rtl ()) ? get_vl (insn->rtl ())
: dem.get_avl ();
If REGNO is used on a const_int, the compiler will crash:
during RTL pass: vsetvl
res_debug.c: In function '__dn_count_labels':
res_debug.c:1050:1: internal compiler error: RTL check: expected code 'reg',
have 'const_int' in rhs_regno, at rtl.h:1934
1050 | }
| ^
0x8fb169 rtl_check_failed_code1(rtx_def const*, rtx_code, char const*, int, char const*)
../.././gcc/gcc/rtl.cc:770
0x1399818 rhs_regno(rtx_def const*)
../.././gcc/gcc/rtl.h:1934
0x1399818 anticipatable_occurrence_p
../.././gcc/gcc/config/riscv/riscv-vsetvl.cc:348
So in this case avl should be obtained from dem.
Another issue is caused by the following code:
HOST_WIDE_INT diff = INTVAL (builder.elt (i)) - i;
during RTL pass: expand
../../.././gcc/libgfortran/generated/matmul_c4.c: In function 'matmul_c4':
../../.././gcc/libgfortran/generated/matmul_c4.c:2906:39: internal compiler error: RTL check:
expected code 'const_int', have 'const_poly_int' in expand_const_vector,
at config/riscv/riscv-v.cc:1149
The builder.elt (i) can be either const_int or const_poly_int.
This is a bug fix for commit a62c8324e7e31ae6614f549bdf9d8a653233f8fc,
which added GIMPLE_OMP_STRUCTURED_BLOCK. I found a big switch statement
over gimple codes that needs to know about this new node, but didn't.
gcc/ChangeLog
* gimple.cc (gimple_copy): Add case GIMPLE_OMP_STRUCTURED_BLOCK.
Darwin, configure: Allow for an unrecognisable dsymutil [PR111610].
We had a catch-all configuration case for missing or unrecognised dsymutil
but it was setting the dsymutil source to "UNKNOWN" which is not usable in
this context (since it clashes with an existing enum). We rename this to
DET_UNKNOWN (for Darwin External Toolchain).
PR target/111610
gcc/ChangeLog:
* configure: Regenerate.
* configure.ac: Rename the missing dsymutil case to "DET_UNKNOWN".
aarch64: Fine-grained policies to control ldp-stp formation
This patch implements the following TODO in gcc/config/aarch64/aarch64.cc
to provide the requested behaviour for handling ldp and stp:
/* Allow the tuning structure to disable LDP instruction formation
from combining instructions (e.g., in peephole2).
TODO: Implement fine-grained tuning control for LDP and STP:
1. control policies for load and store separately;
2. support the following policies:
- default (use what is in the tuning structure)
- always
- never
- aligned (only if the compiler can prove that the
load will be aligned to 2 * element_size) */
It provides two new and concrete target-specific command-line parameters
--param=aarch64-ldp-policy= and --param=aarch64-stp-policy=
to give the ability to control load and store policies separately as
stated in part 1 of the TODO.
The accepted values for both parameters are:
* default: Use the policy of the tuning structure (default).
* always: Emit ldp/stp regardless of alignment.
* never: Do not emit ldp/stp.
* aligned: In order to emit ldp/stp, first check if the load/store will
be aligned to 2 * element_size.
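The four values can be modelled roughly as below. This is a sketch under assumptions: pair_policy and pair_allowed are hypothetical names, known_align stands for the alignment in bytes that the compiler can prove, and the handling of an unresolved default is a choice made for this example, not the aarch64.cc behaviour.

```cpp
#include <cassert>

// Hypothetical model of the four policy values.
enum class pair_policy { use_default, always, never, aligned };

// Decide whether a load/store pair may be formed. `tuning_policy`
// stands in for what the tuning structure specifies and is consulted
// only when the parameter is left at its default.
bool pair_allowed(pair_policy param, pair_policy tuning_policy,
                  unsigned known_align, unsigned element_size)
{
  assert(element_size != 0);
  pair_policy p = (param == pair_policy::use_default) ? tuning_policy : param;
  switch (p)
    {
    case pair_policy::always:
      return true;
    case pair_policy::never:
      return false;
    case pair_policy::aligned:
      // Only when the access is known to be aligned to 2 * element_size.
      return known_align % (2u * element_size) == 0;
    default:
      // Assumption for this sketch: an unresolved default acts like always.
      return true;
    }
}
```

On the command line the corresponding selection would be made with, for example, --param=aarch64-ldp-policy=aligned.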
Bootstrapped and regtested aarch64-linux.
gcc/ChangeLog:
* config/aarch64/aarch64-opts.h (enum aarch64_ldp_policy): New
enum type.
(enum aarch64_stp_policy): New enum type.
* config/aarch64/aarch64-protos.h (struct tune_params): Add
appropriate enums for the policies.
(aarch64_mem_ok_with_ldpstp_policy_model): New declaration.
* config/aarch64/aarch64-tuning-flags.def
(AARCH64_EXTRA_TUNING_OPTION): Remove superseded tuning
options.
* config/aarch64/aarch64.cc (aarch64_parse_ldp_policy): New
function to parse ldp-policy parameter.
(aarch64_parse_stp_policy): New function to parse stp-policy parameter.
(aarch64_override_options_internal): Call parsing functions.
(aarch64_mem_ok_with_ldpstp_policy_model): New function.
(aarch64_operands_ok_for_ldpstp): Add call to
aarch64_mem_ok_with_ldpstp_policy_model for parameter-value
check and alignment check and remove superseded ones.
(aarch64_operands_adjust_ok_for_ldpstp): Add call to
aarch64_mem_ok_with_ldpstp_policy_model for parameter-value
check and alignment check and remove superseded ones.
* config/aarch64/aarch64.opt (aarch64-ldp-policy): New param.
(aarch64-stp-policy): New param.
* doc/invoke.texi: Document the parameters accordingly.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/ampere1-no_ldp_combine.c: Removed.
* gcc.target/aarch64/ldp_aligned.c: New test.
* gcc.target/aarch64/ldp_always.c: New test.
* gcc.target/aarch64/ldp_never.c: New test.
* gcc.target/aarch64/stp_aligned.c: New test.
* gcc.target/aarch64/stp_always.c: New test.
* gcc.target/aarch64/stp_never.c: New test.
Andre Vieira [Wed, 27 Sep 2023 10:05:40 +0000 (11:05 +0100)]
vect, omp: inbranch simdclone dropping const
The const attribute is ignored when simdclone's are used inbranch. This is due
to the fact that when analyzing a MASK_CALL we were not looking at the targeted
function for flags, but instead only at the internal function call itself.
This patch adds code to make sure we look at the target function to check for
the const attribute and enables the autovectorization of inbranch const
simdclones without needing the loop to be adorned with the 'openmp simd' pragma.
Jakub Jelinek [Wed, 27 Sep 2023 08:38:54 +0000 (10:38 +0200)]
remove workaround for GCC 4.1-4.3 [PR105606]
While looking into vec.h, I've noticed we still have a workaround for
GCC 4.1-4.3 bugs.
As we now use C++11 and thus need to be built by GCC 4.8 or later,
I think this is now never used.
All single-precision floating-point values with magnitude >= 8388608.0 (2^23)
have no fractional bits in the mantissa. We leverage vmflt and a mask to filter
them out in the vector and only do the cvt under the mask.
After this patch:
...
fsrmi 0 // Rounding to nearest, ties to even
.L4:
vfabs.v v1,v2
vmflt.vf v0,v1,fa5
vfcvt.x.f.v v3,v2,v0.t
vfcvt.f.x.v v1,v3,v0.t
vfsgnj.vv v1,v1,v2
bne .L4
.L14:
fsrm a6
ret
Please note VLS mode is also involved in this patch and covered by the
test cases. We will add more run test with zfa support later.
gcc/ChangeLog:
* config/riscv/autovec.md (roundeven<mode>2): New pattern.
* config/riscv/riscv-protos.h (enum insn_flags): New enum type.
(enum insn_type): Ditto.
(expand_vec_roundeven): New func decl.
* config/riscv/riscv-v.cc (expand_vec_roundeven): New func impl.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-roundeven-1.c: New test.
Then it ICEs on: auto new_mode = smallest_int_mode_for_size (access_size * BITS_PER_UNIT);
The access_size may be 24 or 32. There are no integer modes of these sizes, so it ICEs.
TODO: A better way may be to make DSE use native_encode_rtx/native_decode_rtx,
but I don't know how to do that. So let's fix this issue quickly; we
can improve the fix later.
PR target/111590
gcc/ChangeLog:
* dse.cc (find_shift_sequence): Check that a mode with access_size exists on the target.
All single-precision floating-point values with magnitude >= 8388608.0 (2^23)
have no fractional bits in the mantissa. We leverage vmflt and a mask to filter
them out in the vector and only do the cvt under the mask.
After this patch:
vfabs.v v2,v1
vmflt.vf v0,v2,fa5
vfcvt.rtz.x.f.v v4,v1,v0.t
vfcvt.f.x.v v2,v4,v0.t
vfsgnj.vv v2,v2,v1
bne .L4
Please note VLS mode is also involved in this patch and covered by the
test cases.
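For reference, the masked sequence can be read as the following scalar computation on one float lane. This is a sketch, not the expander: trunc_lane is a hypothetical name, and the mapping of each step to an instruction is an interpretation of the listing above. Lanes at or above 2^23 in magnitude, and NaNs, are passed through unchanged because they fail the less-than test.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Scalar sketch of the masked sequence for one float lane.
float trunc_lane(float x)
{
  const float limit = 8388608.0f; // 2^23
  // vfabs + vmflt: only lanes with |x| < 2^23 are converted.
  // NaN compares false and is returned as is.
  if (!(std::fabs(x) < limit))
    return x;
  // vfcvt.rtz.x.f.v: float to integer, rounding toward zero.
  std::int32_t i = static_cast<std::int32_t>(x);
  // vfcvt.f.x.v: integer back to float.
  float r = static_cast<float>(i);
  // Truncation never increases the magnitude.
  assert(std::fabs(r) <= std::fabs(x));
  // vfsgnj.vv: restore the sign, so that -0.25 becomes -0.0.
  return std::copysign(r, x);
}
```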
gcc/ChangeLog:
* config/riscv/autovec.md (btrunc<mode>2): New pattern.
* config/riscv/riscv-protos.h (expand_vec_trunc): New func decl.
* config/riscv/riscv-v.cc (emit_vec_cvt_x_f_rtz): New func impl.
(expand_vec_trunc): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/unop/math-trunc-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-trunc-1.c: New test.
__atomic_test_and_set: Fall back to library, not non-atomic code
Make __atomic_test_and_set consistent with other __atomic_ and __sync_
builtins: call a matching library function instead of emitting
non-atomic code when the target has no direct insn support.
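For context, a minimal sketch of how the builtin is typically used from user code. try_lock and unlock are hypothetical wrappers; __atomic_test_and_set and __atomic_clear are the GCC builtins. On a target without direct insn support, the first call would, after this change, become a call to a library function of the same name, which presumably has to be provided at link time (for example by libatomic).

```cpp
#include <cassert>

// Hypothetical wrappers over the GCC builtins.
// Returns true when the flag was previously clear, i.e. the lock was taken.
bool try_lock(bool *flag)
{
  assert(flag != nullptr);
  return !__atomic_test_and_set(flag, __ATOMIC_ACQUIRE);
}

// Clears the flag so that a later try_lock can succeed.
void unlock(bool *flag)
{
  assert(flag != nullptr);
  __atomic_clear(flag, __ATOMIC_RELEASE);
}
```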
There's special-case code handling targetm.atomic_test_and_set_trueval
!= 1 trying a modified maybe_emit_sync_lock_test_and_set. Previously,
if that worked but its matching emit_store_flag_force returned NULL,
we'd segfault later on. Now that the caller handles NULL, gcc_assert
here instead.
While the referenced PRs are ARM-specific, the issue is general.
PR target/107567
PR target/109166
* builtins.cc (expand_builtin) <case BUILT_IN_ATOMIC_TEST_AND_SET>:
Handle failure from expand_builtin_atomic_test_and_set.
* optabs.cc (expand_atomic_test_and_set): When all attempts fail to
generate atomic code through target support, return NULL
instead of emitting non-atomic code. Also, for code handling
targetm.atomic_test_and_set_trueval != 1, gcc_assert result
from calling emit_store_flag_force instead of returning NULL.
testsuite: Require thread-fence for 29_atomics/atomic_flag/cons/value_init.cc
A recent patch made __atomic_test_and_set no longer fall
back to emitting non-atomic code, but instead will then emit
a call to __atomic_test_and_set, thereby exposing the need
to gate also this test on support for atomics, similar to r14-3980-g62b29347c38394.
These are tests from patch 3/5 of Ziao Zeng's zicond submission.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/zicond-primitiveSemantics_return_0_imm.c: New test.
* gcc.target/riscv/zicond-primitiveSemantics_return_imm_imm.c: New test.
* gcc.target/riscv/zicond-primitiveSemantics_return_imm_reg.c: New test.
* gcc.target/riscv/zicond-primitiveSemantics_return_reg_reg.c: New test.
Gaius Mulley [Tue, 26 Sep 2023 17:08:37 +0000 (18:08 +0100)]
PR modula2/111510 runtime ICE findChildAndParent has caused internal runtime error
This patch fixes the runtime bug above. The full runtime message is:
findChildAndParent has caused internal runtime error, RTentity is either
corrupt or the module storage has not been initialized yet. The bug is
due to a non-nul-terminated string determining the module initialization order.
This results in modules being uninitialized and the above crash. The bug
manifests itself on 32 bit systems - but obviously is latent on all
targets and the fix should be applied to both gcc-14 and gcc-13.
gcc/m2/ChangeLog:
PR modula2/111510
* gm2-compiler/M2GenGCC.mod (IsExportedGcc): Minor spacing changes.
(BuildTrashTreeFromInterface): Minor spacing changes.
* gm2-compiler/M2Options.mod (GetRuntimeModuleOverride): Call
string to generate a nul terminated C style string.
* gm2-compiler/M2Quads.mod (BuildStringAdrParam): New procedure.
(BuildM2InitFunction): Replace inline parameter generation with
calls to BuildStringAdrParam.