Nathaniel Shead [Sat, 17 Aug 2024 12:37:30 +0000 (22:37 +1000)]
c++/modules: Disable streaming definitions of non-vague-linkage GMF decls [PR115020]
The error in the linked PR is caused because 'DECL_THIS_STATIC' is true
for the static member function, causing the streaming code to assume
that this is an internal linkage GM entity that needs to be explicitly
streamed, which then on read-in gets marked as a vague linkage function
(despite being non-inline) causing import_export_decl to complain.
However, I don't see any reason why we should care about this:
definitions in the GMF should just be emitted as per usual regardless of
whether they're internal-linkage or not. Actually the only thing we
care about here are header modules, since they have no TU to write
definitions into. As such this patch removes these conditions from
'has_definition' and updates some comments to clarify.
PR c++/115020
gcc/cp/ChangeLog:
* module.cc (has_definition): Only force writing definitions for
header_module_p.
gcc/testsuite/ChangeLog:
* g++.dg/modules/pr115020_a.C: New test.
* g++.dg/modules/pr115020_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Jason Merrill <jason@redhat.com>
Nathaniel Shead [Sun, 18 Aug 2024 01:36:40 +0000 (11:36 +1000)]
c++/modules: Handle transitive reachability for deduction guides [PR116403]
Currently we implement [temp.deduct.guide] p1 by forcing all deduction
guides to be considered as exported. However this is not sufficient:
for transitive non-exported imports we will still hide the deduction
guide from name lookup, causing errors.
This patch instead adjusts name lookup to have a new ANY_REACHABLE flag
to allow for this case. Currently this is only used by deduction guides
but there are some other circumstances where this may be useful in the
future (e.g. finding existing temploid friends).
PR c++/116403
gcc/cp/ChangeLog:
* pt.cc (deduction_guides_for): Use ANY_REACHABLE for lookup of
deduction guides.
* module.cc (depset::hash::add_deduction_guides): Likewise.
(module_state::write_cluster): No longer override deduction
guides as exported.
* name-lookup.cc (name_lookup::search_namespace_only): Ignore
visibility when LOOK_want::ANY_REACHABLE is specified.
(check_module_override): Ignore visibility when checking for
ambiguating deduction guides.
* name-lookup.h (LOOK_want): New flag 'ANY_REACHABLE'.
gcc/testsuite/ChangeLog:
* g++.dg/modules/dguide-4_a.C: New test.
* g++.dg/modules/dguide-4_b.C: New test.
* g++.dg/modules/dguide-4_c.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Jason Merrill <jason@redhat.com>
Nathaniel Shead [Fri, 16 Aug 2024 05:06:33 +0000 (15:06 +1000)]
c++/modules: Avoid rechecking initializers when streaming NTTPs [PR116382]
When reading an NTTP we call get_template_parm_object which delegates
setting of DECL_INITIAL to the general cp_finish_decl procedure, which
calls check_initializer to validate and record it.
Apart from being unnecessary (it must have already been validated by the
writing module), this also causes errors in cases like the linked PR, as
validating may end up needing to call lazy_load_pendings to determine
any specialisations that may exist which violates assumptions of the
modules streaming code.
This patch works around the issue by adding a flag to
get_template_parm_object to disable these checks when not needed.
PR c++/116382
gcc/cp/ChangeLog:
* cp-tree.h (get_template_parm_object): Add check_init param.
* module.cc (trees_in::tree_node): Pass check_init=false when
building NTTPs.
* pt.cc (get_template_parm_object): Prevent cp_finish_decl from
validating the initializer when check_init=false.
gcc/testsuite/ChangeLog:
* g++.dg/modules/tpl-nttp-1_a.C: New test.
* g++.dg/modules/tpl-nttp-1_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Jason Merrill <jason@redhat.com>
Nathaniel Shead [Thu, 15 Aug 2024 11:46:09 +0000 (21:46 +1000)]
c++/modules: Fix type lookup in DECL_TEMPLATE_INSTANTIATIONS [PR116364]
We need to use the DECL_TEMPLATE_INSTANTIATIONS property to find
reachable specialisations from a template to ensure that any GM
specialisations are properly marked as reachable.
Currently the modules code uses the decl when rebuilding this property,
but this is not always correct; it appears that for type specialisations
we need to use the TREE_TYPE of the decl instead so that the
specialisation is correctly found. This patch makes the required
adjustments.
PR c++/116364
gcc/cp/ChangeLog:
* cp-tree.h (get_mergeable_specialization_flags): Adjust
signature.
* module.cc (trees_out::decl_value): Indicate whether this is a
type or decl specialisation.
* pt.cc (get_mergeable_specialization_flags): Match against the
type of a non-decl specialisation.
(add_mergeable_specialization): Use the already calculated spec
instead of always adding decl to DECL_TEMPLATE_INSTANTIATIONS.
gcc/testsuite/ChangeLog:
* g++.dg/modules/tpl-spec-9_a.C: New test.
* g++.dg/modules/tpl-spec-9_b.C: New test.
* g++.dg/modules/tpl-spec-9_c.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Jason Merrill <jason@redhat.com>
Andreas Schwab [Mon, 19 Aug 2024 18:59:13 +0000 (20:59 +0200)]
m68k: Add -mlra
PR target/113939
* config/m68k/m68k.opt (mlra): New target option.
* config/m68k/m68k.cc (m68k_use_lra_p): New function.
(TARGET_LRA_P): Use it.
* config/m68k/m68k.opt.urls: Regenerate.
Marek Polacek [Thu, 15 Aug 2024 22:47:29 +0000 (18:47 -0400)]
c++: ICE with enum and conversion fn in template [PR115657]
Here we initialize an enumerator with a class prvalue with a conversion
function. When we fold it in build_enumerator, we create a TARGET_EXPR
for the object, and subsequently crash in tsubst_expr, which should not
see such a code.
Normally, we fix similar problems by using an IMPLICIT_CONV_EXPR but here
I may get away with not using the result of fold_non_dependent_expr unless
the result is a constant. A TARGET_EXPR is not constant.
PR c++/115657
gcc/cp/ChangeLog:
* decl.cc (build_enumerator): Call maybe_fold_non_dependent_expr
instead of fold_non_dependent_expr.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1y/constexpr-recursion2.C: New test.
* g++.dg/template/conv21.C: New test.
Marek Polacek [Thu, 15 Aug 2024 15:53:10 +0000 (11:53 -0400)]
c++: fix ICE in convert_nontype_argument [PR116384]
Here we ICE since r14-8291 in C++11/C++14 modes. Fortunately
this is an easy one.
The important bit of r14-8291 is this:
@@ -20056,9 +20071,12 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl)
RETURN (retval);
}
if (IMPLICIT_CONV_EXPR_NONTYPE_ARG (t))
- /* We'll pass this to convert_nontype_argument again, we don't need
- to actually perform any conversion here. */
- RETURN (expr);
+ {
+ tree r = convert_nontype_argument (type, expr, complain);
+ if (r == NULL_TREE)
+ r = error_mark_node;
+ RETURN (r);
+ }
which obviously means that instead of returning right away we go
to convert_nontype_argument. When type is error_mark_node and we're
in C++17, in convert_nontype_argument we go down this path:
Andrew Carlotti [Thu, 26 Oct 2023 14:45:15 +0000 (15:45 +0100)]
aarch64: Fix ls64 intrinsic availability
The availability of ls64 intrinsics and data types were determined
solely by the globally specified architecture features, which did not
reflect any changes specified in target pragmas or attributes.
This patch removes the initialisation-time guards for the intrinsics,
and replaces them with checks at use time. We also get better error
messages when ls64 is not available (matching the existing error
messages for SVE intrinsics).
The data512_t type is made always available; this is consistent with the
present behaviour for Neon fp16/bf16 types.
gcc/ChangeLog:
PR target/112108
* config/aarch64/aarch64-builtins.cc (handle_arm_acle_h): Remove
feature check at initialisation.
(aarch64_general_check_builtin_call): Check ls64 intrinsics.
* config/aarch64/arm_acle.h: (data512_t) Make always available.
gcc/testsuite/ChangeLog:
PR target/112108
* gcc.target/aarch64/acle/ls64_guard-1.c: New test.
* gcc.target/aarch64/acle/ls64_guard-2.c: New test.
* gcc.target/aarch64/acle/ls64_guard-3.c: New test.
* gcc.target/aarch64/acle/ls64_guard-4.c: New test.
Andrew Carlotti [Tue, 18 Jul 2023 19:09:38 +0000 (20:09 +0100)]
aarch64: Fix memtag intrinsic availability
The availability of memtag intrinsics and data types were determined
solely by the globally specified architecture features, which did not
reflect any changes specified in target pragmas or attributes.
This patch removes the initialisation-time guards for the intrinsics,
and replaces them with checks at use time. It also removes the macro
indirection from the header file - this simplifies the header, and
allows the missing extension error reporting to find the user-facing
intrinsic names.
PR target/112108
* gcc.target/aarch64/acle/memtag_guard-1.c: New test.
* gcc.target/aarch64/acle/memtag_guard-2.c: New test.
* gcc.target/aarch64/acle/memtag_guard-3.c: New test.
* gcc.target/aarch64/acle/memtag_guard-4.c: New test.
Andrew Carlotti [Thu, 26 Oct 2023 14:43:44 +0000 (15:43 +0100)]
aarch64: Fix tme intrinsic availability
The availability of tme intrinsics was previously gated at both
initialisation time (using global target options) and usage time
(accounting for function-specific target options). This patch removes
the check at initialisation time, and also moves the intrinsics out of
the header file to allow for better error messages (matching the
existing error messages for SVE intrinsics).
PR target/112108
* gcc.target/aarch64/acle/tme_guard-1.c: New test.
* gcc.target/aarch64/acle/tme_guard-2.c: New test.
* gcc.target/aarch64/acle/tme_guard-3.c: New test.
* gcc.target/aarch64/acle/tme_guard-4.c: New test.
Andrew Carlotti [Tue, 13 Aug 2024 15:15:11 +0000 (16:15 +0100)]
aarch64: Refactor check_required_extensions
Replace TARGET_GENERAL_REGS_ONLY check with an explicit check that
aarch64_isa_flags enables all required extensions. This will be more
flexible when repurposing this function for non-SVE intrinsics.
Fix ICE when scalar coarrays are used in a select type. Prevent
coindexing in associate/select type/select rank selector expression.
gcc/fortran/ChangeLog:
PR fortran/46371
PR fortran/56496
* expr.cc (gfc_is_coindexed): Detect is coindexed also when
rewritten to caf_get.
* trans-stmt.cc (trans_associate_var): Always accept a
descriptor for coarrays.
gcc/testsuite/ChangeLog:
* gfortran.dg/coarray/select_type_1.f90: New test.
* gfortran.dg/coarray/select_type_2.f90: New test.
* gfortran.dg/coarray/select_type_3.f90: New test.
Arsen Arsenović [Thu, 15 Aug 2024 17:17:41 +0000 (19:17 +0200)]
gnat: fix lto-type-mismatch between C_Version_String and gnat_version_string [PR115917]
gcc/ada/ChangeLog:
PR ada/115917
* gnatvsn.ads: Add note about the duplication of this value in
version.c.
* version.c (VER_LEN_MAX): Define to the same value as
Gnatvsn.Ver_Len_Max.
(gnat_version_string): Use VER_LEN_MAX as bound.
Kyrylo Tkachov [Fri, 2 Aug 2024 13:48:47 +0000 (06:48 -0700)]
aarch64: Reduce FP reassociation width for Neoverse V2 and set AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA
The fp reassociation width for Neoverse V2 was set to 6 since its
introduction and I guess it was empirically tuned. But since
AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA was added the tree reassociation
pass seems to be more deliberate in forming FMAs and when that flag is
used it seems to more properly evaluate the FMA vs non-FMA reassociation
widths.
According to the Neoverse V2 SWOG the core has a throughput of 4 for
most FP operations, so the value 6 is not accurate anyway.
Also, the SWOG does state that FMADD operations are pipelined and the
results can be forwarded from FP multiplies to the accumulation operands
of FMADD instructions, which seems to be what
AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA expresses.
This patch sets the fp_reassoc_width field to 4 and enables
AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA for -mcpu=neoverse-v2.
On SPEC2017 fprate I see the following changes on a Grace system:
503.bwaves_r 0.16%
507.cactuBSSN_r -0.32%
508.namd_r 3.04%
510.parest_r 0.00%
511.povray_r 0.78%
519.lbm_r 0.35%
521.wrf_r 0.69%
526.blender_r -0.53%
527.cam4_r 0.84%
538.imagick_r 0.00%
544.nab_r -0.97%
549.fotonik3d_r -0.45%
554.roms_r 0.97%
Geomean 0.35%
with -Ofast -mcpu=grace -flto.
So slight overall improvement with a meaningful improvement in
508.namd_r.
I think other tunings in aarch64 should look into
AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA as well, but I'll leave the
benchmarking to someone else.
Andre Vieira [Mon, 19 Aug 2024 08:38:41 +0000 (09:38 +0100)]
rtl: Enable the use of rtx values with int and mode attributes
The 'code' part of a 'define_code_attr' refers to the type of the key, in other
words, it uses a code_iterator to pick the 'value' from their (key "value") pair
list.
However, rtx_alloc_for_name requires a code_attribute to be used when the
'value' needs to be a type. In other words, no other type of attributes could be
used, before this patch, to produce a rtx typed 'value'.
This patch removes that restriction and allows the backend to use any kind of
attribute as long as that attribute always produces a valid code typed 'value'.
gcc/ChangeLog:
* read-rtl.cc (rtx_reader::rtx_alloc_for_name): Allow all attribute
types to produce code 'values'.
(check_code_attribute): Rename ...
(check_attribute_codes): ... to this. And change comments to refer to
* doc/md.texi: Add paragraph to document that you can use int and mode
attributes to produce codes.
testsuite: Reduce cut-&-paste in scanltranstree.exp
scanltranstree.exp defines some LTO wrappers around standard
non-LTO scanners. Four of them are cut-&-paste variants of
one another, so this patch generates them from a single template.
It also does the same for scan-ltrans-tree-dump-times, so that
other *-times scanners can be added easily in future.
The scanners seem to be lightly used. gcc.dg/ipa/ipa-icf-38.c uses
scan-ltrans-tree-dump{,-not} and libgomp.c/declare-variant-1.c
uses scan-ltrans-tree-dump-{not,times}. Nothing currently seems
to use scan-ltrans-tree-dump-dem*.
gcc/testsuite/
* lib/scanltranstree.exp: Redefine the routines using two
templates.
Fix ICE in recompute_tree_invariant_for_addr_expr, at tree.c:4535 [PR84244]
Declaring an unused function with a derived type having a pointer
component and using that derived type as a coarray, lead the compiler to
ICE because the caf_token for the pointer was not linked into the
component correctly.
PR fortran/84244
gcc/fortran/ChangeLog:
* trans-types.cc (gfc_get_derived_type): When a caf_sub_token is
generated for a component, link it to the component it is
generated for (the previous one).
Hu, Lin1 [Mon, 19 Aug 2024 02:08:53 +0000 (10:08 +0800)]
AVX10.2 ymm rounding: Support vcvtdq2p{s,h} and vcvtpd2p{s,h} intrins
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: Add new intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle
V8SF_FTYPE_V8SI_V8SF_UQI_INT, V4SF_FTYPE_V4DF_V4SF_UQI_INT,
V8HF_FTYPE_V8SI_V8HF_UQI_INT, V8HF_FTYPE_V4DF_V8HF_UQI_INT.
* config/i386/sse.md:
(avx512fp16_vcvt<floatsuffix><sseintconvert>2ph_<mode><mask_name><round_name>):
Add condition check.
(avx512fp16_vcvtpd2ph_v4df_mask_round): New expand.
(*avx512fp16_vcvt<castmode>2ph_<mode>_mask): Change name to
avx512fp16_vcvt<castmode>2ph_<mode>_mask<round_name>_1
and extend pattern to generate 256bit insns.
(avx_cvtpd2ps256<mask_name>): Change name to
avx_cvtpd2ps256<mask_name><round_name> and extend pattern to
generate 256bit insns.
* config/i386/subst.md (round_applied): New condition.
(round_suff): New iterator.
(round_mode_condition): Add V32HI check for 512bit.
(round_saeonly_mode_condition): Ditto.
Hu, Lin1 [Mon, 19 Aug 2024 02:08:51 +0000 (10:08 +0800)]
AVX10.2 ymm rounding: Support vadd{s,d,h} and vcmp{s,d,h} intrins
gcc/ChangeLog:
* config.gcc: Add avx10_2roundingintrin.h.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle
V4DF_FTYPE_V4DF_V4DF_V4DF_UQI_INT, V8SF_FTYPE_V8SF_V8SF_V8SF_UQI_INT,
V16HF_FTYPE_V16HF_V16HF_V16HF_UHI_INT, UQI_FTYPE_V4DF_V4DF_INT_UQI_INT,
UHI_FTYPE_V16HF_V16HF_INT_UHI_INT, UQI_FTYPE_V8SF_V8SF_INT_UQI_INT.
* config/i386/immintrin.h: Include avx10_2roundingintrin.h.
* config/i386/sse.md: Change subst_attr name due to renaming.
* config/i386/subst.md:
(<round_mode512bit_condition>): Add condition check for avx10.2
rounding control 256bit intrins and renamed to ...
(<round_mode_condition>): ...this.
(round_saeonly_mode512bit_condition): Add condition check for
avx10.2 rounding control 256 bit intris and renamed to ...
(round_saeonly_mode_condition): ...this.
* config/i386/avx10_2roundingintrin.h: New file.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c: Add -mavx10.2 and new builtin test.
* gcc.target/i386/avx-2.c: Ditto.
* gcc.target/i386/sse-13.c: Add new tests.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/avx10_2-rounding-1.c: New test.
Jeff Law [Sun, 18 Aug 2024 22:55:52 +0000 (16:55 -0600)]
[PR rtl-optimization/115876] Avoid ubsan in ext-dce.cc
This fixes two general ubsan issues in ext-dce, both related to use-side
processsing of modes > DImode.
In ext_dce_process_uses we can be presented with something like this as a use
(subreg:SI (reg:TF) 12)
That will result in an out of range shift for a HOST_WIDE_INT object. Where
this happens is safe to just break from the SET context and process the
subjects. This will ultimately result in seeing (reg:TF) and we'll mark all
bit groups as live.
In carry_backpropagate we can be presented with a TImode shift (for example)
and the shift count can be > 63 for such a shift. This naturally trips ubsan
as well as we're operating on 64 bit objects.
We can just return mmask in this case noting that every bit group is live.
The combination of these two fixes eliminates all the reported ubsan issues in
ext-dce seen in a bootstrap and regression test on x86.
While I was in there I went ahead and fixed the various hardcoded 63/64 values
to be HOST_BITS_PER_WIDE_INT based.
Bootstrapped and regression tested on x86 with no regressions. Also built with
ubsan enabled and verified the build logs and testsuite logs don't call out any
issues in ext-dce anymore.
Pushing to the trunk.
PR rtl-optimization/115876
gcc
* ext-dce.cc (ext_dce_process_sets): Replace hardcoded 63/64 instances
with HOST_BITS_PER_WIDE_INT based values.
(carry_backpropagate): Handle modes with more bits than
HOST_BITS_PER_WIDE_INT gracefully, avoiding undefined behavior.
(ext_dce_process_uses): Handle subreg offsets which would result
in ubsan shifts gracefully, avoiding undefined behavior.
Andrew Pinski [Sat, 20 Apr 2024 07:13:12 +0000 (00:13 -0700)]
PHIOPT: move factor_out_conditional_operation over to use gimple_match_op
To start working on more with expressions with more than one operand, converting
over to use gimple_match_op is needed.
The added side-effect here is factor_out_conditional_operation can now support
builtins/internal calls that has one operand without any extra code added.
Note on the changed testcases:
* pr87007-5.c: the test was testing testing for avoiding partial register stalls
for the sqrt and making sure there is only one zero of the register before the
branch, the phiopt would now merge the sqrt's so disable phiopt.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* gimple-match-exports.cc (gimple_match_op::operands_occurs_in_abnormal_phi):
New function.
* gimple-match.h (gimple_match_op): Add operands_occurs_in_abnormal_phi.
* tree-ssa-phiopt.cc (factor_out_conditional_operation): Use gimple_match_op
instead of manually extracting from/creating the gimple.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr87007-5.c: Disable phi-opt.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Sergey Fedorov [Sun, 18 Aug 2024 16:52:51 +0000 (18:52 +0200)]
libgfortran: implement fpu-macppc for Darwin, support IEEE arithmetic
This allows to build and use IEEE modules on Darwin PowerPC.
libgfortran/ChangeLog:
* config/fpu-macppc.h (new file): initial support for powerpc-darwin.
* configure.host: enable ieee_support for powerpc-darwin case,
set fpu_host='fpu-macppc'.
Georg-Johann Lay [Sun, 18 Aug 2024 16:26:16 +0000 (18:26 +0200)]
AVR: Tweak 16-bit addition with const that didn't get a LD_REGS register.
The 16-bit additions like addhi3 have two forms: One with a scratch:QI
and one without, where the latter is required because reload cannot
deal with a scratch when spill code pops a 16-bit addition.
Passes like combine and fwprop1 may come up with the non-scratch version,
which is sub-optimal in the case when the addition is performed in a
NO_LD_REGS register because the operands will be spilled to LD_REGS.
Having a scratch:QI at disposal can lead to better code with less spills.
gcc/
* config/avr/avr.md (*add<mode>3_split) [!reload_completed]:
Add a scratch:QI to 16-bit additions with constant.
Georg-Johann Lay [Sun, 18 Aug 2024 13:00:55 +0000 (15:00 +0200)]
AVR: target/116407 - Fix linker error "relocation truncated to fit".
Some text peepholes output extra instructions prior to a branch
instruction and that increase the jump offset of backward branches.
PR target/116407
gcc/
* config/avr/avr-protos.h (avr_jump_mode): Add an int argument.
* config/avr/avr.cc (avr_jump_mode): Add an int argument to increase
the computed jump offset of backwards branches.
* config/avr/avr.md (*dec-and-branchhi!=-1, *dec-and-branchsi!=-1):
Increase the jump offset used by avr_jump_mode() as needed.
gcc/testsuite/
* gcc.target/avr/torture/pr116407-2.c: New test.
* gcc.target/avr/torture/pr116407-4.c: New test.
Andrew Pinski [Sat, 17 Aug 2024 19:14:54 +0000 (12:14 -0700)]
forwprop: Also dce from added statements from gimple_simplify
This extends r14-3982-g9ea74d235c7e78 to also include the newly added statements
since some of them might be dead too (due to the way match and simplify works).
This was noticed while working on adding a new match and simplify pattern where a
new statement that got added was not being used.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* gimple-fold.cc (mark_lhs_in_seq_for_dce): New function.
(replace_stmt_with_simplification): Call mark_lhs_in_seq_for_dce
right before inserting the sequence.
(fold_stmt_1): Add dce_worklist argument, update call to
replace_stmt_with_simplification.
(fold_stmt): Add dce_worklist argument, update call to fold_stmt_1.
(fold_stmt_inplace): Update call to fold_stmt_1.
* gimple-fold.h (fold_stmt): Add bitmap argument.
* tree-ssa-forwprop.cc (pass_forwprop::execute): Update call to fold_stmt.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
The below tests suites are passed for this patch
1. The rv64gcv fully regression test.
2. The rv64gcv build with glibc
gcc/ChangeLog:
* config/riscv/iterators.md (ANYI_QUAD_TRUNC): New iterator for
quad truncation.
(ANYI_OCT_TRUNC): New iterator for oct truncation.
(ANYI_QUAD_TRUNCATED): New attr for truncated quad modes.
(ANYI_OCT_TRUNCATED): New attr for truncated oct modes.
(anyi_quad_truncated): Ditto but for lower case.
(anyi_oct_truncated): Ditto but for lower case.
* config/riscv/riscv.md (ustrunc<mode><anyi_quad_truncated>2):
Add new pattern for quad truncation.
(ustrunc<mode><anyi_oct_truncated>2): Ditto but for oct.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-2.c: Adjust
the expand dump check times.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-3.c: Ditto.
* gcc.target/riscv/sat_arith_data.h: Add test helper macros.
* gcc.target/riscv/sat_u_trunc-4.c: New test.
* gcc.target/riscv/sat_u_trunc-5.c: New test.
* gcc.target/riscv/sat_u_trunc-6.c: New test.
* gcc.target/riscv/sat_u_trunc-run-4.c: New test.
* gcc.target/riscv/sat_u_trunc-run-5.c: New test.
* gcc.target/riscv/sat_u_trunc-run-6.c: New test.
The convert from signed char to unsigned short will have sign_extend rtl
as above. And finally become the lb insn as below:
lb a1,0(a5) // a1 is -40, aka 0xffffffffffffffd8
lui a0,0x1a
addi a5,a1,9
slli a5,a5,0x30
srli a5,a5,0x30 // a5 is 65505
sltu a1,a5,a1 // compare 65505 and 0xffffffffffffffd8 => TRUE
The sltu try to compare 65505 and 0xffffffffffffffd8 here, but we
actually want to compare 65505 and 65496 (0xffd8). Thus we need to
clean up the high bits to ensure this.
The below test suites are passed for this patch:
* The rv64gcv fully regression test.
PR target/116278
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_gen_zero_extend_rtx): Add new
func impl to zero extend rtx.
(riscv_expand_usadd): Leverage above func to cleanup operands 0
and remove the special handing for SImode in RV64.
Pan Li [Sat, 17 Aug 2024 11:27:11 +0000 (19:27 +0800)]
RISC-V: Add testcases for unsigned scalar .SAT_TRUNC form 3
This patch would like to add test cases for the unsigned scalar
.SAT_TRUNC form 3. Aka:
Form 3:
#define DEF_SAT_U_TRUC_FMT_3(NT, WT) \
NT __attribute__((noinline)) \
sat_u_truc_##WT##_to_##NT##_fmt_3 (WT x) \
{ \
WT max = (WT)(NT)-1; \
return x <= max ? (NT)x : (NT) max; \
}
DEF_SAT_U_TRUC_FMT_3 (uint32_t, uint64_t)
The below test is passed for this patch.
* The rv64gcv regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_u_trunc-13.c: New test.
* gcc.target/riscv/sat_u_trunc-14.c: New test.
* gcc.target/riscv/sat_u_trunc-15.c: New test.
* gcc.target/riscv/sat_u_trunc-run-13.c: New test.
* gcc.target/riscv/sat_u_trunc-run-14.c: New test.
* gcc.target/riscv/sat_u_trunc-run-15.c: New test.
Pan Li [Sat, 17 Aug 2024 10:04:00 +0000 (18:04 +0800)]
RISC-V: Add testcases for unsigned scalar .SAT_TRUNC form 2
This patch would like to add test cases for the unsigned scalar
.SAT_TRUNC form 2. Aka:
Form 2:
#define DEF_SAT_U_TRUC_FMT_2(NT, WT) \
NT __attribute__((noinline)) \
sat_u_truc_##WT##_to_##NT##_fmt_2 (WT x) \
{ \
WT max = (WT)(NT)-1; \
return x > max ? (NT) max : (NT)x; \
}
DEF_SAT_U_TRUC_FMT_2 (uint32_t, uint64_t)
The below test is passed for this patch.
* The rv64gcv regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_u_trunc-7.c: New test.
* gcc.target/riscv/sat_u_trunc-8.c: New test.
* gcc.target/riscv/sat_u_trunc-9.c: New test.
* gcc.target/riscv/sat_u_trunc-run-7.c: New test.
* gcc.target/riscv/sat_u_trunc-run-8.c: New test.
* gcc.target/riscv/sat_u_trunc-run-9.c: New test.
Jeff Law [Sat, 17 Aug 2024 21:10:38 +0000 (15:10 -0600)]
[committed] Avoid right shifting signed value on ext-dce.cc
This is analogous to a prior patch to ext-dce which fixes propagation of sign
bits, but this time for the saturating variants. I'd held off fixing those
because I wanted the time to look at that code (since we don't have a testcase
for it as far as I know).
Not surprisingly, putting an abort on that path and running an x86 bootstrap
and testsuite run, it never triggers. Of course not a lot of code tries to do
saturating shifts.
Anyway, bootstrapped and regression tested on x86_64. Pushing to the trunk.
Thanks for everyone's patience.
gcc/
* ext-dce.cc (carry_backpropagate): Cast mask to HOST_WIDE_INT before
shifting.
Kevin Kirspel [Sat, 17 Aug 2024 20:37:18 +0000 (14:37 -0600)]
t-rtems: add rv32imf architecture to the RTEMS multilib for RISC-V
The attach patch is specific to the RTEMS RISC-V architecture multilib which is
controlled by the t-rtems file in the gcc/config/riscv/ directory. The patch
file was created from the gcc-13.3.0 branch. It was successfully tested within
RTEMS Source Builder.
Jeff Law [Sat, 17 Aug 2024 16:30:48 +0000 (10:30 -0600)]
Adjust v850 rotate expander to allow more cases for V850E3V5
The recent if-conversion changes tripped a failure on the v850 port.
The core underlying issue is that while the if-conversion code tries to do the
right thing with noce_can_force_operand to determine if it can force an
arbitrary operand into a register, it's not really a sufficient check.
Essentially for arithmetic codes, it checks the operands. If the operands are
force-able and there's a code_to_optab mapping, then it returns true.
code_to_optab doesn't actually check anything other than the existence of a
mapping in the target. If the target pattern has restrictions enforced by the
condition or it's an expander that is allowed to FAIL, then
noce_can_force_operand to be true, even though we may not be able to directly
force the operand into a register.
This came up on the v850 when we had an operand that was a rotate by a constant
number of bits (I don't remember the count, all that's important about it was
the count was not 8 or 16).
The v850 port has this define_expand:
> (define_expand "rotlsi3"
> [(parallel [(set (match_operand:SI 0 "register_operand" "")
> (rotate:SI (match_operand:SI 1 "register_operand" "")
> (match_operand:SI 2 "const_int_operand" "")))
> (clobber (reg:CC CC_REGNUM))])]
> "(TARGET_V850E_UP)"
> {
> if (INTVAL (operands[2]) != 16)
> FAIL;
> })
So the only rotate count allowed is 16 (there's a similar HI rotate with a count of 8). AFAICT the rotate patterns are allowed to FAIL. So naturally the expander fails and we get a testsuite regression:
> Tests that now fail, but worked before (4 tests):
>
> v850-sim/-mgcc-abi/-msoft-float/-mv850e3v5: gcc: gcc.c-torture/execute/20100805-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors)
> v850-sim/-mgcc-abi/-msoft-float/-mv850e3v5: gcc: gcc.c-torture/execute/20100805-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors)
> v850-sim/-msoft-float/-mv850e3v5: gcc: gcc.c-torture/execute/20100805-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors)
> v850-sim/-mv850e3v5: gcc: gcc.c-torture/execute/20100805-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors)
This patch works around the problem by allowing the rotates in additional
cases, particularly for the V850E3V5+ variants which have a general rotate
capability. But let's be clear, this is just a workaround and I expect we're
going to have to revisit the code to test if an operand can be forced into a
register.
gcc/
* config/v850/v850.md (rotlsi3): Allow more cases for V850E3V5+.
Jeff Law [Sat, 17 Aug 2024 15:52:55 +0000 (09:52 -0600)]
[RISC-V][PR target/116282] Stabilize pattern conditions
So as expected the core problem with target/116282 is that the cost of certain
constant synthesis cases varied depending on whether or not we're allowed to
generate new pseudos or not.
That in turn meant that in obscure cases an insn might change from recognizable
to unrecognizable and triggers the observed failure.
So we need to keep the cost stable, at least when called from a pattern's
condition. So we pass another boolean down when necessary. I've tried to keep
API fallout minimized.
Built and tested on rv32 in my tester. Let's see what pre-commit testing has
to say though 🙂
Note this will also require a minor change to the in-flight constant synthesis
work.
PR target/116282
gcc/
* config/riscv/riscv-protos.h (riscv_const_insns): Add new argument.
* config/riscv/riscv.cc (riscv_build_integer): Add new argument
ALLOW_NEW_PSEUDOS. Pass it down to recursive calls and check it
before using synthesis which allows new registers to be created.
(riscv_split_integer_cost): Pass new argument to riscv_build_integer.
(riscv_integer_cost): Add ALLOW_NEW_PSEUDOS argument, pass it down to
riscv_build_integer.
(riscv_legitimate_constant_p): Pass new argument to riscv_const_insns.
(riscv_const_insns): New argment ALLOW_NEW_PSEUDOS. Pass it down to
riscv_integer_cost and riscv_const_insns.
(riscv_split_const_insns): Pass new argument to riscv_const_insns.
(riscv_move_integer, riscv_rtx_costs): Similarly.
* config/riscv/riscv.md (shadd with costly constant): Pass new argument
to riscv_const_insns.
* config/riscv/bitmanip.md (and with costly constant): Pass new argument
to riscv_const_insns.
gcc/testsuite/
* gcc.target/riscv/pr116282.c: New test.
Jin Ma [Sat, 17 Aug 2024 15:29:11 +0000 (09:29 -0600)]
RISC-V: Bugfix for RVV rounding intrinsic ICE in function checker
When compiling an interface for rounding of type 'vfloat16*' without using zvfh
or zvfhmin, it is not enough to use FLOAT_MODE_P because the type does not
support it. Although the subsequent riscv_validate_vector_type checks will
still fail and throw exceptions, I don't think we should have ICE here.
Pan Li [Sat, 17 Aug 2024 15:25:58 +0000 (09:25 -0600)]
RISC-V: Bugfix incorrect operand for vwsll auto-vect
This patch would like to fix one ICE when rv64gcv_zvbb for vwsll.
Consider below example.
void vwsll_vv_test (short *restrict dst, char *restrict a,
int *restrict b, int n)
{
for (int i = 0; i < n; i++)
dst[i] = a[i] << b[i];
}
It will hit the vwsll pattern with following operands.
operand 0 -> (reg:RVVMF2HI 146 [ vect__7.13 ])
operand 1 -> (reg:RVVMF4QI 165 [ vect_cst__33 ])
operand 2 -> (reg:RVVM1SI 171 [ vect_cst__36 ])
According to the ISA, operand 2 should be the same as operand 1.
Aka operand 2 should have RVVMF4QI mode as above. Thus, add
quad truncation for operand 2 before emit vwsll.
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
PR target/116280
gcc/ChangeLog:
* config/riscv/autovec-opt.md: Add quad truncation to
align the mode requirement for vwsll.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/pr116280-1.c: New test.
* gcc.target/riscv/rvv/base/pr116280-2.c: New test.
Feng Wang [Sat, 17 Aug 2024 14:40:42 +0000 (08:40 -0600)]
RISC-V: Add auto-vect pattern for vector rotate shift
This patch add the vector rotate shift pattern for auto-vect.
With this patch, the scalar rotate shift can be automatically
vectorized into vector rotate shift.
gcc/ChangeLog:
* config/riscv/autovec.md (v<bitmanip_optab><mode>3):
Add new define_expand pattern for vector rotate shift.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/binop/vrolr-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vrolr-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vrolr-template.h: New test.
曾治金 [Wed, 14 Aug 2024 06:06:23 +0000 (14:06 +0800)]
RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305]
This patch is to fix the bug (BugId:116305) introduced by the commit
bd93ef for risc-v target.
The commit bd93ef changes the chunk_num from 1 to TARGET_MIN_VLEN/128
if TARGET_MIN_VLEN is larger than 128 in riscv_convert_vector_bits. So
it changes the value of BYTES_PER_RISCV_VECTOR. For example, before
merging the commit bd93ef and if TARGET_MIN_VLEN is 256, the value
of BYTES_PER_RISCV_VECTOR should be [8, 8], but now [16, 16]. The value
of riscv_bytes_per_vector_chunk and BYTES_PER_RISCV_VECTOR are no longer
equal.
Prologue will use BYTES_PER_RISCV_VECTOR.coeffs[1] to estimate the vlenb
register value in riscv_legitimize_poly_move, and dwarf2cfi will also
get the estimated vlenb register value in riscv_dwarf_poly_indeterminate_value
to calculate the number of times to multiply the vlenb register value.
So need to change the factor from riscv_bytes_per_vector_chunk to
BYTES_PER_RISCV_VECTOR, otherwise we will get the incorrect dwarf
information. The incorrect example as follow:
The sequence '0x92,0xa2,0x38,0' means the vlenb register, '0x34' means
the literal 4, '0x1e' means the multiply operation. But in fact, the
vlenb register value just need to multiply the literal 2.
PR target/116305
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_dwarf_poly_indeterminate_value): Take
BYTES_PER_RISCV_VECTOR for *factor instead of riscv_bytes_per_vector_chunk.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/scalable_vector_cfi.c: New test.
Mark Harmstone [Thu, 8 Aug 2024 01:38:54 +0000 (02:38 +0100)]
Write CodeView information about stack variables
Outputs CodeView S_REGREL32 symbols for unoptimized local variables that
are stored on the stack. This includes a change to dwarf2out.cc to make
it easier to extract the function frame base without having to worry
about the function prologue or epilogue.
gcc/
* dwarf2codeview.cc (enum cv_sym_type): Add S_REGREL32.
(write_fbreg_variable): New function.
(write_unoptimized_local_variable): Add fblock parameter, and handle
DW_OP_fbreg locations.
(write_unoptimized_function_vars): Add fbloc parameter.
(write_function): Extract frame base from DWARF.
* dwarf2out.cc (convert_cfa_to_fb_loc_list): Output simplified frame
base information for CodeView.
Mark Harmstone [Sat, 13 Jul 2024 20:32:40 +0000 (21:32 +0100)]
Write CodeView information about local static variables
Outputs CodeView S_LDATA32 symbols, for static variables within
functions, along with S_BLOCK32 and S_END for the beginning and end of
lexical blocks.
gcc/
* dwarf2codeview.cc (enum cv_sym_type): Add S_END and S_BLOCK32.
(write_local_s_ldata32): New function.
(write_unoptimized_local_variable): New function.
(write_s_block32): New function.
(write_s_end): New function.
(write_unoptimized_function_vars): New function.
(write_function): Call write_unoptimized_function_vars.
Mark Harmstone [Mon, 12 Aug 2024 22:19:55 +0000 (23:19 +0100)]
Fix maybe-uninitialized CodeView LF_INDEX warning
Initialize last_type to 0 to silence two spurious maybe-uninitialized warnings.
We issue an LF_INDEX continuation subtype for any LF_FIELDLISTs that
overflow, so LF_INDEXes will always have a subtype preceding them (and
thus last_type will always be set).
gcc/
* dwarf2codeview.cc (get_type_num_enumeration_type): Initialize last_type
to 0.
(get_type_num_struct): Likewise.
Gaius Mulley [Fri, 16 Aug 2024 15:28:55 +0000 (16:28 +0100)]
modula2: change identifier names to avoid build warnings
This fix avoids the following warnings: In implementation module
‘StdChans’: either the identifier has the same name as a keyword or
alternatively a keyword has the wrong case (‘IN’ and ‘in’)
54 | stdnull: ChanId ;
the symbol name ‘in’ is legal as an identifier, however as such it
might cause confusion and is considered bad programming practice.
gcc/m2/ChangeLog:
* gm2-libs-iso/StdChans.mod (in): Rename to ...
(inch): ... this.
(out): Rename to ...
(outch): ... this.
(err): Rename to ...
(errch): ... this.
Gaius Mulley [Fri, 16 Aug 2024 15:09:53 +0000 (16:09 +0100)]
Fix using keywords as identifiers to prevent warnings during build
m2pim/DynamicStrings.mod:1358:27: note: In procedure ‘Slice’: the symbol
name ‘end’ is legal as an identifier, however as such it might cause
confusion and is considered bad programming practice
1358 | start, end, o: INTEGER ;
m2pim/DynamicStrings.mod:1358:27: note: either the identifier has the
same name as a keyword or alternatively a keyword has the wrong case
(‘END’ and ‘end’).
gcc/m2/ChangeLog:
* gm2-libs/DynamicStrings.mod (Slice): Rename end to stop.
testsuite: Verify -fshort-enums and -fno-short-enums in pr33738.C
For some targets, like Cortex-M on arm-none-eabi, the -fshort-enums is
enabled by default. For these targets, the test case fails as
sizeof(Alpha) < sizeof(int).
To make the test case behave identical for targets that does enable
-fshort-enums and those that does not, add -fno-short-enums in the test
case and verify that the warning is not emitted. Then also create a copy
and run the test with -fshort-enums and verify that the warning is
emitted.
Regtested on x86_64-pc-linux-gnu and arm-none-eabi.
gcc/testsuite/ChangeLog:
* g++.dg/warn/pr33738.C: Added -fno-short-enums.
* g++.dg/warn/pr33738-2.C: Duplicate g++.dg/warn/pr33738.C with
-fshort-enums and removed xfail.
The test case assumes that sizeof(tree_code) >= 2. On some targets, like
Cortex-M on arm-none-eabi, -fshort-enums is enabled by default and in
that case, sizeof(tree_code) will be 1 and the following warning is
emitted:
.../pr97315-1.C:8:13: warning: width of 'tree_base::code' exceeds its type
Gaius Mulley [Fri, 16 Aug 2024 11:41:11 +0000 (12:41 +0100)]
PR modula2/116378 m2 bootstrap fails on x86_64-darwin
This patch fixes m2 bootstrap failure on x86_64-darwin. libc_open
is defined with three parameters the last of which is an int for
portability (rather than a vararg). This avoids portability
problems by promoting mode_t to an int. In the future it could
be tidied up by using the m2 optarg extension.
gcc/m2/ChangeLog:
PR modula2/116378
* gm2-libs-iso/TermFile.mod (termOpen): Add third argument
for open.
* gm2-libs/libc.def (open): Remove vararg and use INTEGER for
mode parameter three.
* mc-boot-ch/Glibc.c (tracedb_open): Replace mode_t with int.
(libc_open): Rewrite without varargs.
* mc-boot/Glibc.h (libc_open): Replace varargs with int mode.
* pge-boot/Glibc.cc (libc_open): Rewrite.
* pge-boot/Glibc.h (libc_open): Replace varargs with int mode.
gcc/testsuite/ChangeLog:
PR modula2/116378
* gm2/extensions/run/pass/testopen.mod: Add third argument
for open.
* gm2/isolib/run/pass/openlibc.mod: Ditto.
* gm2/pim/run/pass/testaddr3.mod: Ditto.
Jakub Jelinek [Fri, 16 Aug 2024 09:43:18 +0000 (11:43 +0200)]
c++: Pedwarn on [[]]; at class scope [PR110345]
For C++ 26 P2552R3 I went through all the spots (except modules) where
attribute-specifier-seq appears in the grammar and tried to construct
a testcase in all those spots, for now for [[deprecated]] attribute.
The fourth issue is that we just emit (when enabled) -Wextra-semi warning
not just for lone semicolon at class scope (correct), but also for
[[]]; or [[whatever]]; there too.
While just semicolon is valid in C++11 and newer,
https://eel.is/c++draft/class.mem#nt:member-declaration
allows empty-declaration, unlike namespace scope or block scope
something like attribute-declaration or empty statement with attributes
applied for it aren't supported.
While syntactically it matches
attribute-specifier-seq [opt] decl-specifier-seq [opt] member-declarator-list [opt] ;
with the latter two omitted, there is
https://eel.is/c++draft/class.mem#general-3
which says that is not valid.
So, the following patch emits a pedwarn in that case.
2024-08-16 Jakub Jelinek <jakub@redhat.com>
PR c++/110345
* parser.cc (cp_parser_member_declaration): Call maybe_warn_extra_semi
only if it is empty-declaration, if there are some tokens like
attribute, pedwarn that the declaration doesn't declare anything.
Lingling Kong [Fri, 16 Aug 2024 07:52:27 +0000 (15:52 +0800)]
i386: Fix some vex insns that prohibit egpr
Although these vex insn have evex counterpart, but when it
uses the displayed vex prefix should not support APX EGPR.
Like TARGET_AVXVNNI, TARGET_IFMA and TARGET_AVXNECONVERT.
TARGET_AVXVNNIINT8 and TARGET_AVXVNNITINT16 also are vex
insn should not support egpr.
Andrew Pinski [Mon, 10 Jun 2024 00:39:54 +0000 (00:39 +0000)]
aarch64: Improve popcount for bytes [PR113042]
For popcount for bytes, we don't need the reduction addition
after the vector cnt instruction as we are only counting one
byte's popcount.
This changes the popcount extend to cover all ALLI rather than GPI.
Changes since v1:
* v2 - Use ALLI iterator and combine all into one pattern.
Add new testcases popcnt[6-8].c.
* v3 - Simplify TARGET_CSSC path.
Use convert_to_mode instead of gen_zero_extend* directly.
Some other small cleanups.
Bootstrapped and tested on aarch64-linux-gnu with no regressions.
PR target/113042
gcc/ChangeLog:
* config/aarch64/aarch64.md (popcount<mode>2): Update pattern
to support ALLI modes.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/popcnt5.c: New test.
* gcc.target/aarch64/popcnt6.c: New test.
* gcc.target/aarch64/popcnt7.c: New test.
* gcc.target/aarch64/popcnt8.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
libstdc++-v3: Handle iconv as optional for newlib builds [PR116362]
Support for iconv in newlib seems to have been always
assumed present by libstdc++-v3, but is default off.
Though, it hasn't been used before recent libstdc++ changes
that actually call iconv functions. This now leads to
failures exposed by running the test-suite, unless the
newlib being used has been explicitly configured with
--enable-newlib-iconv. When failing, there are undefined
references to iconv, iconv_open or iconv_close for multiple
tests.
Thankfully there's a macro in newlib.h that we can check to
detect presence of iconv support for the newlib build that's
used.
libstdc++-v3: testsuite: Prune uncapitalized "in function" linker warning
Newer newlib trigger warnings about certain functions not implemented
(_getentropy) when testing libstdc++-v3.
Since 2018 (circa binutils-2.31) the "in function" prefix isn't
capitalized for those "not implemented" warnings when generated from
the linker (a GNU ld feature used by newlib). Dejagnu up to and
including at least dejagnu-1.6.3 (and git @ 42979bd3b9) assumes a
capital "In function", leaving that part unpruned, and boom we have
thousands of "excess errors" from the libstdc++-v3 testsuite.
While gcc/testsuite/lib/prune.exp:prune_gcc_output already deals with
this quirk with a vastly more generic pattern, I choose this simpler
tweak.
libstdc++-v3:
* testsuite/lib/prune.exp (libstdc++-dg-prune): Prune
uncapitalized "in function" warning from linker.
Andrew Pinski [Mon, 6 Nov 2023 03:27:51 +0000 (19:27 -0800)]
PHIOPT: Fix comment before factor_out_conditional_operation
I didn't update the comment before factor_out_conditional_operation
correctly. this updates it to be correct and mentions unary operations
rather than just conversions.
Vineet Gupta [Thu, 15 Aug 2024 16:24:27 +0000 (09:24 -0700)]
RISC-V: use fclass insns to implement isfinite,isnormal and isinf builtins
Currently these builtins use float compare instructions which require
FP flags to be saved/restored which could be costly in uarch.
RV Base ISA already has FCLASS.{d,s,h} instruction to compare/identify FP
values w/o disturbing FP exception flags.
Now that upstream supports the corresponding optabs, wire them up in the
backend.
gcc/ChangeLog:
* config/riscv/riscv.md: define_insn for fclass insn.
define_expand for isfinite, isnormal, isinf.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/fclass.c: New tests.
Roger Sayle [Thu, 15 Aug 2024 21:02:05 +0000 (22:02 +0100)]
i386: Improve split of *extendv2di2_highpart_stv_noavx512vl.
This patch follows up on the previous patch to fix PR target/116275 by
improving the code STV (ultimately) generates for highpart sign extensions
like (x<<8)>>8. The arithmetic right shift is able to take advantage of
the available common subexpressions from the preceding left shift.
Hence previously with -O2 -m32 -mavx -mno-avx512vl we'd generate:
This patch also implements Uros' suggestion that the pre-reload
splitter could introduced a new pseudo to hold the intermediate
to potentially help reload with register allocation, which applies
when not performing the above optimization, i.e. on TARGET_XOP.
2024-08-15 Roger Sayle <roger@nextmovesoftware.com>
Uros Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
* config/i386/i386.md (*extendv2di2_highpart_stv_noavx512vl): Split
to an improved implementation on !TARGET_XOP. On TARGET_XOP, use
a new pseudo for the intermediate to simplify register allocation.
gcc/testsuite/ChangeLog
* g++.target/i386/pr116275-2.C: New test case.
Jakub Jelinek [Thu, 15 Aug 2024 20:50:07 +0000 (22:50 +0200)]
fortran: Fix bootstrap in resolve.cc [PR116387]
The r15-2934 change broke bootstrap:
../../gcc/fortran/resolve.cc: In function ‘bool resolve_operator(gfc_expr*)’:
../../gcc/fortran/resolve.cc:4649:22: error: too many arguments for format [-Werror=format-extra-args]
4649 | gfc_error ("Inconsistent coranks for operator at %%L and %%L",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The following patch fixes that by using %L rather than %%L, the call has 2
location arguments.
2024-08-15 Jakub Jelinek <jakub@redhat.com>
PR bootstrap/116387
* resolve.cc (resolve_operator): Use %L rather than %%L in format
string.
Tweak base/index disambiguation in decompose_normal_address [PR116236]
The PR points out that, for an address like:
(plus (zero_extend X) Y)
decompose_normal_address doesn't establish a strong preference
between treating X as the base or Y as the base. As the comment
in the patch says, zero_extend isn't enough on its own to assume
an index, at least not on POINTERS_EXTEND_UNSIGNED targets.
But in a construct like the one above, X and Y have different modes,
and it seems reasonable to assume that the one with the expected
address mode is the base.
This matters on targets like m68k that support index extension
and that require different classes for bases and indices.
gcc/
PR middle-end/116236
* rtlanal.cc (decompose_normal_address): Try to distinguish
bases and indices based on mode, before resorting to "baseness".
late-combine: Preserve INSN_CODE when modifying notes [PR116343]
When it removes a definition, late-combine tries to update all
uses in notes. It does this using the same insn_propagation class
that it uses for patterns.
However, insn_propagation uses validate_change, which in turn
resets the INSN_CODE. This is inefficient in the best case,
since it forces the pattern to be rerecognised even though
changing a note can't affect the INSN_CODE. But in the PR
it's a correctness problem: resetting INSN_CODE means we lose
the NOOP_INSN_MOVE_CODE, which in turn means that rtl-ssa doesn't
queue it for deletion.
This patch adds a routine specifically for propagating into notes.
A belt-and-braces fix would be to rerecognise noop moves in
function_info::change_insns, but I can't think of a good reason
why that would be necessary, and it could paper over latent bugs.
gcc/
PR testsuite/116343
* recog.h (insn_propagation::apply_to_note): Declare.
* recog.cc (insn_propagation::apply_to_note): New function.
* late-combine.cc (insn_combination::substitute_note): Use
apply_to_note instead of apply_to_rvalue.
* rtl-ssa/changes.cc (rtl_ssa::changes_are_worthwhile): Improve
dumping of costs for noop moves.
gcc/testsuite/
PR testsuite/116343
* gcc.dg/torture/pr116343.c: New test.
Fix Coarray in associate not a coarray. [PR110033]
A coarray used in an associate did not become a coarray in the block of
the associate. This patch fixes that and the same also in select type
statements.
PR fortran/110033
gcc/fortran/ChangeLog:
* class.cc (gfc_is_class_scalar_expr): Coarray refs that ref
only self, aka this image, are regarded as scalar, too.
* resolve.cc (resolve_assoc_var): Ignore this image coarray refs
and do not build a new class type.
* trans-expr.cc (gfc_get_caf_token_offset): Get the caf token
from the descriptor for associated variables.
(gfc_conv_variable): Same.
(gfc_trans_pointer_assignment): Assign token to temporary
associate variable, too.
(gfc_trans_scalar_assign): Add flag that assign is for associate
and use it to assign the token.
(is_assoc_assign): Detect that expressions are for associate
assign.
(gfc_trans_assignment_1): Treat associate assigns like pointer
assignments where possible.
* trans-stmt.cc (trans_associate_var): Set same_class only for
class-targets.
* trans.h (gfc_trans_scalar_assign): Add flag to
trans_scalar_assign for marking associate assignments.
Compute the corank of an expression along side to the regular rank.
This safe costly calls to gfc_get_corank (), which consecutively has
been removed. In some locations the code needed some adaption to model
the difference between expr.corank and gfc_get_corank correctly. The
latter always returned the codimension of the expression and not its
current corank, i.e. the resolution of all indezes.
This commit is preparatory to fixing PR fortran/110033 and may contain
parts of that fix already.