Mayshao [Mon, 30 Oct 2023 21:19:12 +0000 (22:19 +0100)]
i386: Zhaoxin yongfeng enablement
Enable -march/-mtune=yongfeng. Costs and tunings are set according
to the characteristics of the processor. Add a new .md file to describe
yongfeng processor.
Martin Jambor [Mon, 30 Oct 2023 17:34:59 +0000 (18:34 +0100)]
ipa: Prune any IPA-CP aggregate constants known by modref to be killed (111157)
PR 111157 shows that IPA-modref and IPA-CP (when plugged into value
numbering) can optimize out a store both before a call (because the
call will overwrite it) and in the call (because the store is of the
same value) and by eliminating both create miscompilation.
This patch fixes that by pruning from the list of IPA-CP aggregate
value constants any constant whose memory IPA-modref knows can be
"killed." Unfortunately, doing so is tricky. First, IPA-modref
loads override kills, so only stores that are not loaded are truly
unnecessary. Looking things up there means doing most of what
modref_may_alias does, but doing exactly what it does is tricky
because it also takes aliasing into account and has bail-out counters.
To err on the side of caution in order to avoid this miscompilation we
have to prune a constant when in doubt. However, pruning can
interfere with the mechanism of how clone materialization
distinguishes between the cases when a parameter was entirely removed
and when it was both IPA-CPed and IPA-SRAed (in order to make up for
the removal in debug info, which can bump into an assert when
compiling g++.dg/torture/pr103669.C when we are not careful).
Therefore this patch:
1) marks constants that IPA-modref has in its kill list with a new
"killed" flag, and
2) prunes the list from entries with this flag after materialization
and IPA-CP transformation is done using the template introduced in
the previous patch
It does not try to look up anything in the load lists; this will be
done as a follow-up in order to ease review.
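The two steps listed above (flag first, prune later) can be sketched in
plain C as follows. This is only an illustration of the idea, not the code
from the patch: struct agg_value, mark_killed and prune_killed are
hypothetical names, and the real implementation works on the IPA-CP data
structures and uses a C++ member function template.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an aggregate constant record; only the
   fields needed for the illustration are modelled.  */
struct agg_value
{
  unsigned index;        /* parameter index */
  unsigned unit_offset;  /* offset within the aggregate */
  long value;            /* the known constant */
  bool killed;           /* set when the memory is known to be overwritten */
};

/* Step 1: flag every entry for parameter INDEX at UNIT_OFFSET.  */
void
mark_killed (struct agg_value *v, size_t n, unsigned index,
             unsigned unit_offset)
{
  for (size_t i = 0; i < n; i++)
    if (v[i].index == index && v[i].unit_offset == unit_offset)
      v[i].killed = true;
}

/* Step 2: compact the array in place, dropping flagged entries.
   Returns the new element count; relative order is preserved.  */
size_t
prune_killed (struct agg_value *v, size_t n)
{
  size_t out = 0;
  for (size_t i = 0; i < n; i++)
    if (!v[i].killed)
      v[out++] = v[i];
  return out;
}

/* Small usage example: the middle entry is killed and then pruned.  */
void
demo_mark_and_prune (void)
{
  struct agg_value vals[] = {
    { 0, 0, 11, false },
    { 0, 8, 22, false },
    { 1, 0, 33, false },
  };
  mark_killed (vals, 3, 0, 8);
  size_t n = prune_killed (vals, 3);
  assert (n == 2);
  assert (vals[0].value == 11 && vals[1].value == 33);
}
```

Keeping the flag separate from the pruning means that entries stay in
place until the later step runs, which is the property the commit message
relies on for clone materialization.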
gcc/ChangeLog:
2023-10-27 Martin Jambor <mjambor@suse.cz>
PR ipa/111157
* ipa-prop.h (struct ipa_argagg_value): New flag killed.
* ipa-modref.cc (ipcp_argagg_and_kill_overlap_p): New function.
(update_signature): Mark any IPA-CP aggregate constants at
positions known to be killed as killed. Move check that there is
clone_info after this pruning.
* ipa-cp.cc (ipa_argagg_value_list::dump): Dump the killed flag.
(ipa_argagg_value_list::push_adjusted_values): Clear the new flag.
(push_agg_values_from_plats): Likewise.
(ipa_push_agg_values_from_jfunc): Likewise.
(estimate_local_effects): Likewise.
(push_agg_values_for_index_from_edge): Likewise.
* ipa-prop.cc (write_ipcp_transformation_info): Stream the killed
flag.
(read_ipcp_transformation_info): Likewise.
(ipcp_get_aggregate_const): Update comment, assert that encountered
record does not have killed flag set.
(ipcp_transform_function): Prune all aggregate constants with killed
set.
gcc/testsuite/ChangeLog:
2023-09-18 Martin Jambor <mjambor@suse.cz>
PR ipa/111157
* gcc.dg/lto/pr111157_0.c: New test.
* gcc.dg/lto/pr111157_1.c: Second file of the same new test.
Martin Jambor [Mon, 30 Oct 2023 17:34:59 +0000 (18:34 +0100)]
ipa-cp: Templatize filtering of m_agg_values
PR 111157 points to another place where IPA-CP collected aggregate
compile-time constants need to be filtered, in addition to the one
place that already does this in ipa-sra. In order to re-use code,
this patch turns the common bit into a template.
The functionality is still covered by testcase gcc.dg/ipa/pr108959.c.
gcc/ChangeLog:
2023-09-13 Martin Jambor <mjambor@suse.cz>
PR ipa/111157
* ipa-prop.h (ipcp_transformation): New member function template
remove_argaggs_if.
* ipa-sra.cc (zap_useless_ipcp_results): Use remove_argaggs_if to
filter aggregate constants.
Patrick O'Neill [Mon, 30 Oct 2023 16:30:01 +0000 (09:30 -0700)]
RISC-V: Make rv32i_zcmp testcase more robust
GCC recently changed its register allocator which causes this
testcase to fail.
This patch updates the regex to be more robust to change by accepting
any s register in the range of 1-9 for cm.push and cm.popret insns.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rv32i_zcmp.c: Accept any register in the
range of 1-9 for cm.push and cm.popret insns.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
Roger Sayle [Mon, 30 Oct 2023 16:21:28 +0000 (16:21 +0000)]
ARC: Convert (signed<<31)>>31 to -(signed&1) without barrel shifter.
This patch optimizes PR middle-end/101955 for the ARC backend. On ARC
CPUs with a barrel shifter, using two shifts is optimal as:
asl_s r0,r0,31
asr_s r0,r0,31
but without a barrel shifter, GCC -O2 -mcpu=em currently generates:
and r2,r0,1
ror r2,r2
add.f 0,r2,r2
sbc r0,r0,r0
with this patch, we now generate the smaller, faster and non-flags
clobbering:
bmsk_s r0,r0,0
neg_s r0,r0
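For reference, the identity behind this transformation can be checked in
C. This is a minimal sketch, not the new test case: the function names are
made up, the left shift is done on an unsigned value to avoid signed
overflow, and the conversion back to int32_t and the right shift of a
negative value are assumed to behave as GCC implements them (wrap-around
and arithmetic shift).

```c
#include <assert.h>
#include <stdint.h>

/* Two-shift form: move bit 0 into the sign position, then shift it
   back arithmetically so it fills the whole word.  */
int32_t
sext_bit0_shifts (int32_t x)
{
  return (int32_t) ((uint32_t) x << 31) >> 31;
}

/* Mask-and-negate form: 0 stays 0, 1 becomes -1.  */
int32_t
sext_bit0_neg (int32_t x)
{
  return -(x & 1);
}

/* Check that both forms agree on a few representative inputs.  */
void
demo_sext_bit0 (void)
{
  const int32_t samples[] = { 0, 1, 2, 3, -1, -2, INT32_MAX, INT32_MIN };
  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert (sext_bit0_shifts (samples[i]) == sext_bit0_neg (samples[i]));
}
```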
2023-10-30 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR middle-end/101955
* config/arc/arc.md (*extvsi_1_0): New define_insn_and_split
to convert sign extract of the least significant bit into an
AND $1 then a NEG when !TARGET_BARREL_SHIFTER.
gcc/testsuite/ChangeLog
PR middle-end/101955
* gcc.target/arc/pr101955.c: New test case.
Roger Sayle [Mon, 30 Oct 2023 16:17:42 +0000 (16:17 +0000)]
ARC: Improved ARC rtx_costs/insn_cost for SHIFTs and ROTATEs.
This patch overhauls the ARC backend's insn_cost target hook, and makes
some related improvements to rtx_costs, BRANCH_COST, etc. The primary
goal is to allow the backend to indicate that shifts and rotates are
slow (discouraged) when the CPU doesn't have a barrel shifter. I should
also acknowledge Richard Sandiford for inspiring the use of set_cost
in this rewrite of arc_insn_cost; this implementation borrows heavily
from the target hooks for AArch64 and ARM.
The motivating example is derived from PR rtl-optimization/110717.
struct S { int a : 5; };
unsigned int foo (struct S *p) {
return p->a;
}
With a barrel shifter, GCC -O2 generates the reasonable:
Whilst it's reasonable to simplify this to two shifts by 27 bits when
the CPU has a barrel shifter, it's actually a significant pessimization
when these shifts are implemented by loops. This combination can be
prevented if the backend provides accurate-ish estimates for insn_cost.
Previously, without a barrel shifter, GCC -O2 -mcpu=em generates:
foo: ldb_s r0,[r0]
mov lp_count,27
lp 2f
add r0,r0,r0
nop
2: # end single insn loop
mov lp_count,27
lp 2f
asr r0,r0
nop
2: # end single insn loop
j_s [blink]
which contains two loops and requires about 113 cycles to execute.
With this patch to rtx_cost/insn_cost, GCC -O2 -mcpu=em generates:
which requires only ~6 cycles, for the shorter shifts by 3 and sign
extension.
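The equivalence between the two strategies can be sketched in C. This is
an illustration only, under the assumption that the 5-bit field occupies
the low bits of the loaded byte; the function names are hypothetical, and
the narrowing conversions and right shifts of negative values are assumed
to behave as in GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Wide form: two shifts by 27 on the full 32-bit word.  */
int32_t
sext5_shift27 (uint8_t byte)
{
  return (int32_t) ((uint32_t) byte << 27) >> 27;
}

/* Narrow form: shift left by 3 within the byte, sign-extend the byte,
   then shift right arithmetically by 3.  */
int32_t
sext5_shift3 (uint8_t byte)
{
  int8_t top = (int8_t) (uint8_t) (byte << 3);
  return top >> 3;
}

/* Both forms agree for every possible byte value.  */
void
demo_sext5 (void)
{
  for (unsigned b = 0; b < 256; b++)
    assert (sext5_shift27 ((uint8_t) b) == sext5_shift3 ((uint8_t) b));
}
```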
2023-10-30 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/arc/arc.cc (arc_rtx_costs): Improve cost estimates.
Provide reasonable values for SHIFTS and ROTATES by constant
bit counts depending upon TARGET_BARREL_SHIFTER.
(arc_insn_cost): Use insn attributes if the instruction is
recognized. Avoid calling get_attr_length for type "multi",
i.e. define_insn_and_split patterns without explicit type.
Fall-back to set_rtx_cost for single_set and pattern_cost
otherwise.
* config/arc/arc.h (COSTS_N_BYTES): Define helper macro.
(BRANCH_COST): Improve/correct definition.
(LOGICAL_OP_NON_SHORT_CIRCUIT): Preserve previous behavior.
Roger Sayle [Mon, 30 Oct 2023 16:12:30 +0000 (16:12 +0000)]
ARC: Improved SImode shifts and rotates with -mswap.
This patch improves the code generated by the ARC back-end for CPUs
without a barrel shifter but with -mswap. The -mswap option provides
a SWAP instruction that implements SImode rotations by 16, but also
logical shift instructions (left and right) by 16 bits. Clearly these
are also useful building blocks for implementing shifts by 17, 18, etc.
which would otherwise require a loop.
As a representative example:
int shl20 (int x) { return x << 20; }
GCC with -O2 -mcpu=em -mswap would previously generate:
shl20: mov lp_count,10
lp 2f
add r0,r0,r0
add r0,r0,r0
2: # end single insn loop
j_s [blink]
Although both are four instructions (excluding the j_s),
the original takes ~22 cycles, and the replacement ~4 cycles.
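The building-block idea (a 16-bit shift followed by a short remainder) can
be sketched in C. This only illustrates the decomposition 20 = 16 + 4; it
does not claim to reproduce the instruction sequence GCC actually emits,
and lsl16 here is a hypothetical C model of the instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a logical shift left by 16 bits.  */
uint32_t
lsl16 (uint32_t x)
{
  return x << 16;
}

/* x << 20 built from one 16-bit shift plus four doublings, each
   doubling being an add of the value to itself.  */
uint32_t
shl20_split (uint32_t x)
{
  uint32_t r = lsl16 (x);
  for (int i = 0; i < 4; i++)
    r += r;
  return r;
}

/* The split form matches a direct shift by 20.  */
void
demo_shl20 (void)
{
  const uint32_t samples[] = { 0u, 1u, 0x123u, 0xfffu, 0xffffffffu };
  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert (shl20_split (samples[i]) == samples[i] << 20);
}
```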
2023-10-30 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/arc/arc.cc (arc_split_ashl): Use lsl16 on TARGET_SWAP.
(arc_split_ashr): Use swap and sign-extend on TARGET_SWAP.
(arc_split_lshr): Use lsr16 on TARGET_SWAP.
(arc_split_rotl): Use swap on TARGET_SWAP.
(arc_split_rotr): Likewise.
* config/arc/arc.md (ANY_ROTATE): New code iterator.
(<ANY_ROTATE>si2_cnt16): New define_insn for alternate form of
swap instruction on TARGET_SWAP.
(ashlsi2_cnt16): Rename from *ashlsi16_cnt16 and move earlier.
(lshrsi2_cnt16): New define_insn for LSR16 instruction.
(*ashlsi2_cnt16): See above.
gcc/testsuite/ChangeLog
* gcc.target/arc/lsl16-1.c: New test case.
* gcc.target/arc/lsr16-1.c: Likewise.
* gcc.target/arc/swap-1.c: Likewise.
* gcc.target/arc/swap-2.c: Likewise.
Richard Ball [Mon, 30 Oct 2023 15:31:26 +0000 (15:31 +0000)]
arm: move the switch tables for Arm to the RO data section.
Follow up patch to arm: Use deltas for Arm switch tables
This patch moves the switch tables for Arm from the .text section
into the .rodata section.
gcc/ChangeLog:
* config/arm/aout.h: Change to use the Lrtx label.
* config/arm/arm.h (CASE_VECTOR_PC_RELATIVE): Remove arm targets
from (!target_pure_code) condition.
(ADDR_VEC_ALIGN): Add align for tables in rodata section.
* config/arm/arm.cc (arm_output_casesi): Alter the function to include
.Lrtx label and remove adr instructions.
* config/arm/arm.md
(arm_casesi_internal): Use force_reg to generate ldr instructions that
would otherwise be out of range, and change rtl to accommodate force reg.
Additionally remove unnecessary register temp.
(casesi): Remove pure code check for Arm.
* config/arm/elf.h (JUMP_TABLES_IN_TEXT_SECTION): Remove arm
targets from JUMP_TABLES_IN_TEXT_SECTION definition.
gcc/testsuite/ChangeLog:
* gcc.target/arm/arm-switchstatement.c: Alter the tests to
change adr instruction to ldr.
Jeevitha [Mon, 30 Oct 2023 09:07:07 +0000 (04:07 -0500)]
rs6000: Change bitwise xor to an equality operator [PR106907]
PR106907 has a few warnings spotted by cppcheck. These warnings
concern the need for precedence clarification. Instead of using xor,
the code now uses an equality check, which achieves the same result.
Additionally, comment indentation has been fixed.
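As a generic illustration (the actual rs6000 expression is not shown
here, so the functions below are hypothetical): for operands that are
already truth values, negated xor and equality give the same answer, and
the equality form avoids the precedence question that arises because
`!a ^ b` parses as `(!a) ^ b`.

```c
#include <assert.h>
#include <stdbool.h>

/* "Same truth value" written with xor; needs explicit parentheses.  */
bool
same_via_xor (bool a, bool b)
{
  return !(a ^ b);
}

/* The same test written as an equality check.  */
bool
same_via_eq (bool a, bool b)
{
  return a == b;
}

/* Exhaustively compare both forms over all four input combinations.  */
void
demo_xor_vs_eq (void)
{
  for (int a = 0; a <= 1; a++)
    for (int b = 0; b <= 1; b++)
      assert (same_via_xor (a, b) == same_via_eq (a, b));
}
```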
Juzhe-Zhong [Sat, 28 Oct 2023 02:05:07 +0000 (10:05 +0800)]
RISC-V: Fix bugs of handling scalar of SEW64 vx instruction in RV32
sew64_scalar_helper handles the SEW64 vx instruction patterns on RV32 systems.
According to the RVV ISA, we cannot directly use a SEW64 vx instruction on an RV32 system
since the RV32 general-purpose registers are only 32 bits wide.
The root cause is that VLMAX handling was missed, since the code was written
long ago (when its callers were always intrinsics, with no VLMAX situation).
The following failures are fixed by this patch:
FAIL: gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c execution test
Paul Thomas [Mon, 30 Oct 2023 07:12:40 +0000 (07:12 +0000)]
Fortran: Fix a problem with SELECT TYPE selectors [PR104555].
2023-10-30 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/104555
* resolve.cc (resolve_select_type): If the selector expression
has no class component references and the expression is a
derived type, copy the typespec of the symbol to that of the
expression.
gcc/testsuite/
PR fortran/104555
* gfortran.dg/pr104555.f90: New test.
Haochen Gui [Mon, 30 Oct 2023 02:59:51 +0000 (10:59 +0800)]
Expand: Checking available optabs for scalar modes in by pieces operations
The former patch (f08ca5903c7) examines the scalar modes by target
hook scalar_mode_supported_p. It causes some i386 regression cases
as XImode and OImode are not enabled in i386 target function. This
patch examines the scalar mode by checking if the corresponding optabs
are available for the mode.
gcc/
PR target/111449
* expr.cc (qi_vector_mode_supported_p): Rename to...
(by_pieces_mode_supported_p): ...this, and extend it to do
the checking for both scalar and vector mode.
(widest_fixed_size_mode_for_size): Call
by_pieces_mode_supported_p to examine the mode.
(op_by_pieces_d::smallest_fixed_size_mode_for_size): Likewise.
Iain Buclaw [Sun, 29 Oct 2023 19:13:14 +0000 (20:13 +0100)]
d: Fix ICE: verify_gimple_failed (conversion of register to a different size in 'view_convert_expr')
Static arrays in D are passed around by value, rather than decaying to a
pointer. On x86_64 __builtin_va_list is an exception to this rule, but
semantically it's still treated as a static array.
This makes certain assignment operations fail due to a mismatch in types.
As all examples in the test program are rejected by C/C++ front-ends,
these are now errors in D too to be consistent.
PR d/110712
gcc/d/ChangeLog:
* d-codegen.cc (d_build_call): Update call to convert_for_argument.
* d-convert.cc (is_valist_parameter_type): New function.
(check_valist_conversion): New function.
(convert_for_assignment): Update signature. Add check whether
assigning va_list is permissible.
(convert_for_argument): Likewise.
* d-tree.h (convert_for_assignment): Update signature.
(convert_for_argument): Likewise.
* expr.cc (ExprVisitor::visit (AssignExp *)): Update call to
convert_for_assignment.
Martin Uecker [Wed, 25 Oct 2023 21:24:34 +0000 (23:24 +0200)]
tree-optimization/109334: Improve computation for access attribute
The fix for PR104970 restricted size computations to the case
where the access attribute was specified explicitly (no VLA).
It also restricted it to void pointers or elements with constant
sizes. The second restriction is enough to fix the original bug.
Revert the first change to again allow size computations for VLA
parameters and for VLA parameters together with an explicit access
attribute.
Max Filippov [Thu, 7 Sep 2023 03:13:22 +0000 (20:13 -0700)]
gcc: xtensa: fix salt/saltu version check
gcc/
* config/xtensa/xtensa.h (TARGET_SALT): Change HW version from
260000 (which corresponds to RF-2014.0) to 270000 (which
corresponds to RG-2015.0, the release where salt/saltu opcodes
were introduced).
Pan Li [Sat, 28 Oct 2023 14:48:58 +0000 (22:48 +0800)]
RISC-V: Fix one range-loop-construct warning of avlprop
This patch fixes the following warning in avlprop.
../../gcc/config/riscv/riscv-avlprop.cc: In member function 'virtual
unsigned int pass_avlprop::execute(function*)':
../../gcc/config/riscv/riscv-avlprop.cc:346:23: error: loop variable
'candidate' creates a copy from type 'const std::pair<avlprop_type,
rtl_ssa::insn_info*>' [-Werror=range-loop-construct]
346 | for (const auto candidate : m_candidates)
| ^~~~~~~~~
../../gcc/config/riscv/riscv-avlprop.cc:346:23: note: use reference type
to prevent copying
346 | for (const auto candidate : m_candidates)
| ^~~~~~~~~
| &
gcc/ChangeLog:
* config/riscv/riscv-avlprop.cc (pass_avlprop::execute): Use
reference type to prevent copying.
Iain Buclaw [Sat, 28 Oct 2023 22:27:49 +0000 (00:27 +0200)]
d: Fix ICE: in verify_gimple_in_seq on powerpc-darwin9 [PR112270]
This ICE was seen during stage2 on powerpc-darwin9 only. There were
still some uses of GCC's boolean_type_node in the D front-end, which
caused a type mismatch to trigger as D bool size is fixed to 1 byte on
all targets.
So two new nodes have been introduced - d_bool_false_node and
d_bool_true_node - which have replaced all remaining uses of
boolean_false_node and boolean_true_node respectively.
PR d/112270
gcc/d/ChangeLog:
* d-builtins.cc (d_build_d_type_nodes): Initialize d_bool_false_node,
d_bool_true_node.
* d-codegen.cc (build_array_struct_comparison): Use d_bool_false_node
instead of boolean_false_node.
* d-convert.cc (d_truthvalue_conversion): Use d_bool_false_node and
d_bool_true_node instead of boolean_false_node and boolean_true_node.
* d-tree.h (enum d_tree_index): Add DTI_BOOL_FALSE and DTI_BOOL_TRUE.
(d_bool_false_node): New macro.
(d_bool_true_node): New macro.
* modules.cc (build_dso_cdtor_fn): Use d_bool_false_node and
d_bool_true_node instead of boolean_false_node and boolean_true_node.
(register_moduleinfo): Use d_bool_type instead of boolean_type_node.
Iain Buclaw [Sat, 28 Oct 2023 07:42:15 +0000 (09:42 +0200)]
d: Add warning for call expression without side effects
In the last merge of the dmd front-end with upstream (r14-4830), this
warning got removed from the semantic passes. Reimplement the warning
for the code generation pass instead, where it cannot have an effect on
conditional compilation.
gcc/d/ChangeLog:
* d-codegen.cc (call_side_effect_free_p): New function.
* d-tree.h (CALL_EXPR_WARN_IF_UNUSED): New macro.
(call_side_effect_free_p): New prototype.
* expr.cc (ExprVisitor::visit (CallExp *)): Set
CALL_EXPR_WARN_IF_UNUSED on matched call expressions.
(ExprVisitor::visit (NewExp *)): Don't dereference the result of an
allocation call here.
* toir.cc (add_stmt): Emit warning when call expression added to
statement list without being used.
[RA]: Fixing i686 bootstrap failure because of pushing the equivalence patch
GCC with my recent patch improving cost calculation for pseudos with
equivalence may generate different code with and without debug info,
and as a result the bootstrap fails on i686. The patch fixes this
bug.
gcc/ChangeLog:
PR rtl-optimization/112107
* ira-costs.cc (calculate_equiv_gains): Use NONDEBUG_INSN_P
instead of INSN_P.
Patrick O'Neill [Fri, 27 Oct 2023 17:50:28 +0000 (10:50 -0700)]
RISC-V: Make stack_save_restore_2 more robust
GCC recently changed to emit __riscv_restore_5 which causes this
testcase to fail.
This patch updates the regex to be more robust to change by accepting
any number after __riscv_save_ and __riscv_restore_.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/stack_save_restore_2.c: Accept any number
after __riscv_save_ and __riscv_restore_.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
Gaius Mulley [Fri, 27 Oct 2023 17:42:09 +0000 (18:42 +0100)]
PR modula2/112110: fails to build on freebsd when compiling wrapclock.cc
This patch fixes a mangled #if #endif conditional section within
wrapclock.cc. The conditional section in wrapclock_timezone
should return 0 rather than return timezone.
libgm2/ChangeLog:
PR modula2/112110
* libm2iso/wrapclock.cc (timezone): Return 0 if unable
to get the timezone from the tm struct.
Harald Anlauf [Thu, 26 Oct 2023 20:32:35 +0000 (22:32 +0200)]
Fortran: diagnostics of MODULE PROCEDURE declaration conflicts [PR104649]
gcc/fortran/ChangeLog:
PR fortran/104649
* decl.cc (gfc_match_formal_arglist): Handle conflicting declarations
of a MODULE PROCEDURE when one of the declarations is an alternate
return.
gcc/testsuite/ChangeLog:
PR fortran/104649
* gfortran.dg/pr104649.f90: New test.
Co-authored-by: Steven G. Kargl <kargl@gcc.gnu.org>
Andrew Stubbs [Fri, 27 Oct 2023 16:53:10 +0000 (17:53 +0100)]
amdgcn: Fix bug in gfx1030 support patch
The previous patch to add gfx1030 support introduced an issue with passing
exit codes from kernels run under gcn-run (offload kernels were unaffected).
An inline asm with multiple output operands is represented as a
parallel set in which the SET_SRCs are the same (shared) ASM_OPERANDS.
insn_propagation didn't account for this, and instead propagated
into each ASM_OPERANDS individually. This meant that it could
apply a substitution X->Y to Y itself, which (a) could create
circularity and (b) would be semantically wrong in any case,
since Y might use a different value of X.
This patch checks explicitly for parallels involving ASM_OPERANDS,
just like combine does.
Patrick Palka [Fri, 27 Oct 2023 15:31:02 +0000 (11:31 -0400)]
c++: another build_new_1 folding fix [PR111929]
In build_new_1, we also need to avoid folding 'outer_nelts_check' when
in a template context to prevent an ICE on the below testcase. This
patch replaces the problematic fold_build2 call with build2 (we'll later
fold it if appropriate during cp_fully_fold).
In passing, this patch removes an unnecessary conversion of 'nelts'
since it should always already be a size_t (and 'convert' isn't the best
conversion entry point to use anyway since it lacks a complain parameter).
PR c++/111929
gcc/cp/ChangeLog:
* init.cc (build_new_1): Remove unnecessary call to convert
on 'nelts'. Use build2 instead of fold_build2 for
'outer_nelts_checks'.
Patrick Palka [Fri, 27 Oct 2023 15:14:04 +0000 (11:14 -0400)]
c++: more ahead-of-time -Wparentheses warnings
Now that we don't have to worry about looking through NON_DEPENDENT_EXPR,
we can easily extend the -Wparentheses warning in convert_for_assignment
to consider (non-dependent) templated assignment operator expressions as
well, like r14-4111-g6e92a6a2a72d3b did in maybe_convert_cond.
gcc/cp/ChangeLog:
* cp-tree.h (maybe_warn_unparenthesized_assignment): Declare.
* semantics.cc (is_assignment_op_expr_p): Generalize to return
true for any assignment operator expression, not just one that
has been resolved to an operator overload.
(maybe_warn_unparenthesized_assignment): Factored out from ...
(maybe_convert_cond): ... here.
(finish_parenthesized_expr): Mention
maybe_warn_unparenthesized_assignment.
* typeck.cc (convert_for_assignment): Replace -Wparentheses
warning logic with maybe_warn_unparenthesized_assignment.
gcc/testsuite/ChangeLog:
* g++.dg/warn/Wparentheses-13.C: Strengthen by expecting that
we issue the -Wparentheses warnings ahead of time.
* g++.dg/warn/Wparentheses-23.C: Likewise.
* g++.dg/warn/Wparentheses-32.C: Remove xfails.
Gaius Mulley [Fri, 27 Oct 2023 14:54:48 +0000 (15:54 +0100)]
PR modula2/111530: Build failure on BSD due to getopt_long_only GNU extension dependency
This patch uses the libiberty getopt long functions (wrapped up inside
libgm2/libm2pim/cgetopt.cc) and only enables this implementation if
libgm2/configure.ac detects no getopt_long and friends on the target.
gcc/m2/ChangeLog:
PR modula2/111530
* gm2-libs-ch/cgetopt.c (cgetopt_cgetopt_long): Re-format.
(cgetopt_cgetopt_long_only): Re-format.
(cgetopt_SetOption): Re-format and assign flag to NULL
if name is also NULL.
* gm2-libs/GetOpt.def (AddLongOption): Add index parameter
and change flag to be a VAR parameter rather than a pointer.
(GetOptLong): Re-format.
(GetOpt): Correct comment.
* gm2-libs/GetOpt.mod: Re-write to rely on cgetopt rather
than implement long option creation in GetOpt.
* gm2-libs/cgetopt.def (SetOption): has_arg type is INTEGER.
libgm2/ChangeLog:
PR modula2/111530
* Makefile.in: Regenerate.
* aclocal.m4: Regenerate.
* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac (AC_CHECK_HEADERS): Include getopt.h.
(GM2_CHECK_LIB): getopt_long check.
(GM2_CHECK_LIB): getopt_long_only check.
* libm2cor/Makefile.in: Regenerate.
* libm2iso/Makefile.in: Regenerate.
* libm2log/Makefile.in: Regenerate.
* libm2min/Makefile.in: Regenerate.
* libm2pim/Makefile.in: Regenerate.
* libm2pim/cgetopt.cc: Re-write using conditional on configure
and long function code from libiberty/getopt.c.
gcc/testsuite/ChangeLog:
PR modula2/111530
* gm2/pimlib/run/pass/testgetopt.mod: New test.
Yangyu Chen [Fri, 27 Oct 2023 14:39:26 +0000 (08:39 -0600)]
[PATCH] RISC-V: Fix wrong tune parameters on int_div
This patch fixes an issue with the cost on "int_div" in various RISC-V
tune parameters including those for Rocket, SiFive U7 series, and T-Head
C906. This incorrect cost value interferes with the optimization process.
For example, it prevents the optimization of division by a constant to a
more efficient method known as Barrett reduction. This lack of
optimization negatively affects the performance of these systems.
The integer div cost of the Rocket and SiFive U7 is taken from the
Rocket-Chip Divider source code[1] with BigCore configuration[2]. It shows
the divUnroll unchanged which is 1 by default. Thus, the maximum int_div
cycles should be the dataWidth + 1, which is 33 for 32-bit and 65 for
64-bit.
As for C906, the divider takes 2 cycles to start[3], and it produces a 2-bit
result each cycle[4]. Thus, the maximum int_div cycles should be the
dataWidth / 2 + 2, which is 18 for 32-bit and 34 for 64-bit.
I also tested the performance on a VisionFive2, which has a quad-core SiFive U74.
I wrote a simple C program that performs 1e8 divisions by the constant 6 in int32. The
result shows it takes 1.998s using div, and 0.420s using Barrett reduction
to replace div with mul, which is 4.75x faster.
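The cycle formulas above, and the kind of multiply-based replacement for
a division by 6, can be sketched in C. This is an illustration under
stated assumptions: the function names are hypothetical, the reciprocal
constant 0xAAAAAAAB with a shift by 34 is a standard choice for unsigned
32-bit division by 6 and is not taken from the patch or the benchmark, and
the sketch uses unsigned operands rather than the int32 the author
measured.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case divider latency with divUnroll = 1: dataWidth + 1.  */
unsigned
rocket_int_div_cycles (unsigned data_width)
{
  return data_width + 1;
}

/* Two start-up cycles, then two result bits per cycle.  */
unsigned
c906_int_div_cycles (unsigned data_width)
{
  return data_width / 2 + 2;
}

/* Unsigned division by 6 using a 64-bit multiply and a shift.  */
uint32_t
div6_by_multiply (uint32_t x)
{
  return (uint32_t) (((uint64_t) x * 0xAAAAAAABu) >> 34);
}

/* Check the latencies quoted in the text and the multiply-based
   division against the built-in operator.  */
void
demo_int_div (void)
{
  assert (rocket_int_div_cycles (32) == 33);
  assert (rocket_int_div_cycles (64) == 65);
  assert (c906_int_div_cycles (32) == 18);
  assert (c906_int_div_cycles (64) == 34);

  const uint32_t samples[] = { 0u, 5u, 6u, 7u, 11u, 12u,
                               100000000u, 0xffffffffu };
  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert (div6_by_multiply (samples[i]) == samples[i] / 6);
}
```

With an int_div cost that reflects these latencies, the multiply-based
form looks cheaper to the compiler than a real division, which is the
behaviour the patch is meant to restore.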
Robin Dapp [Thu, 26 Oct 2023 18:40:00 +0000 (20:40 +0200)]
RISC-V: Fix cond_sqrt tests.
As long as we do not have universal Zvfh support in binutils,
linking against libm does not work out of the box. This patch
splits the cond_sqrt tests into non-zvfh and zvfh variants and
makes the run-zvfh ones depend on a zvfh target.
While at it, I also added Zvfh handling to the testsuite helpers.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c: Remove
Float16.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c: Ditto.
* lib/target-supports.exp: Add zvfh handling.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-zvfh-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-zvfh-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-zvfh-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-zvfh-2.c: New test.
[RA]: Add cost calculation for reg equivalence invariants
My recent patch improving cost calculation for pseudos with equivalence
resulted in failure of gcc.target/arm/eliminate.c on aarch64. This patch
fixes this failure.
gcc/ChangeLog:
* ira-costs.cc (get_equiv_regno, calculate_equiv_gains):
Process reg equivalence invariants.
aarch64: Add basic target_print_operand support for CONST_STRING
Motivated by the need to print system register names in output
assembly, this patch adds the required logic to
`aarch64_print_operand' to accept rtxs of type CONST_STRING and
process these accordingly.
Roger Sayle [Fri, 27 Oct 2023 09:03:53 +0000 (10:03 +0100)]
PR target/110551: Fix reg allocation for widening multiplications on x86.
This patch contains clean-ups of the widening multiplication patterns in
i386.md, and provides variants of the existing highpart multiplication
peephole2 transformations (that tidy up register allocation after
reload), and thereby fixes PR target/110551, which is a superfluous
move instruction.
For the new test case, compiled on x86_64 with -O2.
The clean-ups are (i) that operand 1 is consistently made register_operand
and operand 2 becomes nonimmediate_operand, so that predicates match the
constraints, (ii) the representation of the BMI2 mulx instruction is
updated to use the new umul_highpart RTX, and (iii) because operands
0 and 1 have different modes in widening multiplications, "a" is a more
appropriate constraint than "0" (which avoids spills/reloads containing
SUBREGs). The new peephole2 transformations are based upon those at
around line 9951 of i386.md, which begin with the comment
;; Highpart multiplication peephole2s to tweak register allocation.
;; mov imm,%rdx; mov %rdi,%rax; imulq %rdx -> mov imm,%rax; imulq %rdi
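For readers unfamiliar with the terminology, the difference between a
widening multiplication and a highpart multiplication can be shown in C.
This is a generic sketch using 32-bit operands for portability; it is not
the PR's test case, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Widening multiplication: the result has twice the operand width.  */
uint64_t
umul_widen32 (uint32_t a, uint32_t b)
{
  return (uint64_t) a * b;
}

/* Highpart multiplication: only the upper half of the wide product.  */
uint32_t
umul_highpart32 (uint32_t a, uint32_t b)
{
  return (uint32_t) (umul_widen32 (a, b) >> 32);
}

/* (2^32 - 1)^2 = 2^64 - 2^33 + 1, so the high half is 2^32 - 2.  */
void
demo_umul (void)
{
  assert (umul_widen32 (0xffffffffu, 0xffffffffu)
          == 0xfffffffe00000001ull);
  assert (umul_highpart32 (0xffffffffu, 0xffffffffu) == 0xfffffffeu);
}
```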
2023-10-27 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR target/110551
* config/i386/i386.md (<u>mul<mode><dwi>3): Make operands 1 and
2 take "register_operand" and "nonimmediate_operand" respectively.
(<u>mulqihi3): Likewise.
(*bmi2_umul<mode><dwi>3_1): Operand 2 needs to be register_operand
matching the %d constraint. Use umul_highpart RTX to represent
the highpart multiplication.
(*umul<mode><dwi>3_1): Operand 2 should use register_operand
predicate, and "a" rather than "0" as operands 0 and 2 have
different modes.
(define_split): For mul to mulx conversion, use the new
umul_highpart RTX representation.
(*mul<mode><dwi>3_1): Operand 1 should be register_operand
and the constraint %a as operands 0 and 1 have different modes.
(*<u>mulqihi3_1): Operand 1 should be register_operand matching
the constraint %0.
(define_peephole2): Providing widening multiplication variants
of the peephole2s that tweak highpart multiplication register
allocation.
gcc/testsuite/ChangeLog
PR target/110551
* gcc.target/i386/pr110551.c: New test case.
Lewis Hyatt [Fri, 27 Oct 2023 08:32:50 +0000 (04:32 -0400)]
preprocessor: c++: Support `#pragma GCC target' macros [PR87299]
`#pragma GCC target' is not currently handled in preprocess-only mode (e.g.,
when running gcc -E or gcc -save-temps). As noted in the PR, this means that
if the target pragma defines any macros, those macros are not effective in
preprocess-only mode. Similarly, such macros are not effective when
compiling with C++ (even when compiling without -save-temps), because C++
does not process the pragma until after all tokens have been obtained from
libcpp, at which point it is too late for macro expansion to take place.
Since r13-1544 and r14-2893, there is a general mechanism to handle pragmas
under these conditions as well, so resolve the PR by using the new "early
pragma" support.
toplev.cc required some changes because the target-specific handlers for
`#pragma GCC target' may call target_reinit(), and toplev.cc was not expecting
that function to be called in preprocess-only mode.
I added some additional testcases from the PR for x86. The other targets
that support `#pragma GCC target' (aarch64, arm, nios2, powerpc, s390)
already had tests verifying that the pragma sets macros as expected; here I
have added -save-temps versions of some of them, to test that they now work
in preprocess-only mode as well.
gcc/c-family/ChangeLog:
PR preprocessor/87299
* c-pragma.cc (init_pragma): Register `#pragma GCC target' and
related pragmas in preprocess-only mode, and enable early handling.
(c_reset_target_pragmas): New function refactoring code from...
(handle_pragma_reset_options): ...here.
* c-pragma.h (c_reset_target_pragmas): Declare.
gcc/cp/ChangeLog:
PR preprocessor/87299
* parser.cc (cp_lexer_new_main): Call c_reset_target_pragmas ()
after preprocessing is complete, before starting compilation.
gcc/ChangeLog:
PR preprocessor/87299
* toplev.cc (no_backend): New static global.
(finalize): Remove argument no_backend, which is now a
static global.
(process_options): Likewise.
(do_compile): Likewise.
(target_reinit): Don't do anything in preprocess-only mode.
(toplev::main): Adapt to no_backend change.
(toplev::finalize): Likewise.
gcc/testsuite/ChangeLog:
PR preprocessor/87299
* c-c++-common/pragma-target-1.c: New test.
* c-c++-common/pragma-target-2.c: New test.
* g++.target/i386/pr87299-1.C: New test.
* g++.target/i386/pr87299-2.C: New test.
* gcc.target/i386/pr87299-1.c: New test.
* gcc.target/i386/pr87299-2.c: New test.
* gcc.target/s390/target-attribute/tattr-2b.c: New test.
* gcc.target/aarch64/pragma_cpp_predefs_1b.c: New test.
* gcc.target/arm/pragma_arch_attribute_1b.c: New test.
* gcc.target/nios2/custom-fp-2b.c: New test.
* gcc.target/powerpc/float128-3b.c: New test.
Paul Thomas [Fri, 27 Oct 2023 08:33:38 +0000 (09:33 +0100)]
Fortran: Fix some problems with SELECT TYPE selectors [PR104625].
2023-10-27 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/104625
* expr.cc (gfc_check_vardef_context): Check that the target
does have a vector index before emitting the specific error.
* match.cc (copy_ts_from_selector_to_associate): Ensure that
class valued operator expressions set the selector rank and
use the rank to provide the associate variable with an
appropriate array spec.
* resolve.cc (resolve_operator): Reduce stacked parentheses to
a single pair.
(fixup_array_ref): Extract selector symbol from parentheses.
* gcc.dg/tree-ssa/bitcmp-1.c: New test.
* gcc.dg/tree-ssa/bitcmp-2.c: New test.
* gcc.dg/tree-ssa/bitcmp-3.c: New test.
* gcc.dg/tree-ssa/bitcmp-4.c: New test.
* gcc.dg/tree-ssa/bitcmp-5.c: New test.
* gcc.dg/tree-ssa/bitcmp-6.c: New test.
Juzhe-Zhong [Thu, 26 Oct 2023 22:28:56 +0000 (06:28 +0800)]
RISC-V: Move lmul calculation into macro
Notice we calculate LMUL according to --param=riscv-autovec-lmul
in multiple places: int lmul = riscv_autovec_lmul == RVV_DYNAMIC ? RVV_M8 : riscv_autovec_lmul;
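A minimal sketch of the idea, assuming a single macro can stand in for the repeated ternary. The macro name SKETCH_MAX_LMUL, the helper sketch_vector_bytes and the numeric enum values are hypothetical and chosen only for illustration; riscv_autovec_lmul, RVV_DYNAMIC and RVV_M8 are the names quoted above.

```c
/* Stand-ins for the GCC internals named in the commit message; the numeric
   values are assumptions made only for this sketch.  */
enum rvv_lmul_sketch
{
  RVV_M1 = 1,
  RVV_M2 = 2,
  RVV_M4 = 4,
  RVV_M8 = 8,
  RVV_DYNAMIC = 9
};

static int riscv_autovec_lmul = RVV_M1;

/* Hypothetical macro name: one definition replaces the repeated ternary.  */
#define SKETCH_MAX_LMUL \
  (riscv_autovec_lmul == RVV_DYNAMIC ? (int) RVV_M8 : riscv_autovec_lmul)

/* Example user of the macro: bytes covered by one register group.  */
int
sketch_vector_bytes (int bytes_per_register)
{
  return SKETCH_MAX_LMUL * bytes_per_register;
}
```

Every place that previously spelled out the ternary would then use the macro, so a change to the dynamic-LMUL policy needs to be made only once.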
Juzhe-Zhong [Thu, 26 Oct 2023 08:13:51 +0000 (16:13 +0800)]
RISC-V: Add AVL propagation PASS for RVV auto-vectorization
This patch addresses the redundant AVL/VL toggling in RVV partial auto-vectorization,
which has been a known issue for a long time and which I finally found the time to address.
Consider a simple vector addition operation:
https://godbolt.org/z/7hfGfEjW3
void
foo (int *__restrict a,
int *__restrict b,
int n)
{
for (int i = 0; i < n; i++)
a[i] = a[i] + b[i];
}
We can see 2 redundant vsetvls inside the loop body due to AVL/VL toggling.
The AVL/VL toggling happens because LEN information is missing from the simple PLUS_EXPR GIMPLE assignment:
vect__7.12_19 = vect__6.11_20 + vect__4.8_27;
GCC applies partially predicated loads/stores and un-predicated full-vector operations in partial vectorization.
Such a flow is used by all other targets, such as ARM SVE (RVV also uses this flow):
This vectorization flow causes AVL/VL toggling on RVV, so we need an AVL propagation PASS for it.
Also, it's very unlikely that we can apply predicated operations to all vectorization, for the following reasons:
1. Supporting them for all vectorization is a very heavy workload, and we don't see any benefit if we can handle this in the target backend.
2. Changing the loop vectorizer for it would make the code base ugly and hard to maintain.
3. We would need a great many patterns for all operations: not only COND_LEN_ADD, COND_LEN_SUB, ....
We would also need COND_LEN_EXTEND, ...., COND_LEN_CEIL, ..., more than 100 patterns in total, which is an unreasonable number.
To conclude, we prefer un-predicated operations here, and design a nice and clean AVL propagation PASS to elide the redundant vsetvls
due to AVL/VL toggling.
The second question is why we add a separate PASS for AVL propagation. Why not optimize it in the VSETVL PASS? (We definitely can optimize AVL in the VSETVL PASS.)
Frankly, I was planning to address this issue in the VSETVL PASS, which is why we recently refactored it. However, I changed my mind after several
experiments and attempts.
The reasons are as follows:
1. Code base management and maintenance. The current VSETVL PASS is complicated enough and already has enough aggressive and fancy optimizations, which
turn out to generate optimal code in most cases. It's not a good idea to keep adding more features to the VSETVL PASS and make it
heavier and heavier again; then we would need to refactor it again in the future.
Actually, the VSETVL PASS is very stable and optimal after the recent refactoring. Hopefully, we will not need to change the VSETVL PASS any more except for minor
fixes.
2. vsetvl insertion (which the VSETVL PASS does) and AVL propagation are 2 different things; I don't think we should fuse them into the same PASS.
3. The VSETVL PASS is a post-RA PASS, whereas AVL propagation should be done before RA, which can reduce register usage.
4. This patch's AVL propagation PASS only does AVL propagation for RVV partial auto-vectorization situations.
This patch's code is only a few hundred lines, which is very manageable and can very easily be extended with features and enhancements.
We can easily extend and enhance AVL propagation in a clean and separate PASS in the future. (If we did it in the VSETVL PASS, we would complicate
the VSETVL PASS again, which is already so complicated.)
Here is an example to demonstrate more:
https://godbolt.org/z/bE86sv3q5
void foo2 (int *__restrict a,
int *__restrict b,
int *__restrict c,
int *__restrict a2,
int *__restrict b2,
int *__restrict c2,
int *__restrict a3,
int *__restrict b3,
int *__restrict c3,
int *__restrict a4,
int *__restrict b4,
int *__restrict c4,
int *__restrict a5,
int *__restrict b5,
int *__restrict c5,
int n)
{
for (int i = 0; i < n; i++){
a[i] = b[i] + c[i];
b5[i] = b[i] + c[i];
a2[i] = b2[i] + c2[i];
a3[i] = b3[i] + c3[i];
a4[i] = b4[i] + c4[i];
a5[i] = a[i] + a4[i];
a[i] = a5[i] + b5[i]+ a[i];
Clearly, all the heavy and redundant vsetvls inside the loop body are eliminated.
2. Epilogue:
Before this patch: After this patch:
.L5: .L5:
ld s0,8(sp) ret
addi sp,sp,16
jr ra
This is the benefit of doing the AVL propagation before RA: we eliminate the use of the 'a7' register,
which is used by the redundant AVL/VL toggling instruction: 'vsetvli a7,zero,e32,m1,ta,ma'
which would indicate to humans and compilers that argument 1 of "test_a"
is expected to be a null-terminated string, with the idea:
- we should complain if it's not valid to read from *p up to the first
'\0' character in the buffer
- we should complain if *p is not terminated, or if it's uninitialized
before the first '\0' character
This is independent of the nonnull-ness of the pointer: if you also want
to express that the argument must be non-null, we already have
__attribute__((nonnull (N))), so the user can write e.g.:
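A minimal sketch of how the two attributes might be combined on one declaration. The function count_chars and the wrapper macro NUL_TERMINATED_ARG are hypothetical names, and the __has_attribute guard is an assumption added so that the fragment also compiles on compilers that lack the new attribute:

```c
#include <stddef.h>

/* Expand to the new attribute only where the compiler knows it; otherwise
   expand to nothing.  The macro name is hypothetical.  */
#if defined(__has_attribute)
# if __has_attribute(null_terminated_string_arg)
#  define NUL_TERMINATED_ARG(n) __attribute__((null_terminated_string_arg (n)))
# endif
#endif
#ifndef NUL_TERMINATED_ARG
# define NUL_TERMINATED_ARG(n)
#endif

/* Argument 1 must be a null-terminated string and must not be NULL.  */
size_t count_chars (const char *p)
  NUL_TERMINATED_ARG (1) __attribute__((nonnull (1)));

/* Reads *p up to the first '\0', which is exactly the access pattern the
   attribute describes.  */
size_t
count_chars (const char *p)
{
  size_t n = 0;
  while (p[n] != '\0')
    n++;
  return n;
}
```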
The patch implements:
(a) C/C++ frontends: recognition of this attribute
(b) analyzer: usage of this attribute
gcc/analyzer/ChangeLog:
* region-model.cc
(region_model::check_external_function_for_access_attr): Split
out, replacing with...
(region_model::check_function_attr_access): ...this new function
and...
(region_model::check_function_attrs): ...this new function.
(region_model::check_one_function_attr_null_terminated_string_arg):
New.
(region_model::check_function_attr_null_terminated_string_arg):
New.
(region_model::handle_unrecognized_call): Update for renaming of
check_external_function_for_access_attr to check_function_attrs.
(region_model::check_for_null_terminated_string_arg): Add return
value to one overload. Make both overloads const.
* region-model.h: Include "stringpool.h" and "attribs.h".
(region_model::check_for_null_terminated_string_arg): Add return
value to one overload. Make both overloads const.
(region_model::check_external_function_for_access_attr): Delete
decl.
(region_model::check_function_attr_access): New decl.
(region_model::check_function_attr_null_terminated_string_arg):
New decl.
(region_model::check_one_function_attr_null_terminated_string_arg):
New decl.
(region_model::check_function_attrs): New decl.
gcc/ChangeLog:
* doc/extend.texi (Common Function Attributes): Add
null_terminated_string_arg.
gcc/testsuite/ChangeLog:
* c-c++-common/analyzer/attr-null_terminated_string_arg-access-read_write.c:
New test.
* c-c++-common/analyzer/attr-null_terminated_string_arg-access-without-size.c:
New test.
* c-c++-common/analyzer/attr-null_terminated_string_arg-multiple.c:
New test.
* c-c++-common/analyzer/attr-null_terminated_string_arg-nonnull-2.c:
New test.
* c-c++-common/analyzer/attr-null_terminated_string_arg-nonnull-sized.c:
New test.
* c-c++-common/analyzer/attr-null_terminated_string_arg-nonnull.c:
New test.
* c-c++-common/analyzer/attr-null_terminated_string_arg-nullable-sized.c:
New test.
* c-c++-common/analyzer/attr-null_terminated_string_arg-nullable.c:
New test.
* c-c++-common/attr-null_terminated_string_arg.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Iain Sandoe [Thu, 26 Oct 2023 18:46:16 +0000 (19:46 +0100)]
testsuite, aarch64: Normalise options to aarch64.exp.
When the compiler is configured --with-cpu= and that is different from
the baselines assumed, we see excess test fails (primarily in body code
scans which are necessarily sensitive to costs). To stabilize the
testsuite against such changes, use aarch64-with-arch-dg-options ()
to provide suitable consistent defaults.
e.g. for --with-cpu=xgene1 we see over 100 excess fails which are
removed by this change.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/aarch64.exp: Use aarch64-with-arch-dg-options
to normalize the options to the tests in aarch64.exp.
This adds a match pattern for `a != C1 ? abs(a) : C2` which gets simplified
to `abs(a)`. If C1 was originally *_MIN, then change it over to use absu instead
of abs.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
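A small worked instance of the simplification described above, assuming the constants satisfy C2 == abs(C1) (needed for the two forms to agree). The constants C1 = -4 and C2 = 4 and both function names are chosen only for illustration, and the *_MIN/absu case is not covered:

```c
#include <stdlib.h>

/* Before: `a != C1 ? abs (a) : C2` with the assumed C1 = -4, C2 = 4.  */
int
before_sketch (int a)
{
  return a != -4 ? abs (a) : 4;
}

/* After: when a == -4 the else arm yields 4, which is abs (-4) anyway,
   so the whole conditional is just abs (a).  */
int
after_sketch (int a)
{
  return abs (a);
}
```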
This adds an effective target DejaGnu directive to prevent these testcases from
failing on GCC configurations that do not support OpenMP.
This fixes 8d2130a4e5c.
Aldy Hernandez [Sun, 1 Oct 2023 20:54:27 +0000 (16:54 -0400)]
[range-ops] Add frange& argument to rv_fold.
The floating point version of rv_fold returns its result in 3 pieces:
the lower bound, the upper bound, and a maybe_nan bit. It is cleaner
to return everything in an frange, thus bringing the floating point
version of rv_fold in line with the integer version.
This first patch adds an frange argument, while keeping the current
functionality, and asserting that we get the same results. In a
follow-up patch I will nuke the now useless 3 arguments. Splitting
this into two patches makes it easier to bisect any problems if any
should arise.
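The transitional step can be sketched as follows, in C rather than GCC's C++ and for an addition only. The struct frange_sketch and the function fold_plus_sketch are hypothetical simplifications, not the real range-ops interface, and the sketch assumes finite bounds so that no NaN can arise:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for GCC's frange.  */
struct frange_sketch
{
  double lb;
  double ub;
  bool maybe_nan;
};

/* Transitional signature: the new range argument is filled alongside the
   three legacy outputs, and an assertion checks that both agree.
   Assumes finite bounds, so the addition cannot produce a NaN.  */
void
fold_plus_sketch (struct frange_sketch *r,
                  double *lb, double *ub, bool *maybe_nan,
                  double lh_lb, double lh_ub,
                  double rh_lb, double rh_ub)
{
  /* Legacy three-piece result.  */
  *lb = lh_lb + rh_lb;
  *ub = lh_ub + rh_ub;
  *maybe_nan = false;

  /* New single-object result, computed independently.  */
  r->lb = lh_lb + rh_lb;
  r->ub = lh_ub + rh_ub;
  r->maybe_nan = false;

  /* While both interfaces coexist, verify they match.  */
  assert (r->lb == *lb && r->ub == *ub && r->maybe_nan == *maybe_nan);
}
```

Once the assertion has held across testing, the three legacy out-parameters can be dropped in a separate change, which keeps each step easy to bisect.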
Patrick O'Neill [Thu, 26 Oct 2023 00:03:24 +0000 (17:03 -0700)]
RISC-V: Pass abi to g++ rvv testsuite
On rv32gcv testcases like g++.target/riscv/rvv/base/bug-22.C fail with:
FAIL: g++.target/riscv/rvv/base/bug-22.C (test for excess errors)
Excess errors:
cc1plus: error: ABI requires '-march=rv32'
This patch adds the -mabi argument to g++ rvv tests.
gcc/testsuite/ChangeLog:
* g++.target/riscv/rvv/rvv.exp: Add -mabi argument to CFLAGS.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
Thomas Schwinge [Mon, 11 Sep 2023 09:36:31 +0000 (11:36 +0200)]
libatomic: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR109951]
Similar to commit fb5d27be272b71fb9026224535fc73f125ce3be7
"libgomp: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR91884, PR109951]",
this is commit 5ff06d762a88077aff0fb637c931c64e6f47f93d
"libatomic/test: Fix compilation for build sysroot" done differently,
avoiding build-tree testing use of any random gunk that may appear in
build-time 'CC'.
PR testsuite/109951
libatomic/
* configure.ac: 'AC_SUBST(SYSROOT_CFLAGS_FOR_TARGET)'.
* Makefile.in: Regenerate.
* configure: Likewise.
* testsuite/Makefile.in: Likewise.
* testsuite/lib/libatomic.exp (libatomic_init): If
'--with-build-sysroot=[...]' was specified, use it for build-tree
testing.
* testsuite/libatomic-site-extra.exp.in (GCC_UNDER_TEST): Don't
set.
(SYSROOT_CFLAGS_FOR_TARGET): Set.
Thomas Schwinge [Mon, 11 Sep 2023 08:50:00 +0000 (10:50 +0200)]
libffi: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR109951]
Similar to commit fb5d27be272b71fb9026224535fc73f125ce3be7
"libgomp: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR91884, PR109951]",
this is commit a0b48358cb1e70e161a87ec5deb7a4b25defba6b
"libffi/test: Fix compilation for build sysroot" done differently,
avoiding build-tree testing use of any random gunk that may appear in
build-time 'CC', 'CXX'.
PR testsuite/109951
libffi/
* configure.ac: 'AC_SUBST(SYSROOT_CFLAGS_FOR_TARGET)'.
<local.exp>: Don't set 'CC_FOR_TARGET', 'CXX_FOR_TARGET', instead
set 'SYSROOT_CFLAGS_FOR_TARGET'.
* Makefile.in: Regenerate.
* configure: Likewise.
* include/Makefile.in: Likewise.
* man/Makefile.in: Likewise.
* testsuite/Makefile.in: Likewise.
* testsuite/lib/libffi.exp (libffi_target_compile): If
'--with-build-sysroot=[...]' was specified, use it for build-tree
testing.
testsuite: Allow general skips/requires in PCH tests
dg-pch.exp handled dg-require-effective-target pch_supported_debug
as a special case, by grepping the source code. This patch tries
to generalise it to other dg-require-effective-targets, and to
dg-skip-if.
There also seemed to be some errors in check-flags. It used:
lappend $args [list <elt>]
which treats the contents of args as a variable name. I think
it was supposed to be "lappend args" instead. From the later
code, the element was supposed to be <elt> itself, rather than
a singleton list containing <elt>.
We can also save some time by doing the common early-exit first.
Doing this removes the need to specify the dg-require-effective-target
in both files. Tested by faking unsupported debug and checking that
the tests were still correctly skipped.
gcc/testsuite/
* lib/target-supports-dg.exp (check-flags): Move default argument
handling further up. Fix a couple of issues in the lappends.
Avoid frobbing the compiler flags if the return value is already
known to be 1.
* lib/dg-pch.exp (dg-flags-pch): Process the dg-skip-if and
dg-require-effective-target directives to see whether the
assembly test should be skipped.
* gcc.dg/pch/valid-1.c: Remove dg-require-effective-target.
* gcc.dg/pch/valid-1b.c: Likewise.
Richard Ball [Thu, 26 Oct 2023 15:18:50 +0000 (16:18 +0100)]
arm: Use deltas for Arm switch tables
For normal optimization for the Arm state in gcc we get an uncompressed
table of jump targets. This sits in the middle of the text segment and is
far larger than necessary, especially at -Os.
This patch compresses the table to use deltas in a similar manner to
Thumb code generation.
Similar code is also used for -fpic where we currently generate a jump
to a jump. In this format the jumps are too dense for the hardware branch
predictor to handle accurately, so execution is likely to be very expensive.
Changes to switch statements for arm include a new function to handle the
assembly generation for different machine modes. This allows for more
optimisation to be performed in aout.h where arm has switched from using
ASM_OUTPUT_ADDR_VEC_ELT to using ASM_OUTPUT_ADDR_DIFF_ELT.
In ASM_OUTPUT_ADDR_DIFF_ELT new assembly generation options have been
added to utilise the different machine modes. Additional changes
made to the casesi expand and insn, CASE_VECTOR_PC_RELATIVE,
CASE_VECTOR_SHORTEN_MODE and LABEL_ALIGN_AFTER_BARRIER are all
to accommodate this new approach to switch statement generation.
New tests have been added and no regressions on arm-none-eabi.
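A rough size comparison between a table of absolute jump targets and a table of deltas, as a sketch only. The function names are hypothetical and the width thresholds are a simplifying assumption (unsigned byte, halfword or word deltas with no scaling), not the rules implemented in CASE_VECTOR_SHORTEN_MODE:

```c
#include <stddef.h>
#include <stdint.h>

/* Absolute table: one 32-bit address per case.  */
size_t
absolute_table_bytes (size_t n_cases)
{
  return n_cases * sizeof (uint32_t);
}

/* Smallest entry width, in bytes, that can hold every delta from the
   table base to a case label (simplified thresholds).  */
size_t
delta_entry_bytes (uint32_t max_delta)
{
  if (max_delta <= 0xffu)
    return 1;
  if (max_delta <= 0xffffu)
    return 2;
  return 4;
}

/* Delta table: one entry of the chosen width per case.  */
size_t
delta_table_bytes (size_t n_cases, uint32_t max_delta)
{
  return n_cases * delta_entry_bytes (max_delta);
}
```

Under these assumptions a 10-case switch whose targets all lie within 255 bytes of the base needs 10 bytes of table instead of 40.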
gcc/ChangeLog:
* config/arm/aout.h (ASM_OUTPUT_ADDR_DIFF_ELT): Add table output
for different machine modes for arm.
* config/arm/arm-protos.h (arm_output_casesi): New prototype.
* config/arm/arm.h (CASE_VECTOR_PC_RELATIVE): Make arm use
ASM_OUTPUT_ADDR_DIFF_ELT.
(CASE_VECTOR_SHORTEN_MODE): Change table size calculation for
TARGET_ARM.
(LABEL_ALIGN_AFTER_BARRIER): Change to accommodate .p2align 2
for TARGET_ARM.
* config/arm/arm.cc (arm_output_casesi): New function.
* config/arm/arm.md (arm_casesi_internal): Change casesi expand
and insn
for arm to use new function arm_output_casesi.
Darwin: Make metadata symbol labels linker-visible for GNU objc.
Now we have shifted to using the same relocation mechanism as clang for
objective-c typeinfo the static linker needs to have a linker-visible
symbol for metadata names (this is only needed for GNU objective C, for
NeXT the names are in separate sections).
gcc/ChangeLog:
* config/darwin.h
(darwin_label_is_anonymous_local_objc_name): Make metadata names
linker-visible for GNU objective C.
[RA]: Modify cost calculation for dealing with equivalences
RISCV target developers reported that pseudos with equivalence used in
a loop can be spilled. Simple changes of heuristics of cost
calculation of pseudos with equivalence or even ignoring equivalences
resulted in numerous testsuite failures on different targets or worse
spec2017 performance. This patch implements more sophisticated cost
calculations of pseudos with equivalences. The patch does not change
RA behaviour for targets still using the old reload pass instead of
LRA. The patch solves the reported problem and improves x86-64
specint2017 a bit (specfp2017 performance stays the same). The patch
takes into account how the equivalence will be used: will it be
integrated into the user insns or require an input reload insn. It
requires an additional pass over insns. To compensate for the RA slowdown, the
patch removes a pass over insns in the reload pass used by IRA before.
This also decouples IRA from reload more and will help to remove the
reload pass in the future if it ever happens.
gcc/ChangeLog:
* dwarf2out.cc (reg_loc_descriptor): Use lra_eliminate_regs when
LRA is used.
* ira-costs.cc: Include regset.h.
(equiv_can_be_consumed_p, get_equiv_regno, calculate_equiv_gains):
New functions.
(find_costs_and_classes): Call calculate_equiv_gains and redefine
mem_cost of pseudos with equivs when LRA is used.
* var-tracking.cc: Include ira.h and lra.h.
(vt_initialize): Use lra_eliminate_regs when LRA is used.
Fortran: Fix incompatible types between INTEGER(8) and TYPE(c_ptr)
In the context of an OpenMP declare variant directive, arguments of type C_PTR
are sometimes recognised as C_PTR in the base function and as INTEGER(8) in the
variant - or the other way around, depending on the parsing order.
This patch prevents such a situation from turning into a compile error.
* interface.cc (gfc_compare_types): Return true if one type is C_PTR
and the other is a compatible INTEGER(8).
* misc.cc (gfc_typename): Handle the case where an INTEGER(8) actually
holds a TYPE(C_PTR).
gcc/testsuite/ChangeLog:
* gfortran.dg/c_ptr_tests_20.f90: New test, checking that INTEGER(8)
and TYPE(C_PTR) are recognised as compatible.
* gfortran.dg/c_ptr_tests_21.f90: New test, exercising the error
detection for C_FUNPTR.
Roger Sayle [Thu, 26 Oct 2023 09:06:59 +0000 (10:06 +0100)]
PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in make_compound_operation.
This patch is my proposed solution to PR rtl-optimization/91865.
Normally RTX simplification canonicalizes a ZERO_EXTEND of a ZERO_EXTEND
to a single ZERO_EXTEND, but as shown in this PR it is possible for
combine's make_compound_operation to unintentionally generate a
non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is unlikely to be
matched by the backend.
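The canonicalization mentioned above relies on the fact that zero-extending twice gives the same value as zero-extending once. A minimal C illustration, with hypothetical function names and fixed-width types standing in for the RTL modes:

```c
#include <stdint.h>

/* ZERO_EXTEND of a ZERO_EXTEND: 8 -> 16 -> 32 bits.  */
uint32_t
extend_twice (uint8_t x)
{
  uint16_t mid = (uint16_t) x;
  return (uint32_t) mid;
}

/* The canonical single ZERO_EXTEND: 8 -> 32 bits.  */
uint32_t
extend_once (uint8_t x)
{
  return (uint32_t) x;
}
```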
For the new test case:
const int table[2] = {1, 2};
int foo (char i) { return table[i]; }
compiling with -O2 -mlarge on msp430 we currently see:
Jiahao Xu [Thu, 26 Oct 2023 01:34:32 +0000 (09:34 +0800)]
LoongArch: Enable vcond_mask_mn expanders for SF/DF modes.
If the vcond_mask patterns don't support fp modes, the vector
FP comparison instructions will not be generated.
gcc/ChangeLog:
* config/loongarch/lasx.md (vcond_mask_<ILASX:mode><ILASX:mode>): Change to
(vcond_mask_<mode><mode256_i>): this.
* config/loongarch/lsx.md (vcond_mask_<ILSX:mode><ILSX:mode>): Change to
(vcond_mask_<mode><mode_i>): this.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/vector/lasx/lasx-vcond-1.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-vcond-2.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-vcond-1.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-vcond-2.c: New test.
testsuite: Fix _BitInt in gcc.misc-tests/godump-1.c
Currently _BitInt is only supported on x86_64 which means that for other
targets all tests fail with e.g.
gcc.misc-tests/godump-1.c:237:1: sorry, unimplemented: '_BitInt(32)' is not supported on this target
237 | _BitInt(32) b32_v;
| ^~~~~~~
Instead of requiring _BitInt support for godump-1.c, move _BitInt tests
into godump-2.c such that all other tests in godump-1.c are still
executed in case of missing _BitInt support.
gcc/testsuite/ChangeLog:
* gcc.misc-tests/godump-1.c: Move _BitInt tests into godump-2.c.
* gcc.misc-tests/godump-2.c: New test.
Thomas Schwinge [Thu, 7 Sep 2023 20:15:08 +0000 (22:15 +0200)]
More '#ifdef ASM_OUTPUT_DEF' -> 'if (TARGET_SUPPORTS_ALIASES)' etc.
Per commit a8b522b483ebb8c972ecfde8779a7a6ec16aecd6 (Subversion r251048)
"Introduce TARGET_SUPPORTS_ALIASES", there is the idea that a back end may or
may not provide symbol aliasing support ('TARGET_SUPPORTS_ALIASES') independent
of '#ifdef ASM_OUTPUT_DEF', and in particular, depending not just on static but
instead on dynamic (run-time) configuration. There did remain a few instances
where we currently still assume that from '#ifdef ASM_OUTPUT_DEF' follows
'TARGET_SUPPORTS_ALIASES'. Change these to 'if (TARGET_SUPPORTS_ALIASES)',
similarly, or 'gcc_checking_assert (TARGET_SUPPORTS_ALIASES);'.
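The difference between the two kinds of check can be sketched as follows. ASM_OUTPUT_DEF_SKETCH and sketch_target_supports_aliases are hypothetical stand-ins, not GCC's real definitions; the flag models a target whose alias support depends on run-time configuration:

```c
#include <stdbool.h>

/* Stand-ins, not GCC's real definitions: the macro only needs to be
   defined, and the flag models run-time target configuration.  */
#define ASM_OUTPUT_DEF_SKETCH 1
static bool sketch_target_supports_aliases = true;

/* Old style: decided when the compiler itself is built.  */
bool
can_emit_alias_static (void)
{
#ifdef ASM_OUTPUT_DEF_SKETCH
  return true;
#else
  return false;
#endif
}

/* New style: decided while the compiler runs.  */
bool
can_emit_alias_dynamic (void)
{
  if (sketch_target_supports_aliases)
    return true;
  return false;
}
```

With the flag cleared, the static check still answers "yes" while the dynamic one answers "no", which is the mismatch the change removes.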
Iain Sandoe [Wed, 25 Oct 2023 14:28:52 +0000 (15:28 +0100)]
Darwin: Handle the fPIE option specially.
For Darwin, PIE requires PIC codegen, but otherwise is only a link-time
change. For almost all Darwin, we do not report __PIE__; the exception is
32bit X86 and from Darwin12 to 17 only (32 bit is no longer supported
after Darwin17).
Iain Sandoe [Tue, 17 Oct 2023 10:58:52 +0000 (11:58 +0100)]
config, aarch64: Use a more compatible sed invocation.
Currently, the sed commands used to parse --with-{cpu,tune,arch} rely on
a GNU-specific extension (automatically recognising extended regex).
This fails on Darwin, which defaults to Posix behaviour.
However '-E' is accepted to indicate an extended RE. Strictly, this
is also not really sufficient, since we should only require a Posix
sed.
gcc/ChangeLog:
* config.gcc: Pass -E to sed to indicate that we are using
extended REs.
Wilco Dijkstra [Fri, 13 Oct 2023 17:22:06 +0000 (18:22 +0100)]
AArch64: Improve immediate generation
Further improve immediate generation by adding support for 2-instruction
MOV/EOR bitmask immediates. This reduces the number of 3/4-instruction
immediates in SPECCPU2017 by ~2%.
Reviewed-by: Richard Earnshaw <Richard.Earnshaw@arm.com>
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_internal_mov_immediate):
Add support for immediates using MOV/EOR bitmask.
Jason Merrill [Fri, 20 Oct 2023 16:22:44 +0000 (12:22 -0400)]
c++: improve comment
It's incorrect to say that the address of an OFFSET_REF is always a
pointer-to-member; if it represents an overload set with both static and
non-static member functions that ends up resolving to a static one, the
address is a normal pointer. And let's go ahead and mention explicit object
member functions even though the patch hasn't landed yet.
Uros Bizjak [Wed, 25 Oct 2023 14:26:57 +0000 (16:26 +0200)]
i386: Narrow test instructions with immediate operands [PR111698]
Narrow test instructions with immediate operand that test memory location
for zero. E.g. testl $0x00aa0000, mem can be converted to testb $0xaa, mem+2.
Reject targets where reading (possibly unaligned) part of memory location
after a large write to the same address causes store-to-load forwarding stall.
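The equivalence behind the testl-to-testb narrowing can be checked in plain C. This sketch models memory as four bytes in little-endian order, as on x86; the function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Wide form: emulates `testl $0x00aa0000, mem` on a little-endian
   32-bit load and reports whether the result is non-zero.  */
bool
wide_test_nonzero (const uint8_t mem[4])
{
  uint32_t word = (uint32_t) mem[0]
                  | ((uint32_t) mem[1] << 8)
                  | ((uint32_t) mem[2] << 16)
                  | ((uint32_t) mem[3] << 24);
  return (word & 0x00aa0000u) != 0;
}

/* Narrow form: emulates `testb $0xaa, mem+2`, touching one byte only.  */
bool
narrow_test_nonzero (const uint8_t mem[4])
{
  return (mem[2] & 0xaau) != 0;
}
```

Only byte 2 of the word overlaps the mask, so both forms always agree on the zero/non-zero outcome. The narrow form reads a single byte, which is what can cause the store-to-load forwarding stall mentioned above when a larger write to the same location has just occurred.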
PR target/111698
gcc/ChangeLog:
* config/i386/x86-tune.def (X86_TUNE_PARTIAL_MEMORY_READ_STALL):
New tune.
* config/i386/i386.h (TARGET_PARTIAL_MEMORY_READ_STALL): New macro.
* config/i386/i386.md: New peephole pattern to narrow test
instructions with immediate operands that test memory locations
for zero.
chenxiaolong [Tue, 24 Oct 2023 06:40:14 +0000 (14:40 +0800)]
LoongArch: Implement __builtin_thread_pointer for TLS.
gcc/ChangeLog:
* config/loongarch/loongarch.md (get_thread_pointer<mode>): Add the
instruction template corresponding to the __builtin_thread_pointer
function.
* doc/extend.texi: Add the __builtin_thread_pointer function support
description to the documentation.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/builtin_thread_pointer.c: New test.
Patrick Palka [Wed, 25 Oct 2023 13:03:52 +0000 (09:03 -0400)]
c++: add fixed testcase [PR99804]
We accept the non-dependent call f(e) here ever since the
NON_DEPENDENT_EXPR removal patch r14-4793-gdad311874ac3b3.
I haven't looked closely into why but I suspect wrapping 'e'
in a NON_DEPENDENT_EXPR was causing the argument conversion
to misbehave.
Jonathan Wakely [Tue, 24 Oct 2023 15:56:30 +0000 (16:56 +0100)]
libstdc++: Build libstdc++_libbacktrace.a as PIC [PR111936]
In order for std::stacktrace to be used in a shared library, the
libbacktrace symbols need to be built with -fPIC. Add the libtool
-prefer-pic flag to the commands in src/libbacktrace/Makefile so that
the archive contains PIC objects.