Yang Yujie [Wed, 6 Mar 2024 01:19:59 +0000 (09:19 +0800)]
LoongArch: Use /lib instead of /lib64 as the library search path for MUSL.
gcc/ChangeLog:
* config.gcc: Add a case for loongarch*-*-linux-musl*.
* config/loongarch/linux.h: Disable the multilib-compatible
treatment for *musl* targets.
* config/loongarch/musl.h: New file.
Jakub Jelinek [Thu, 7 Mar 2024 07:43:16 +0000 (08:43 +0100)]
match.pd: Optimize a * !a to 0 [PR114009]
The following patch attempts to fix an optimization regression through
adding a simple simplification. We already have the
/* (m1 CMP m2) * d -> (m1 CMP m2) ? d : 0 */
(if (!canonicalize_math_p ())
(for cmp (tcc_comparison)
(simplify
(mult:c (convert (cmp@0 @1 @2)) @3)
(if (INTEGRAL_TYPE_P (type)
&& INTEGRAL_TYPE_P (TREE_TYPE (@0)))
(cond @0 @3 { build_zero_cst (type); })))
optimization which otherwise triggers during the a * !a multiplication,
but that is done only late and we aren't able through range assumptions
optimize it yet anyway.
The patch adds a specific simplification for it.
If a is zero, then a * !a will be 0 * 1 (or for signed 1-bit 0 * -1)
and so 0.
If a is non-zero, then a * !a will be a * 0 and so again 0.
THe pattern is valid for scalar integers, complex integers and vector types,
but I think will actually trigger only for the scalar integers. For
vector types I've added other two with VEC_COND_EXPR in it, for complex
there are different GENERIC trees to match and it is something that likely
would be never matched in GIMPLE, so I didn't handle that.
2024-03-07 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/114009
* genmatch.cc (decision_tree::gen): Emit ARG_UNUSED for captures
argument even for GENERIC, not just for GIMPLE.
* match.pd (a * !a -> 0): New simplifications.
Jerry DeLisle [Thu, 7 Mar 2024 03:46:04 +0000 (19:46 -0800)]
Fortran: Fix issue with using snprintf function.
The previous patch used snprintf to set the message
string. The message string is not a formatted string
and the snprintf will interpret '%' related characters
as format specifiers when there are no associated
output variables. A segfault ensues.
This change replaces snprintf with a fortran string copy
function and null terminates the message string.
PR libfortran/105456
libgfortran/ChangeLog:
* io/list_read.c (list_formatted_read_scalar): Use fstrcpy
from libgfortran/runtime/string.c to replace snprintf.
(nml_read_obj): Likewise.
* io/transfer.c (unformatted_read): Likewise.
(unformatted_write): Likewise.
(formatted_transfer_scalar_read): Likewise.
(formatted_transfer_scalar_write): Likewise.
* io/write.c (list_formatted_write_scalar): Likewise.
(nml_write_obj): Likewise.
gcc/testsuite/ChangeLog:
* gfortran.dg/pr105456.f90: Revise using '%' characters
in users error message.
Uros Bizjak [Wed, 6 Mar 2024 19:53:50 +0000 (20:53 +0100)]
i386: Fix and improve insn constraint for V2QI arithmetic/shift insns
optimize_function_for_size_p predicate is not stable during optab selection,
because it also depends on node->count/node->frequency of the current function,
which are updated during IPA, so they may change between early opts and
late opts. Use optimize_size instead - optimize_size implies
optimize_function_for_size_p (cfun), so if a named pattern uses
"&& optimize_size" and the insn it splits into uses
optimize_function_for_size_p (cfun), it shouldn't fail.
PR target/114232
gcc/ChangeLog:
* config/i386/mmx.md (negv2qi2): Enable for optimize_size instead
of optimize_function_for_size_p. Explictily enable for TARGET_SSE2.
(negv2qi SSE reg splitter): Enable for TARGET_SSE2 only.
(<plusminus:insn>v2qi3): Enable for optimize_size instead
of optimize_function_for_size_p. Explictily enable for TARGET_SSE2.
(<plusminus:insn>v2qi SSE reg splitter): Enable for TARGET_SSE2 only.
(<any_shift:insn>v2qi3): Enable for optimize_size instead
of optimize_function_for_size_p.
Robin Dapp [Wed, 6 Mar 2024 11:15:40 +0000 (12:15 +0100)]
RISC-V: Use vmv1r.v instead of vmv.v.v for fma output reloads [PR114200].
Three-operand instructions like vmacc are modeled with an implicit
output reload when the output does not match one of the operands. For
this we use vmv.v.v which is subject to length masking.
In a situation where the current vl is less than the full vlenb
and the fma's result value is used as input for a vector reduction
(which is never length masked) we effectively only reduce vl
elements. The masked-out elements are relevant for the
reduction, though, leading to a wrong result.
This patch replaces the vmv reloads by full-register reloads.
gcc/ChangeLog:
PR target/114200
PR target/114202
* config/riscv/vector.md: Use vmv[1248]r.v instead of vmv.v.v.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr114200.c: New test.
* gcc.target/riscv/rvv/autovec/pr114202.c: New test.
Robin Dapp [Mon, 15 Jan 2024 16:34:58 +0000 (17:34 +0100)]
RISC-V: Adjust vec unit-stride load/store costs.
Scalar loads provide offset addressing while unit-stride vector
instructions cannot. The offset must be loaded into a general-purpose
register before it can be used. In order to account for this, this
patch adds an address arithmetic heuristic that keeps track of data
reference operands. If we haven't seen the operand before we add the
cost of a scalar statement.
This helps to get rid of an lbm regression when vectorizing (roughly
0.5% fewer dynamic instructions). gcc5 improves by 0.2% and deepsjeng
by 0.25%. wrf and nab degrade by 0.1%. This is because before we now
adjust the cost of SLP as well as loop-vectorized instructions whereas
we would only adjust loop-vectorized instructions before.
Considering higher scalar_to_vec costs (3 vs 1) for all vectorization
types causes some snippets not to get vectorized anymore. Given these
costs the decision looks correct but appears worse when just counting
dynamic instructions.
In total SPECint 2017 has 4 bln dynamic instructions less and SPECfp 0.7
bln.
gcc/ChangeLog:
* config/riscv/riscv-vector-costs.cc (adjust_stmt_cost): Move...
(costs::adjust_stmt_cost): ... to here and add vec_load/vec_store
offset handling.
(costs::add_stmt_cost): Also adjust cost for statements without
stmt_info.
* config/riscv/riscv-vector-costs.h: Define zero constant.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/vse-slp-1.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vse-slp-2.c: New test.
Wilco Dijkstra [Wed, 6 Mar 2024 17:35:16 +0000 (17:35 +0000)]
ARM: Fix conditional execution [PR113915]
By default most patterns can be conditionalized on Arm targets. However
Thumb-2 predication requires the "predicable" attribute be explicitly
set to "yes". Most patterns are shared between Arm and Thumb(-2) and are
marked with "predicable". Given this sharing, it does not make sense to
use a different default for Arm. So only consider conditional execution
of instructions that have the predicable attribute set to yes. This ensures
that patterns not explicitly marked as such are never conditionally executed.
Jeff Law [Wed, 6 Mar 2024 16:50:44 +0000 (09:50 -0700)]
[PR target/113001] Fix incorrect operand swapping in conditional move
This bug totally fell off my radar. Sorry about that.
We have some special casing the conditional move expander to simplify a
conditional move when comparing a register against zero and that same register
is one of the arms.
Specifically a (eq (reg) (const_int 0)) where reg is also the true arm or (ne
(reg) (const_int 0)) where reg is the false arm need not use the fully
generalized conditional move, thus saving an instruction for those cases.
In the NE case we swapped the operands, but didn't swap the condition, which
led to the ICE due to an unrecognized pattern. THe backend actually has
distinct patterns for those two cases. So swapping the operands is neither
needed nor advisable.
Regression tested on rv64gc and verified the new tests pass.
Pushing to the trunk.
PR target/113001
PR target/112871
gcc/
* config/riscv/riscv.cc (expand_conditional_move): Do not swap
operands when the comparison operand is the same as the false
arm for a NE test.
gcc/testsuite
* gcc.target/riscv/zicond-ice-3.c: New test.
* gcc.target/riscv/zicond-ice-4.c: New test.
Harald Anlauf [Tue, 5 Mar 2024 20:54:26 +0000 (21:54 +0100)]
Fortran: error recovery while simplifying expressions [PR103707,PR106987]
When an exception is encountered during simplification of arithmetic
expressions, the result may depend on whether range-checking is active
(-frange-check) or not. However, the code path in the front-end should
stay the same for "soft" errors for which the exception is triggered by the
check, while "hard" errors should always terminate the simplification, so
that error recovery is independent of the flag. Separation of arithmetic
error codes into "hard" and "soft" errors shall be done consistently via
is_hard_arith_error().
PR fortran/103707
PR fortran/106987
gcc/fortran/ChangeLog:
* arith.cc (is_hard_arith_error): New helper function to determine
whether an arithmetic error is "hard" or not.
(check_result): Use it.
(gfc_arith_divide): Set "Division by zero" only for regular
numerators of real and complex divisions.
(reduce_unary): Use is_hard_arith_error to determine whether a hard
or (recoverable) soft error was encountered. Terminate immediately
on hard error, otherwise remember code of first soft error.
(reduce_binary_ac): Likewise.
(reduce_binary_ca): Likewise.
(reduce_binary_aa): Likewise.
gcc/testsuite/ChangeLog:
* gfortran.dg/pr99350.f90:
* gfortran.dg/arithmetic_overflow_3.f90: New test.
Marek Polacek [Tue, 5 Mar 2024 18:33:10 +0000 (13:33 -0500)]
c++: ICE with noexcept and local specialization [PR114114]
Here we ICE because we call register_local_specialization while
local_specializations is null, so
local_specializations->put ();
crashes on null this. It's null since maybe_instantiate_noexcept calls
push_to_top_level which creates a new scope. Normally, I would have
guessed that we need a new local_specialization_stack. But here we're
dealing with an operand of a noexcept, which is an unevaluated operand,
and those aren't registered in the hash map. maybe_instantiate_noexcept
wasn't signalling that it's substituting an unevaluated operand though.
PR c++/114114
gcc/cp/ChangeLog:
* pt.cc (maybe_instantiate_noexcept): Save/restore
cp_unevaluated_operand, c_inhibit_evaluation_warnings, and
cp_noexcept_operand around the tsubst_expr call.
The following reworks vectorizable_live_operation to pass the
live stmt to vect_create_epilog_for_reduction also for early breaks
and a peeled main exit. This is to be able to figure the scalar
definition to replace. This reverts the PR114192 fix as it is
subsumed by this cleanup.
PR tree-optimization/114239
* tree-vect-loop.cc (vect_get_vect_def): Remove.
(vect_create_epilog_for_reduction): The passed in stmt_info
should now be the live stmt that produces the scalar reduction
result. Revert PR114192 fix. Base reduction info off
info_for_reduction. Remove special handling of
early-break/peeled, restore original vector def gathering.
Make sure to pick the correct exit PHIs.
(vectorizable_live_operation): Pass in the proper stmt_info
for early break exits.
* gcc.dg/vect/vect-early-break_122-pr114239.c: New testcase.
Xi Ruoyao [Tue, 5 Mar 2024 12:46:57 +0000 (20:46 +0800)]
LoongArch: testsuite: Rewrite {x,}vfcmp-{d,f}.c to avoid named registers
Loops on named vector register are not vectorized (see comment 11 of
PR113622), so the these test cases have been failing for a while.
Rewrite them using check-function-bodies to remove hard coding register
names. A barrier is needed to always load the first operand before the
second operand.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/vfcmp-f.c: Rewrite to avoid named
registers.
* gcc.target/loongarch/vfcmp-d.c: Likewise.
* gcc.target/loongarch/xvfcmp-f.c: Likewise.
* gcc.target/loongarch/xvfcmp-d.c: Likewise.
While reworking the aarch64 feature descriptions, I forgot
to add out-of-class definitions of some static constants.
This could lead to a build failure with some compilers.
This was seen with some WIP to increase the number of extensions
beyond 64. It's latent on trunk though, and a regression from
before the rework.
gcc/
* config/aarch64/aarch64-feature-deps.h (feature_deps::info): Add
out-of-class definitions of static constants.
Nathaniel Shead [Tue, 5 Mar 2024 13:43:22 +0000 (00:43 +1100)]
c++: Fix template deduction for conversion operators with xobj parameters [PR113629]
Unification for conversion operators (DEDUCE_CONV) doesn't perform
transformations like handling forwarding references. This is correct in
general, but not for xobj parameters, which should be handled "normally"
for the purposes of deduction: [temp.deduct.conv] only applies to the
return type of the conversion function.
PR c++/113629
gcc/cp/ChangeLog:
* pt.cc (type_unification_real): Only use DEDUCE_CONV for the
return type of a conversion function.
gcc/testsuite/ChangeLog:
* g++.dg/cpp23/explicit-obj-conv-op.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Jason Merrill <jason@redhat.com>
Richard Biener [Wed, 6 Mar 2024 08:25:15 +0000 (09:25 +0100)]
tree-optimization/114249 - ICE with BB reduction vectorization
When we scrap the last def of an odd lane numbered BB reduction
we can end up recording a pattern def which will later wreck
code generation. The following puts this logic where it better
belongs, avoiding this issue.
PR tree-optimization/114249
* tree-vect-slp.cc (vect_build_slp_instance): Move making
a BB reduction lane number even ...
(vect_slp_check_for_roots): ... here to avoid leaking
pattern defs.
Jakub Jelinek [Wed, 6 Mar 2024 08:35:37 +0000 (09:35 +0100)]
i386: Fix up the vzeroupper REG_DEAD/REG_UNUSED note workaround [PR114190]
When writing the rest_of_handle_insert_vzeroupper workaround to manually
remove all the REG_DEAD/REG_UNUSED notes from the IL, I've missed that
there is a df_analyze () call right after it and that the problems added
earlier in the pass, like df_note_add_problem () done during mode switching,
doesn't affect just the next df_analyze () call right after it, but all
other df_analyze () calls until the end of the current pass where
df_finish_pass removes the optional problems.
So, as can be seen on the following patch, the workaround doesn't actually
work there, because while rest_of_handle_insert_vzeroupper carefully removes
all REG_DEAD/REG_UNUSED notes, the df_analyze () call at the end of the
function immediately adds them in again (so, I must say I have no idea
why the workaround worked on the earlier testcases).
Now, I could move the df_analyze () call just before the REG_DEAD/REG_UNUSED
note removal loop, but I think the following patch is better, because
the df_analyze () call doesn't have to recompute the problem when we don't
care about it and will actively strip all traces of it away.
2024-03-06 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/114190
* config/i386/i386-features.cc (rest_of_handle_insert_vzeroupper):
Call df_remove_problem for df_note before calling df_analyze.
Jerry DeLisle [Wed, 6 Mar 2024 04:49:23 +0000 (20:49 -0800)]
Fortran: Add user defined error messages for UDTIO.
The defines IOMSG_LEN and MSGLEN were redundant so these are combined
into IOMSG_LEN as defined in io.h.
The remainder of the patch adds checks for when a user defined
derived type IO procedure sets the IOSTAT or IOMSG variables
independent of the librrary defined I/O messages.
PR libfortran/105456
libgfortran/ChangeLog:
* io/io.h (IOMSG_LEN): Moved to here.
* io/list_read.c (MSGLEN): Removed MSGLEN.
(convert_integer): Changed MSGLEN to IOMSG_LEN.
(parse_repeat): Likewise.
(read_logical): Likewise.
(read_integer): Likewise.
(read_character): Likewise.
(parse_real): Likewise.
(read_complex): Likewise.
(read_real): Likewise.
(check_type): Likewise.
(list_formatted_read_scalar): Adjust to IOMSG_LEN.
(nml_read_obj): Add user defined error message.
* io/transfer.c (unformatted_read): Add user defined error
message.
(unformatted_write): Add user defined error message.
(formatted_transfer_scalar_read): Add user defined error message.
(formatted_transfer_scalar_write): Add user defined error message.
* io/write.c (list_formatted_write_scalar): Add user defined error message.
(nml_write_obj): Add user defined error message.
gcc/testsuite/ChangeLog:
* gfortran.dg/pr105456-nmlr.f90: New test.
* gfortran.dg/pr105456-nmlw.f90: New test.
* gfortran.dg/pr105456-ruf.f90: New test.
* gfortran.dg/pr105456-wf.f90: New test.
* gfortran.dg/pr105456-wuf.f90: New test.
Patrick Palka [Wed, 6 Mar 2024 01:36:36 +0000 (20:36 -0500)]
c++/modules: befriending template from current class scope
Here the TEMPLATE_DECL representing the template friend declaration
naming B has class scope since the template B has class scope, but
get_merge_kind assumes all DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P
TEMPLATE_DECL have namespace scope and wrongly returns MK_named instead
of MK_local_friend for the friend.
ctf: fix incorrect CTF for multi-dimensional array types
PR debug/114186
DWARF DIEs of type DW_TAG_subrange_type are linked together to represent
the information about the subsequent dimensions. The CTF processing was
so far working through them in the opposite (incorrect) order.
While fixing the issue, refactor the code a bit for readability.
gcc/
PR debug/114186
* dwarf2ctf.cc (gen_ctf_array_type): Invoke the ctf_add_array ()
in the correct order of the dimensions.
(gen_ctf_subrange_type): Refactor out handling of
DW_TAG_subrange_type DIE to here.
asan: Handle poly-int sizes in ASAN_MARK [PR97696]
This patch makes the expansion of IFN_ASAN_MARK let through
poly-int-sized objects. The expansion itself was already generic
enough, but the tests for the fast path were too strict.
gcc/
PR sanitizer/97696
* asan.cc (asan_expand_mark_ifn): Allow the length to be a poly_int.
gcc/testsuite/
PR sanitizer/97696
* gcc.target/aarch64/sve/pr97696.c: New test.
I was over-eager when adding support for strided SME2 instructions
and accidentally included forms of LUTI2 and LUTI4 that are only
available with SME2.1, not SME2. This patch removes them for now.
We're planning to add proper support for SME2.1 in the GCC 15
timeframe.
Sorry for the blunder :(
gcc/
* config/aarch64/aarch64.md (stride_type): Remove luti_consecutive
and luti_strided.
* config/aarch64/aarch64-sme.md
(@aarch64_sme_lut<LUTI_BITS><mode>): Remove stride_type attribute.
(@aarch64_sme_lut<LUTI_BITS><mode>_strided2): Delete.
(@aarch64_sme_lut<LUTI_BITS><mode>_strided4): Likewise.
* config/aarch64/aarch64-early-ra.cc (is_stride_candidate)
(early_ra::maybe_convert_to_strided_access): Remove support for
strided LUTI2 and LUTI4.
Register alloc may expand a 3-operand arithmetic X = Y o CST as
X = CST
X o= Y
where it may be better to instead:
X = Y
X o= CST
because 1) the first insn may use MOVW for "X = Y", and 2) the
operation may be more efficient when performed with a constant,
for example when ADIW or SBIW can be used, or some bytes of
the constant are 0x00 or 0xff.
gcc/
* config/avr/avr.md: Add two RTL peepholes for PLUS, IOR and AND
in HI, PSI, SI that swap operation order from "X = CST, X o= Y"
to "X = Y, X o= CST".
Roger Sayle [Tue, 5 Mar 2024 10:06:17 +0000 (11:06 +0100)]
AVR: Improve output of insn "*insv.any_shift.<mode>_split".
The instructions printed by insn "*insv.any_shift.<mode>_split" were
sub-optimal. The code to print the improved output is lengthy and
performed by new function avr_out_insv. As it turns out, the function
can also handle shift-offsets of zero, which is "*andhi3", "*andpsi3"
and "*andsi3". Thus, these tree insns get a new 3-operand alternative
where the 3rd operand is an exact power of 2.
gcc/
* config/avr/avr-protos.h (avr_out_insv): New proto.
* config/avr/avr.cc (avr_out_insv): New function.
(avr_adjust_insn_length) [ADJUST_LEN_INSV]: Handle case.
(avr_cbranch_cost) [ZERO_EXTRACT]: Adjust rtx costs.
* config/avr/avr.md (define_attr "adjust_len") Add insv.
(andhi3, *andhi3, andpsi3, *andpsi3, andsi3, *andsi3):
Add constraint alternative where the 3rd operand is a power
of 2, and the source register may differ from the destination.
(*insv.any_shift.<mode>_split): Call avr_out_insv to output
instructions. Set attr "length" to "insv".
* config/avr/constraints.md (Cb2, Cb3, Cb4): New constraints.
gcc/testsuite/
* gcc.target/avr/torture/insv-anyshift-hi.c: New test.
* gcc.target/avr/torture/insv-anyshift-si.c: New test.
Richard Biener [Tue, 5 Mar 2024 09:55:56 +0000 (10:55 +0100)]
tree-optimization/114231 - use patterns for BB SLP discovery root stmts
The following makes sure to use recognized patterns when vectorizing
roots during BB SLP discovery. We need to apply those late since
during root discovery we've not yet done pattern recognition.
All parts of the vectorizer assume patterns get used, for the testcase
we mix this up when doing live lane computation.
PR tree-optimization/114231
* tree-vect-slp.cc (vect_analyze_slp): Lookup patterns when
processing a BB SLP root.
Jakub Jelinek [Tue, 5 Mar 2024 09:32:38 +0000 (10:32 +0100)]
lower-subreg: Fix ROTATE handling [PR114211]
On the following testcase, we have
(insn 10 7 11 2 (set (reg/v:TI 106 [ h ])
(rotate:TI (reg/v:TI 106 [ h ])
(const_int 64 [0x40]))) "pr114211.c":8:5 1042 {rotl64ti2_doubleword}
(nil))
before subreg1 and the pass decides to use
(reg:DI 127 [ h ]) / (reg:DI 128 [ h+8 ])
register pair instead of (reg/v:TI 106 [ h ]).
resolve_operand_for_swap_move_operator implements it by pretending it is
an assignment from
(concatn (reg:DI 127 [ h ]) (reg:DI 128 [ h+8 ]))
to
(concatn (reg:DI 128 [ h+8 ]) (reg:DI 127 [ h ]))
The problem is that if the rotate argument is the same as destination or
if there is even an overlap between the first half of the destination with
second half of the source we emit incorrect code, because the store to
(reg:DI 128 [ h+8 ]) overwrites what we need for source of the second
move. The following patch detects that case and uses a temporary pseudo
to hold the original (reg:DI 128 [ h+8 ]) value across the first store.
2024-03-05 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/114211
* lower-subreg.cc (resolve_simple_move): For double-word
rotates by BITS_PER_WORD if there is overlap between source
and destination use a temporary.
Jakub Jelinek [Tue, 5 Mar 2024 09:27:14 +0000 (10:27 +0100)]
bitint: Handle BIT_FIELD_REF lowering [PR114157]
The following patch adds support for BIT_FIELD_REF lowering with
large/huge _BitInt lhs. BIT_FIELD_REF requires mode argument first
operand, so the operand shouldn't be any huge _BitInt.
If we only access limbs from inside of BIT_FIELD_REF using constant
indexes, we can just create a new BIT_FIELD_REF to extract the limb,
but if we need to use variable index in a loop, I'm afraid we need
to spill it into memory, which is what the following patch does.
If there is some bitwise type for the extraction, it extracts just
what we need and not more than that, otherwise it spills the whole
first argument of BIT_FIELD_REF and uses MEM_REF with an offset
with VIEW_CONVERT_EXPR around it.
2024-03-05 Jakub Jelinek <jakub@redhat.com>
PR middle-end/114157
* gimple-lower-bitint.cc: Include stor-layout.h.
(mergeable_op): Return true for BIT_FIELD_REF.
(struct bitint_large_huge): Declare handle_bit_field_ref method.
(bitint_large_huge::handle_bit_field_ref): New method.
(bitint_large_huge::handle_stmt): Use it for BIT_FIELD_REF.
* gcc.dg/bitint-98.c: New test.
* gcc.target/i386/avx2-pr114157.c: New test.
* gcc.target/i386/avx512f-pr114157.c: New test.
Jakub Jelinek [Tue, 5 Mar 2024 09:24:51 +0000 (10:24 +0100)]
i386: For noreturn functions save at least the bp register if it is used [PR114116]
As mentioned in the PR, on x86_64 currently a lot of ICEs end up
with crashes in the unwinder like:
during RTL pass: expand
pr114044-2.c: In function ‘foo’:
pr114044-2.c:5:3: internal compiler error: in expand_fn_using_insn, at internal-fn.cc:208
5 | __builtin_clzg (a);
| ^~~~~~~~~~~~~~~~~~
0x7d9246 expand_fn_using_insn
../../gcc/internal-fn.cc:208
The problem are the PR38534 r14-8470 changes which avoid saving call-saved
registers in noreturn functions. If such functions ever touch the
bp register but because of the r14-8470 changes don't save it in the
prologue, the caller or any other function in the backtrace uses a frame
pointer and the noreturn function or anything it calls directly or
indirectly calls backtrace, then the unwinder crashes, because bp register
contains some unrelated value, but in the frames which do use frame pointer
CFA is based on the bp register.
In theory this could happen with any other call-saved register, e.g. code
written by hand in assembly with .cfi_* directives could use any other
call-saved register as register into which store the CFA or something
related to that, but in reality at least compiler generated code and usual
assembly probably just making sure bp doesn't contain garbage could be
enough for backtrace purposes. In the debugger of course it will not be
enough, the values of the arguments etc. can be lost (if DW_CFA_undefined
is emitted) or garbage.
So, I think for noreturn function we should at least save the bp register
if we use it. If user asks for it using no_callee_saved_registers
attribute, let's honor what is asked for (but then it is up to the user
to make sure e.g. backtrace isn't called from the function or anything it
calls). As discussed in the PR, whether to save bp or not shouldn't be
based on whether compiling with -g or -g0, because we don't want code
generation changes without/with debugging, it would also break
-fcompare-debug, and users can call backtrace(3), that doesn't use debug
info, just unwind info, even backtrace_symbols{,_fd}(3) don't use debug info
but just looks at dynamic symbol table.
The patch also adds check for no_caller_saved_registers
attribute in the implicit addition of not saving callee saved register
in noreturn functions, because on I think
__attribute__((no_caller_saved_registers, noreturn)) will otherwise
error that no_caller_saved_registers and no_callee_saved_registers
attributes are incompatible (but user didn't specify anything like that).
2024-03-05 Jakub Jelinek <jakub@redhat.com>
PR target/114116
* config/i386/i386.h (enum call_saved_registers_type): Add
TYPE_NO_CALLEE_SAVED_REGISTERS_EXCEPT_BP enumerator.
* config/i386/i386-options.cc (ix86_set_func_type): Remove
has_no_callee_saved_registers variable, add no_callee_saved_registers
instead, initialize it depending on whether it is
no_callee_saved_registers function or not. Don't set it if
no_caller_saved_registers attribute is present. Adjust users.
* config/i386/i386.cc (ix86_function_ok_for_sibcall): Handle
TYPE_NO_CALLEE_SAVED_REGISTERS_EXCEPT_BP like
TYPE_NO_CALLEE_SAVED_REGISTERS.
(ix86_save_reg): Handle TYPE_NO_CALLEE_SAVED_REGISTERS_EXCEPT_BP.
* gcc.target/i386/pr38534-1.c: Allow push/pop of bp.
* gcc.target/i386/pr38534-4.c: Likewise.
* gcc.target/i386/pr38534-2.c: Likewise.
* gcc.target/i386/pr38534-3.c: Likewise.
* gcc.target/i386/pr114097-1.c: Likewise.
* gcc.target/i386/stack-check-17.c: Expect no pop on ! ia32.
Patrick Palka [Tue, 5 Mar 2024 02:32:44 +0000 (21:32 -0500)]
c++/modules: relax diagnostic about GMF contents
Issuing a hard error when the GMF doesn't consist only of preprocessing
directives happens to be inconvenient for automated testcase reduction
via cvise. This patch relaxes this diagnostic into a pedwarn that can
be disabled with -Wno-global-module.
gcc/c-family/ChangeLog:
* c.opt (Wglobal-module): New warning.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_translation_unit): Relax GMF contents
error into a pedwarn.
gcc/ChangeLog:
* doc/invoke.texi (-Wno-global-module): Document.
gcc/testsuite/ChangeLog:
* g++.dg/modules/friend-6_a.C: Pass -Wno-global-module instead
of -Wno-pedantic. Remove now unnecessary preprocessing
directives from GMF.
Nathaniel Shead [Sun, 3 Mar 2024 12:48:17 +0000 (23:48 +1100)]
c++: Support exporting using-decls in same namespace as target
Currently a using-declaration bringing a name into its own namespace is
a no-op, except for functions. This prevents people from being able to
redeclare a name brought in from the GMF as exported, however, which
this patch fixes.
Apart from marking declarations as exported they are also now marked as
effectively being in the module purview (due to the using-decl) so that
they are properly processed, as 'add_binding_entity' assumes that
declarations not in the module purview cannot possibly be exported.
gcc/cp/ChangeLog:
* name-lookup.cc (walk_module_binding): Remove completed FIXME.
(do_nonmember_using_decl): Mark redeclared entities as exported
when needed. Check for re-exporting internal linkage types.
gcc/testsuite/ChangeLog:
* g++.dg/modules/using-12.C: New test.
* g++.dg/modules/using-13.h: New test.
* g++.dg/modules/using-13_a.C: New test.
* g++.dg/modules/using-13_b.C: New test.
Gaius Mulley [Mon, 4 Mar 2024 21:46:32 +0000 (21:46 +0000)]
PR modula2/114227 InstallTerminationProcedure does not work with -fiso
This patch moves the initial/termination user procedure functionality in
pim and iso versions of M2RTS into M2Dependent. This ensures that
finalization/initialization procedures will always be invoked for both -fiso
and -fpim. Prior to this patch M2Dependent called M2RTS for
termination procedure cleanup and always invoked the pim M2RTS.
gcc/m2/ChangeLog:
PR modula2/114227
* gm2-libs-iso/M2RTS.mod (ProcedureChain): Remove.
(ProcedureList): Remove.
(ExecuteReverse): Remove.
(ExecuteTerminationProcedures): Rewrite.
(ExecuteInitialProcedures): Rewrite.
(AppendProc): Remove.
(InstallTerminationProcedure): Rewrite.
(InstallInitialProcedure): Rewrite.
(InitProcList): Remove.
* gm2-libs/M2Dependent.def (InstallTerminationProcedure):
New procedure.
(ExecuteTerminationProcedures): New procedure.
(InstallInitialProcedure): New procedure.
(ExecuteInitialProcedures): New procedure.
* gm2-libs/M2Dependent.mod (ProcedureChain): New type.
(ProcedureList): New type.
(ExecuteReverse): New procedure.
(ExecuteTerminationProcedures): New procedure.
(ExecuteInitialProcedures): New procedure.
(AppendProc): New procedure.
(InstallTerminationProcedure): New procedure.
(InstallInitialProcedure): New procedure.
(InitProcList): New procedure.
* gm2-libs/M2RTS.mod (ProcedureChain): Remove.
(ProcedureList): Remove.
(ExecuteReverse): Remove.
(ExecuteTerminationProcedures): Rewrite.
(ExecuteInitialProcedures): Rewrite.
(AppendProc): Remove.
(InstallTerminationProcedure): Rewrite.
(InstallInitialProcedure): Rewrite.
(InitProcList): Remove.
I caused a regression with commit r10-908 by adding a constraint to the
non-explicit allocator-extended default constructor, but seemingly
forgot to add an explicit overload with the corresponding constraint.
David Faust [Mon, 4 Mar 2024 17:35:01 +0000 (09:35 -0800)]
bpf: add inline memset expansion
Similar to memmove and memcpy, the BPF backend cannot fall back on a
library call to implement __builtin_memset, and should always expand
calls to it inline if possible.
This patch implements simple inline expansion of memset in the BPF
backend in a verifier-friendly way. Similar to memcpy and memmove, the
size must be an integer constant, as is also required by clang.
gcc/
* config/bpf/bpf-protos.h (bpf_expand_setmem): New prototype.
* config/bpf/bpf.cc (bpf_expand_setmem): New.
* config/bpf/bpf.md (setmemdi): New define_expand.
gcc/testsuite/
* gcc.target/bpf/memset-1.c: New test.
On Mon, Mar 04, 2024 at 05:18:39PM +0100, Rainer Orth wrote:
> unfortunately, the patch broke Solaris/SPARC bootstrap
> (sparc-sun-solaris2.11):
>
> .../gcc/combine.cc: In function 'rtx_code simplify_comparison(rtx_code, rtx_def**, rtx_def**)':
> .../gcc/combine.cc:12101:25: error: '*(unsigned int*)((char*)&inner_mode + offsetof(scalar_int_mode, scalar_int_mode::m_mode))' may be used uninitialized [-Werror=maybe-uninitialized]
> 12101 | scalar_int_mode mode, inner_mode, tmode;
> | ^~~~~~~~~~
I don't see how it could ever work properly, inner_mode in that spot is
just uninitialized.
I think we shouldn't worry about paradoxical subregs of non-scalar_int_mode
REGs/MEMs and for the scalar_int_mode ones should initialize inner_mode
before we use it.
Another option would be to use
maybe_lt (GET_MODE_PRECISION (GET_MODE (SUBREG_REG (op0))), BITS_PER_WORD)
and
load_extend_op (GET_MODE (SUBREG_REG (op0))) == ZERO_EXTEND,
or set machine_mode smode = GET_MODE (SUBREG_REG (op0)); and use it in
those two spots.
2024-03-04 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/113010
* combine.cc (simplify_comparison): Guard the
WORD_REGISTER_OPERATIONS check on scalar_int_mode of SUBREG_REG
and initialize inner_mode.
Andre Vieira [Wed, 14 Feb 2024 16:46:38 +0000 (16:46 +0000)]
arm: Annotate instructions with mve_safe_imp_xlane_pred
This patch annotates some MVE across lane instructions with a new attribute.
We use this attribute to let the compiler know that these instructions can be
safely implicitly predicated when tail predicating if their operands are
guaranteed to have zeroed tail predicated lanes. These instructions were
selected because having the value 0 in those lanes or 'tail-predicating' those
lanes have the same effect.
arm: Add define_attr to to create a mapping between MVE predicated and unpredicated insns
This patch adds an attribute to the mve md patterns to be able to identify
predicable MVE instructions and what their predicated and unpredicated variants
are. This attribute is used to encode the icode of the unpredicated variant of
an instruction in its predicated variant.
This will make it possible for us to transform VPT-predicated insns in
the insn chain into their unpredicated equivalents when transforming the loop
into a MVE Tail-Predicated Low Overhead Loop. For example:
`mve_vldrbq_z_<supf><mode> -> mve_vldrbq_<supf><mode>`.
Richard Biener [Mon, 4 Mar 2024 12:28:34 +0000 (13:28 +0100)]
tree-optimization/114197 - unexpected if-conversion for vectorization
The following avoids lowering a volatile bitfiled access and in case
the if-converted and original loops end up in different outer loops
because of simplifcations enabled scrap the result since that is not
how the vectorizer expects the loops to be laid out.
PR tree-optimization/114197
* tree-if-conv.cc (bitfields_to_lower_p): Do not lower if
there are volatile bitfield accesses.
(pass_if_conversion::execute): Throw away result if the
if-converted and original loops are not nested as expected.
The following avoids creating unsupported VEC_COND_EXPRs as part of
SIMD clone call mask argument setup during vectorization which results
in inefficient decomposing of the operation during vector lowering.
PR tree-optimization/114164
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Fail if
the code generated for mask argument setup is not supported.
Jakub Jelinek [Mon, 4 Mar 2024 10:40:27 +0000 (11:40 +0100)]
libgomp: Use void (*) (void *) rather than void (*)() for host_fn type [PR114216]
For the type of the target callbacks we use elsehwere void (*) (void *) and
IMHO should use that for the reverse offload fallback as well (where the actual
callback is emitted using the same code as for host fallback or device kernel
entry routines), even when it is also ok to use void (*) () before C23 and
we aren't building libgomp with C23 yet. On some arches perhaps void (*) ()
could result in worse code generation because calls in that case like casts
to unprototyped functions need to sometimes pass argument in two different spots
etc. so that it deals with both passing it through ... and as a named argument.
2024-03-04 Jakub Jelinek <jakub@redhat.com>
PR libgomp/114216
* target.c (gomp_target_rev): Change host_fn type and corresponding
cast from void (*)() to void (*) (void *).
For precision less than int we apply the adjustment to make it defined
at zero after the adjustment to make it compute CLZ rather than CTZ.
That's wrong.
PR tree-optimization/114203
* tree-ssa-loop-niter.cc (build_cltz_expr): Apply CTZ->CLZ
adjustment before making the result defined at zero.
Richard Biener [Mon, 4 Mar 2024 08:46:13 +0000 (09:46 +0100)]
tree-optimization/114192 - scalar reduction kept live with early break vect
The following fixes a missing replacement of the reduction value
used in the epilog, causing the scalar reduction to be kept live
across the early break exit path.
PR tree-optimization/114192
* tree-vect-loop.cc (vect_create_epilog_for_reduction): Use the
appropriate def for the live out stmt in case of an alternate
exit.
Jakub Jelinek [Mon, 4 Mar 2024 10:15:07 +0000 (11:15 +0100)]
bitint: Fix tree node sharing bug [PR114209]
We ICE on the following testcase due to invalid tree sharing.
The second hunk fixes that, the first one is from me looking around at
other spots which might need end up with invalid tree sharing too.
2024-03-04 Jakub Jelinek <jakub@redhat.com>
PR middle-end/114209
* gimple-lower-bitint.cc (bitint_large_huge::limb_access): Call
unshare_expr when creating a MEM_REF from MEM_REF.
(bitint_large_huge::lower_stmt): Call unshare_expr.
Xi Ruoyao [Tue, 23 Jan 2024 11:58:21 +0000 (19:58 +0800)]
testsuite: Make pr104992.c irrelated to target vector feature [PR113418]
The vect_int_mod target selector is evaluated with the options in
DEFAULT_VECTCFLAGS in effect, but these options are not automatically
passed to tests out of the vect directories. So this test fails on
targets where integer vector modulo operation is supported but requiring
an option to enable, for example LoongArch.
In this test case, the only expected optimization not happened in
original is in corge because it needs forward propogation. So we can
scan the forwprop2 dump (where the vector operation is not expanded to
scalars yet) instead of optimized, then we don't need to consider
vect_int_mod or not.
gcc/testsuite/ChangeLog:
PR testsuite/113418
* gcc.dg/pr104992.c (dg-options): Use -fdump-tree-forwprop2
instead of -fdump-tree-optimized.
(dg-final): Scan forwprop2 dump instead of optimized, and remove
the use of vect_int_mod.
* lib/target-supports.exp (check_effective_target_vect_int_mod):
Remove because it's not used anymore.
Jakub Jelinek [Mon, 4 Mar 2024 09:04:19 +0000 (10:04 +0100)]
i386: Fix ICEs with SUBREGs from vector etc. constants to XFmode [PR114184]
The Intel extended format has the various weird number categories,
pseudo denormals, pseudo infinities, pseudo NaNs and unnormals.
Those are not representable in the GCC real_value and so neither
GIMPLE nor RTX VIEW_CONVERT_EXPR/SUBREG folding folds those into
constants.
As can be seen on the following testcase, because it isn't folded
(since GCC 12, before that we were folding it) we can end up with
a SUBREG of a CONST_VECTOR or similar constant, which isn't valid
general_operand, so we ICE during vregs pass trying to recognize
the move instruction.
Initially I thought it is a middle-end bug, the movxf instruction
has general_operand predicate, but the middle-end certainly never
tests that predicate, seems moves are special optabs.
And looking at other mov optabs, e.g. for vector modes the i386
patterns use nonimmediate_operand predicate on the input, yet
ix86_expand_vector_move deals with CONSTANT_P and SUBREG of CONSTANT_P
arguments which if the predicate was checked couldn't ever make it through.
The following patch handles this case similarly to the
ix86_expand_vector_move's SUBREG of CONSTANT_P case, does it just for XFmode
because I believe that is the only mode that needs it from the scalar ones,
others should just be folded.
2024-03-04 Jakub Jelinek <jakub@redhat.com>
PR target/114184
* config/i386/i386-expand.cc (ix86_expand_move): If XFmode op1
is SUBREG of CONSTANT_P, force the SUBREG_REG into memory or
register.
Roger Sayle [Mon, 4 Mar 2024 00:47:08 +0000 (00:47 +0000)]
PR target/114187: Fix ?Fmode SUBREG simplification in simplify_subreg.
This patch fixes PR target/114187 a typo/missed-optimization in simplify-rtx
that's exposed by (my) changes to x86_64's parameter passing. The context
is that construction of double word (TImode) values now uses the idiom:
Extracting the DImode highpart and lowpart halves of this complex expression
is supported by simplications in simplify_subreg. The problem is when the
doubleword TImode value represents two DFmode fields, there isn't a direct
simplification to extract the highpart or lowpart SUBREGs, but instead GCC
uses two steps, extract the DImode {high,low} part and then cast the result
back to a floating point mode, DFmode.
The (buggy) code to do this is:
/* If the outer mode is not integral, try taking a subreg with the equivalent
integer outer mode and then bitcasting the result.
Other simplifications rely on integer to integer subregs and we'd
potentially miss out on optimizations otherwise. */
if (known_gt (GET_MODE_SIZE (innermode),
GET_MODE_SIZE (outermode))
&& SCALAR_INT_MODE_P (innermode)
&& !SCALAR_INT_MODE_P (outermode)
&& int_mode_for_size (GET_MODE_BITSIZE (outermode),
0).exists (&int_outermode))
{
rtx tem = simplify_subreg (int_outermode, op, innermode, byte);
if (tem)
return simplify_gen_subreg (outermode, tem, int_outermode, byte);
}
The issue/mistake is that the second call, to simplify_subreg, shouldn't
use "byte" as the final argument; the offset has already been handled by
the first call, to simplify_subreg, and this second call is just a type
conversion from an integer mode to floating point (from DImode to DFmode).
Interestingly, this mistake was already spotted by Richard Sandiford when
the optimization was originally contributed in January 2023.
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610920.html
>> Richard Sandiford writes:
>> Also, the final line should pass 0 rather than byte.
Unfortunately a miscommunication/misunderstanding in a later thread
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612898.html
resulted in this correction being undone. Using lowpart_subreg should
avoid/reduce confusion in future.
2024-03-03 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR target/114187
* simplify-rtx.cc (simplify_context::simplify_subreg): Call
lowpart_subreg to perform type conversion, to avoid confusion
over the offset to use in the call to simplify_reg_subreg.
gcc/testsuite/ChangeLog
PR target/114187
* g++.target/i386/pr114187.C: New test case.
Greg McGary [Sun, 3 Mar 2024 21:49:49 +0000 (14:49 -0700)]
[PATCH] combine: Don't simplify paradoxical SUBREG on WORD_REGISTER_OPERATIONS [PR113010]
The sign-bit-copies of a sign-extending load cannot be known until runtime on
WORD_REGISTER_OPERATIONS targets, except in the case of a zero-extending MEM
load. See the fix for PR112758.
gcc/
PR rtl-optimization/113010
* combine.cc (simplify_comparison): Simplify a SUBREG on
WORD_REGISTER_OPERATIONS targets only if it is a zero-extending
MEM load.
gcc/testsuite
* gcc.c-torture/execute/pr113010.c: New test.
gcc/
* config/avr/avr.cc: Resolve ATTRIBUTE_UNUSED.
Use bool in place of int for boolean logic (if possible).
Move declarations to definitions (if possible).
* config/avr/avr.md: Use C++ comments. Fix some indentation glitches.
* config/avr/avr-dimode.md: Same.
* config/avr/constraints.md: Same.
* config/avr/predicates.md: Same.
AVR: ad target/114100 - Don't print unused frame pointer adjustments.
Without -mfuse-add, when fake reg+offset addressing is used, the
output routines are saving some instructions when the base reg
is unused after. This patch adds that optimization for the case
when the base is the frame pointer and the frame pointer adjustments
are split away from the move insn by -mfuse-add in .split2.
Direct usage of reg_unused_after is not possible because that
function looks at the destination of the current insn, which won't
work for offsetting the frame pointer in printing PLUS code.
It can use an extended version of _reg_unused_after though.
gcc/
PR target/114100
* config/avr/avr-protos.h (_reg_unused_after): Remove proto.
* config/avr/avr.cc (_reg_unused_after): Make static. And
add 3rd argument to skip the current insn.
(reg_unused_after): Adjust call of reg_unused_after.
(avr_out_plus_1) [AVR_TINY && -mfuse-add >= 2]: Don't output
unneeded frame pointer adjustments.
AVR: ad target/92792 - Remove insn attribute "cc" and its (dead) uses.
The backend has remains of cc0 condition code. Unfortunately,
all that information is useless with CCmode, and their use was
removed with the removal of NOTICE_UPDATE_CC in PR92729 with
r12-226 and r12-327.
gcc/
PR target/92729
* config/avr/avr.md (define_attr "cc"): Remove.
* config/avr/avr-protos.h (avr_out_plus): Remove pcc argument
from prototype.
* config/avr/avr.cc (avr_out_plus_1): Remove pcc argument and
its uses. Add insn argument.
(avr_out_plus_symbol): Remove pcc argument and its uses.
(avr_out_plus): Remove pcc argument and its uses.
Adjust calls of avr_out_plus_symbol and avr_out_plus_1.
(avr_out_round): Adjust call of avr_out_plus.
AVR: Use REG_<n> constants instead of magic numbers <n>.
There are some places where avr.cc uses magic numbers like 17 that
are actually register numbers. This patch defines constants like
REG_17 and uses them instead of the magic numbers when a register
number is meant.
gcc/
* config/avr/avr.md (REG_0, ... REG_36): New define_constants.
* config/avr/avr.cc: Use them instead of magic numbers when it
means a register number.
Patrick Palka [Fri, 1 Mar 2024 22:24:15 +0000 (17:24 -0500)]
c++/modules: depending local enums [PR104919, PR106009]
For local enums defined in a non-template function or a function template
instantiation it seems we neglect to make the function depend on the enum
definition (which modules considers logically separate), which ultimately
causes the enum definition to not be properly streamed before uses
within the function definition are streamed.
The code responsible for noting such dependencies is
and the condition there notably excludes local TYPE_DECLs from a
non-template-pattern function (when streaming a template pattern
we'll see be dealing with the corresponding TEMPLATE_DECL of the
local TYPE_DECL here, so we'll add the dependency).
Local classes on the other hand seem to work properly, but perhaps by
accident: with a local class we end up making the function depend on the
injected-class-name of the local class rather than the local class as a
whole because the injected-class-name satisfies the criteria (since its
context is the local class, not the function).
The 'sneakoscope' flag is set when walking a function declaration and
its purpose seems to be to catch a local type that escapes the function
via a deduced return type (so called voldemort types) and note a
dependency on them. But there seems to be no reason to restrict this
behavior to voldemort types, and indeed consistently noting the dependency
for all local types fixes these PRs (almost). So this patch gets rid of
this flag and enables the dependency tracking unconditionally.
This was nearly enough to make things work, except we now ran into
issues with the local TYPE_/CONST_DECL copies from the pre-gimplified
version of a constexpr function body during streaming. Rather than
making modules cope with this, it occurred to me that we don't need to
make copies of local types when saving the pre-gimplified body (and when
making further copies thereof); only VAR_DECLs etc need to be copied
(so that we don't conflate local variables from different recursive
calls to the same function during constexpr evaluation). So this patch
adjusts copy_fn accordingly.
PR c++/104919
PR c++/106009
gcc/cp/ChangeLog:
* module.cc (depset::hash::sneakoscope): Remove.
(trees_out::decl_node): Always add a dependency on a local type.
(depset::hash::find_dependencies): Remove sneakoscope stuff.
gcc/ChangeLog:
* tree-inline.cc (remap_decl): Handle copy_decl returning the
original decl.
(remap_decls): Handle remap_decl returning the original decl.
(copy_fn): Adjust copy_decl callback to skip TYPE_DECL and
CONST_DECL.
gcc/testsuite/ChangeLog:
* g++.dg/modules/tdef-7.h: Remove outdated comment.
* g++.dg/modules/tdef-7_b.C: Don't expect two TYPE_DECLs.
* g++.dg/modules/enum-13_a.C: New test.
* g++.dg/modules/enum-13_b.C: New test.
Nathaniel Shead [Fri, 1 Mar 2024 00:08:23 +0000 (11:08 +1100)]
c++: Stream definitions for implicit instantiations [PR114170]
An implicit instantiation has an initializer depending on whether
DECL_INITIALIZED_P is set (like normal VAR_DECLs) which needs to be
written to ensure that consumers of header modules properly emit
definitions for these instantiations. This patch ensures that we
correctly fallback to checking this flag when DECL_INITIAL is not set
for a template instantiation.
For variables with non-trivial dynamic initialization, DECL_INITIAL can
be empty after 'split_nonconstant_init' but DECL_INITIALIZED_P is still
set; we need to check the latter to determine if we need to go looking
for a definition to emit (often in 'static_aggregates' here). This is
the case in the linked testcase.
However, for template specialisations (not instantiations?) we primarily
care about DECL_INITIAL; if the variable has initialization depending on
a template parameter then we'll need to emit that definition even though
it doesn't yet have DECL_INITIALIZED_P set; this is the case in e.g.
template <int N> int value = N;
As a drive-by fix, also ensures that the count of initializers matches
the actual number of initializers written. This doesn't seem to be
necessary for correctness in the current testsuite, but feels wrong and
makes debugging harder when initializers aren't properly written for
other reasons.
PR c++/114170
gcc/cp/ChangeLog:
* module.cc (has_definition): Fall back to DECL_INITIALIZED_P
when DECL_INITIAL is not set on a template.
(module_state::write_inits): Only increment count when
initializers are actually written.
gcc/testsuite/ChangeLog:
* g++.dg/modules/var-tpl-2_a.H: New test.
* g++.dg/modules/var-tpl-2_b.C: New test.
Nathaniel Shead [Thu, 29 Feb 2024 11:49:13 +0000 (22:49 +1100)]
c++: Ensure DECL_CONTEXT is set for temporary vars [PR114005]
Modules streaming requires DECL_CONTEXT to be set for anything streamed.
This patch ensures that 'create_temporary_var' does set a DECL_CONTEXT
for these variables (such as the backing storage for initializer_lists)
even if not inside a function declaration.
PR c++/114005
gcc/cp/ChangeLog:
* init.cc (create_temporary_var): Use current_scope instead of
current_function_decl.
gcc/testsuite/ChangeLog:
* g++.dg/modules/pr114005_a.C: New test.
* g++.dg/modules/pr114005_b.C: New test.
Jeff Law [Fri, 1 Mar 2024 21:54:04 +0000 (14:54 -0700)]
[14 regression] Fix insn types in risc-v port
So one of the broad goals we've had over the last few months has been to ensure
that every insn has a scheduling type and that every insn is associated with an
insn reservation in the scheduler.
This avoids some amazingly bad behavior in the scheduler. I won't go through
the gory details.
I was recently analyzing a code quality regression with dhrystone (ugh!) and
one of the issues was poor scheduling which lengthened the lifetime of a pseudo
and ultimately resulted in needing an additional callee saved register
save/restore.
This was ultimately tracked down incorrect types on a few patterns. So I did
an audit of all the patterns that had types added/changed as part of this
effort and found a variety of problems, primarily in the various move patterns
and extension patterns. This is a regression relative to gcc-13.
Naturally the change in types affects scheduling, which in turn changes the
precise code we generate and causes some testsuite fallout.
I considered updating the regexps since the change in the resulting output is
pretty consistent. But of course the test would still be sensitive to things
like load latency. So instead I just turned off the 2nd phase scheduler in the
affected tests.
Patrick Palka [Fri, 1 Mar 2024 21:50:20 +0000 (16:50 -0500)]
c++/modules: complete_vars ICE with non-exported constexpr var
Here after stream-in of the non-exported constexpr global 'A a' we call
maybe_register_incomplete_var, which we'd expect to be a no-op here but
it manages to take its second branch and pushes {a, NULL_TREE} onto
incomplete_vars. Later after defining B we ICE from complete_vars due
to this pushed NULL_TREE class context.
Judging by the two commits that introduced/modified this part of
maybe_register_incomplete_var, r196852 and r214333, it seems this second
branch is only concerned with constexpr static data members (whose
initializer may contain a pointer-to-member for a not-yet-complete class)
So this patch restricts this branch accordingly so it's not inadvertently
taken during stream-in.
gcc/cp/ChangeLog:
* decl.cc (maybe_register_incomplete_var): Restrict second
branch to static data members from a not-yet-complete class.
gcc/testsuite/ChangeLog:
* g++.dg/modules/cexpr-4_a.C: New test.
* g++.dg/modules/cexpr-4_b.C: New test.
Marek Polacek [Thu, 25 Jan 2024 21:38:51 +0000 (16:38 -0500)]
c++: implement [[gnu::no_dangling]] [PR110358]
Since -Wdangling-reference has false positives that can't be
prevented, we should offer an easy way to suppress the warning.
Currently, that is only possible by using a #pragma, either around the
enclosing class or around the call site. But #pragma GCC diagnostic tend
to be onerous. A better solution would be to have an attribute.
To that end, this patch adds a new attribute, [[gnu::no_dangling]].
This attribute takes an optional bool argument to support cases like:
template <typename T>
struct [[gnu::no_dangling(std::is_reference_v<T>)]] S {
// ...
};
PR c++/110358
PR c++/109642
gcc/cp/ChangeLog:
* call.cc (no_dangling_p): New.
(reference_like_class_p): Use it.
(do_warn_dangling_reference): Use it. Don't warn when the function
or its enclosing class has attribute gnu::no_dangling.
* tree.cc (cxx_gnu_attributes): Add gnu::no_dangling.
(handle_no_dangling_attribute): New.
* g++.dg/ext/attr-no-dangling1.C: New test.
* g++.dg/ext/attr-no-dangling2.C: New test.
* g++.dg/ext/attr-no-dangling3.C: New test.
* g++.dg/ext/attr-no-dangling4.C: New test.
* g++.dg/ext/attr-no-dangling5.C: New test.
* g++.dg/ext/attr-no-dangling6.C: New test.
* g++.dg/ext/attr-no-dangling7.C: New test.
* g++.dg/ext/attr-no-dangling8.C: New test.
* g++.dg/ext/attr-no-dangling9.C: New test.
David Faust [Fri, 1 Mar 2024 18:43:24 +0000 (10:43 -0800)]
testsuite: ctf: make array in ctf-file-scope-1 fixed length
The array member of struct SFOO in the ctf-file-scope-1 caused the test
to fail for the BPF target, since BPF does not support dynamic stack
allocation. The array does not need to variable length for the sake of
the test, so make it fixed length instead to allow the test to run
successfully for the bpf-unknown-none target.
gcc/testsuite/
* gcc.dg/debug/ctf/ctf-file-scope-1.c (SFOO): Make array member
fixed-length.
Harald Anlauf [Fri, 1 Mar 2024 18:21:27 +0000 (19:21 +0100)]
Fortran: improve checks of NULL without MOLD as actual argument [PR104819]
gcc/fortran/ChangeLog:
PR fortran/104819
* check.cc (gfc_check_null): Handle nested NULL()s.
(is_c_interoperable): Check for MOLD argument of NULL() as part of
the interoperability check.
* interface.cc (gfc_compare_actual_formal): Extend checks for NULL()
actual arguments for presence of MOLD argument when required by
Interp J3/22-146.
gcc/testsuite/ChangeLog:
PR fortran/104819
* gfortran.dg/assumed_rank_9.f90: Adjust testcase use of NULL().
* gfortran.dg/pr101329.f90: Adjust testcase to conform to interp.
* gfortran.dg/null_actual_4.f90: New test.
In r12-6773-g09845ad7569bac we gave CTAD placeholders a level of 0 and
ensured we never replaced them via tsubst. It turns out that autos
representing an explicit cast need the same treatment and for the same
reason: such autos appear in an expression context and so their level
gets easily messed up after partial substitution, leading to premature
replacement via an incidental tsubst instead of via do_auto_deduction.
This patch fixes this by extending the r12-6773 approach to auto(x).
PR c++/110025
PR c++/114138
gcc/cp/ChangeLog:
* cp-tree.h (make_cast_auto): Declare.
* parser.cc (cp_parser_functional_cast): If the type is an auto,
replace it with a level-less one via make_cast_auto.
* pt.cc (find_parameter_packs_r): Don't treat level-less auto
as a type parameter pack.
(tsubst) <case TEMPLATE_TYPE_PARM>: Generalize CTAD placeholder
auto handling to all level-less autos.
(make_cast_auto): Define.
(do_auto_deduction): Handle replacement of a level-less auto.
gcc/testsuite/ChangeLog:
* g++.dg/cpp23/auto-fncast16.C: New test.
* g++.dg/cpp23/auto-fncast17.C: New test.
* g++.dg/cpp23/auto-fncast18.C: New test.
Jakub Jelinek [Fri, 1 Mar 2024 15:59:08 +0000 (16:59 +0100)]
c++: Fix up decltype of non-dependent structured binding decl in template [PR92687]
finish_decltype_type uses DECL_HAS_VALUE_EXPR_P (expr) check for
DECL_DECOMPOSITION_P (expr) to determine if it is
array/struct/vector/complex etc. subobject proxy case vs. structured
binding using std::tuple_{size,element}.
For non-templates or when templates are already instantiated, that works
correctly, finalized DECL_DECOMPOSITION_P non-base vars indeed have
DECL_VALUE_EXPR in the former case and don't have it in the latter.
It works fine for dependent structured bindings as well, cp_finish_decomp in
that case creates DECLTYPE_TYPE tree and defers the handling until
instantiation.
As the testcase shows, this doesn't work for the non-dependent structured
binding case in templates, because DECL_HAS_VALUE_EXPR_P is set in that case
always; cp_finish_decomp ends with:
if (processing_template_decl)
{
for (unsigned int i = 0; i < count; i++)
if (!DECL_HAS_VALUE_EXPR_P (v[i]))
{
tree a = build_nt (ARRAY_REF, decl, size_int (i),
NULL_TREE, NULL_TREE);
SET_DECL_VALUE_EXPR (v[i], a);
DECL_HAS_VALUE_EXPR_P (v[i]) = 1;
}
}
and those artificial ARRAY_REFs are used in various places during
instantiation to find out what base the DECL_DECOMPOSITION_P VAR_DECLs
have and their positions.
The following patch fixes that by changing lookup_decomp_type, such that
it doesn't ICE when called on a DECL_DECOMPOSITION_P var which isn't in a
hash table, but returns NULL_TREE in that case, and for processing_template_decl
asserts DECL_HAS_VALUE_EXPR_P is non-NULL and just calls lookup_decomp_type.
If it returns non-NULL, it is a structured binding using tuple and its result
is returned, otherwise it falls through to returning unlowered_expr_type (expr)
because it is an array, structure etc. subobject proxy.
For !processing_template_decl it keeps doing what it did before,
DECL_HAS_VALUE_EXPR_P meaning it is an array/structure etc. subobject proxy,
otherwise the tuple case.
2024-03-01 Jakub Jelinek <jakub@redhat.com>
PR c++/92687
* decl.cc (lookup_decomp_type): Return NULL_TREE if decomp_type_table
doesn't have entry for V.
* semantics.cc (finish_decltype_type): If ptds.saved, assert
DECL_HAS_VALUE_EXPR_P is true and decide on tuple vs. non-tuple based
on if lookup_decomp_type is NULL or not.
Jakub Jelinek [Fri, 1 Mar 2024 16:26:42 +0000 (17:26 +0100)]
OpenMP/C++: Fix (first)private clause with member variables [PR110347]
OpenMP permits '(first)private' for C++ member variables, which GCC handles
by tagging those by DECL_OMP_PRIVATIZED_MEMBER, adding a temporary VAR_DECL
and DECL_VALUE_EXPR pointing to the 'this->member_var' in the C++ front end.
The idea is that in omp-low.cc, the DECL_VALUE_EXPR is used before the
region (for 'firstprivate'; ignored for 'private') while in the region,
the DECL itself is used.
In gimplify, the value expansion is suppressed and deferred if the
lang_hooks.decls.omp_disregard_value_expr (decl, shared)
returns true - which is never the case if 'shared' is true. In OpenMP 4.5,
only 'map' and 'use_device_ptr' was permitted for the 'target' directive.
And when OpenMP 5.0's 'private'/'firstprivate' clauses was added, the
the update that now 'shared' argument could be false was missed. The
respective check has now been added.
2024-03-01 Jakub Jelinek <jakub@redhat.com>
Tobias Burnus <tburnus@baylibre.com>
PR c++/110347
gcc/ChangeLog:
* gimplify.cc (omp_notice_variable): Fix 'shared' arg to
lang_hooks.decls.omp_disregard_value_expr for
(first)private in target regions.
libgomp/ChangeLog:
* testsuite/libgomp.c++/target-lambda-3.C: Moved from
gcc/testsuite/g++.dg/gomp/ and fixed is-mapped handling.
* testsuite/libgomp.c++/target-lambda-1.C: Modify to also
also work without offloading.
* testsuite/libgomp.c++/firstprivate-1.C: New test.
* testsuite/libgomp.c++/firstprivate-2.C: New test.
* testsuite/libgomp.c++/private-1.C: New test.
* testsuite/libgomp.c++/private-2.C: New test.
* testsuite/libgomp.c++/target-lambda-4.C: New test.
* testsuite/libgomp.c++/use_device_ptr-1.C: New test.
gcc/testsuite/ChangeLog:
* g++.dg/gomp/target-lambda-1.C: Moved to become a
run-time test under testsuite/libgomp.c++.
Jakub Jelinek [Fri, 1 Mar 2024 14:42:52 +0000 (15:42 +0100)]
calls: Further fixes for TYPE_NO_NAMED_ARGS_STDARG_P handling [PR114136]
On Tue, Feb 27, 2024 at 04:41:32PM +0000, Richard Earnshaw wrote:
> On Arm the PR107453 change is causing all anonymous arguments to be passed on the
> stack, which is incorrect per the ABI. On a target that uses
> 'pretend_outgoing_vararg_named', why is it correct to set n_named_args to
> zero? Is it enough to guard both the statements you've added with
> !targetm.calls.pretend_outgoing_args_named?
The TYPE_NO_NAMED_ARGS_STDARG_P functions (C23 fns like void foo (...) {})
have NULL type_arg_types, so the list_length (type_arg_types) isn't done for
it, but it should be handled as if it was non-NULL but list length was 0.
So, for the
if (type_arg_types != 0)
n_named_args
= (list_length (type_arg_types)
/* Count the struct value address, if it is passed as a parm. */
+ structure_value_addr_parm);
else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
n_named_args = 0;
else
/* If we know nothing, treat all args as named. */
n_named_args = num_actuals;
case, I think guarding it by any target hooks is wrong, although
I guess it should have been
n_named_args = structure_value_addr_parm;
instead of
n_named_args = 0;
For the second
if (type_arg_types != 0
&& targetm.calls.strict_argument_naming (args_so_far))
;
else if (type_arg_types != 0
&& ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
/* Don't include the last named arg. */
--n_named_args;
else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
n_named_args = 0;
else
/* Treat all args as named. */
n_named_args = num_actuals;
I think we should treat those as if type_arg_types was non-NULL
with 0 elements in the list, except the --n_named_args would for
!structure_value_addr_parm lead to n_named_args = -1, I think we want
0 for that case.
2024-03-01 Jakub Jelinek <jakub@redhat.com>
PR middle-end/114136
* calls.cc (expand_call): For TYPE_NO_NAMED_ARGS_STDARG_P set
n_named_args initially before INIT_CUMULATIVE_ARGS to
structure_value_addr_parm rather than 0, after it don't modify
it if strict_argument_naming and clear only if
!pretend_outgoing_varargs_named.
Jakub Jelinek [Fri, 1 Mar 2024 13:57:15 +0000 (14:57 +0100)]
dwarf2out: Don't move variable sized aggregates to comdat [PR114015]
The following testcase ICEs, because we decide to move that
struct { char a[n]; } DW_TAG_structure_type into .debug_types section
/ DW_UT_type DWARF5 unit, but refer from there to a DW_TAG_variable
(created artificially for the array bounds).
Even with non-bitint, I think it is just wrong to use .debug_types
section / DW_UT_type for something that uses DW_OP_fbreg and similar
in it, things clearly dependent on a particular function.
In most cases, is_nested_in_subprogram (die) check results in such
aggregates not being moved, but in the function parameter type case
that is not the case.
The following patch fixes it by returning false from should_move_die_to_comdat
for non-constant sized aggregate types, i.e. when either we gave up on
adding DW_AT_byte_size for it because it wasn't expressable, or when
it is something non-constant (location description, reference, ...).
2024-03-01 Jakub Jelinek <jakub@redhat.com>
PR debug/114015
* dwarf2out.cc (should_move_die_to_comdat): Return false for
aggregates without DW_AT_byte_size attribute or with non-constant
DW_AT_byte_size.
Richard Biener [Thu, 29 Feb 2024 08:22:19 +0000 (09:22 +0100)]
middle-end/114070 - VEC_COND_EXPR folding
The following amends the PR114070 fix to optimistically allow
the folding when we cannot expand the current vec_cond using
vcond_mask and we're still before vector lowering. This leaves
a small window between vectorization and lowering where we could
break vec_conds that can be expanded via vcond{,u,eq}, most
susceptible is the loop unrolling pass which applies VN and thus
possibly folding to the unrolled body of a vectorized loop.
This gets back the folding for targets that cannot do vectorization.
It doesn't get back the folding for x86 with AVX512 for example
since that can handle the original IL but not the folded since
it misses some vcond_mask expanders.
PR middle-end/114070
* match.pd ((c ? a : b) op d --> c ? (a op d) : (b op d)):
Allow the folding if before lowering and the current IL
isn't supported with vcond_mask.