xuli [Fri, 13 Dec 2024 04:28:48 +0000 (04:28 +0000)]
RISC-V: Add testcases for unsigned imm vec SAT_SUB form2~4
Form2:
void __attribute__((noinline)) \
vec_sat_u_sub_imm##IMM##_##T##_fmt_2 (T *out, T *in, unsigned limit) \
{ \
unsigned i; \
for (i = 0; i < limit; i++) \
out[i] = in[i] >= (T)IMM ? in[i] - (T)IMM : 0; \
}
Form3:
void __attribute__((noinline)) \
vec_sat_u_sub_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \
{ \
unsigned i; \
for (i = 0; i < limit; i++) \
out[i] = (T)IMM > in[i] ? (T)IMM - in[i] : 0; \
}
Form4:
void __attribute__((noinline)) \
vec_sat_u_sub_imm##IMM##_##T##_fmt_4 (T *out, T *in, unsigned limit) \
{ \
unsigned i; \
for (i = 0; i < limit; i++) \
out[i] = in[i] > (T)IMM ? in[i] - (T)IMM : 0; \
}
Passed the rv64gcv full regression test.
Signed-off-by: Li Xu <xuli1@eswincomputing.com>
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/sat/vec_sat_arith.h: add unsigned imm vec sat_sub form2~4.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_data.h: add data for vec sat_sub.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-2-u16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-2-u32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-2-u64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-2-u8.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-3-u16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-3-u32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-3-u64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-3-u8.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-4-u16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-4-u32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-4-u64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-4-u8.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-2-u16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-2-u32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-2-u64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-2-u8.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-3-u16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-3-u32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-3-u64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-3-u8.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-4-u16.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-4-u32.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-4-u64.c: New test.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_imm-run-4-u8.c: New test.
Guo Jie [Mon, 30 Dec 2024 02:37:18 +0000 (10:37 +0800)]
LoongArch: Fix selector error in lasx_xvexth_h/w/d* patterns
The xvexth related instructions operate SEPARATELY according to
the high and low 128 bits, and sign/zero extend the upper half
of every 128 bits in src to the corresponding 128 bits in dest.
For xvexth.d.w, the rule for the first element of dest should be:
dest.D[0] = sign_extend (src.W[2] ,64);
instead of:
dest.D[0] = sign_extend (src.W[4] ,64);
Sandra Loosemore [Sat, 28 Dec 2024 03:51:50 +0000 (03:51 +0000)]
Fortran: Fix that/which usage in the manual.
In English usage, "that" introduces a restrictive clause while "which"
introduces a non-restrictive or descriptive clause. "That" is almost
never preceded by a comma while "which" often is. The Fortran manual
had many instances where these uses were reversed, or where a comma
was used with "that"; this patch fixes them. In some cases I have
substituted less convoluted wording instead.
gcc/fortran/ChangeLog
* gfortran.texi: Clean up that/which usage throughout the file.
* intrinsic.texi: Likewise.
* invoke.texi: Likewise.
Sandra Loosemore [Fri, 27 Dec 2024 20:45:14 +0000 (20:45 +0000)]
Fortran: Grammar/markup fixes in intrinsics documentation
Continuing a series of patches to tidy the Fortran manual, this
installment fixes problems with inappropriate use of future tense and
adds some missing markup I noticed in passing.
gcc/fortran/ChangeLog
* intrinsic.texi: Grammar and markup fixes throughout
the file.
Per comments in invoke.texi, target option groups in the Option
Summary section are supposed to be alphabetized and in the same order
as the documentation sections they refer to. "M32C Options" was
misordered in the Option Summary. "Cygwin and MinGW Options" was
ordered incorrectly in both places, which also caused Texinfo
diagnostics because the ordering in the menu (which was correctly
alphabetized) didn't match the node order.
I also added a reference to the appropriate section to each entry in
the Option Summary so that you can go directly to the detailed
description for that set of target options. I'm not real happy with
the formatting of the tables in that section but the experiments I
tried all looked worse. :-(
gcc/ChangeLog
* doc/invoke.texi (Option Summary): Put "M32C Options" and
"Cygwin and MinGW Options" in alphabetical order. Add
cross-references.
(Cygwin and MinGW Options): Likewise move the section to its
correct alphabetical location.
* config/lynx.opt.urls: Regenerated.
* config/mingw/cygming.opt.urls: Regenerated.
Jiahao Xu [Wed, 25 Dec 2024 09:59:36 +0000 (17:59 +0800)]
LoongArch: Implement vector cbranch optab for LSX and LASX
In order to support vectorization of loops with multiple exits, this
patch adds the implementation of the conditional branch optab for
LoongArch LSX/LASX instructions.
This patch causes the gen-vect-{2,25}.c tests to fail. This is because
the support for vectorizing loops with multiple exits has vectorized
the loop checking the results. The failure is due to an issue in the
test case's own implementation.
gcc/ChangeLog:
* config/loongarch/simd.md (cbranch<mode>4): New expander.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp (check_effective_target_vect_early_break_hw,
check_effective_target_vect_early_break): Support LoongArch LSX.
* gcc.target/loongarch/vector/lasx/lasx-vseteqz.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-vseteqz.c: New test.
Co-authored-by: Deng Jianbo <dengjianbo@loongson.cn>
Robin Dapp [Tue, 31 Dec 2024 06:47:53 +0000 (23:47 -0700)]
[PATCH v2] varasm: Use native_encode_rtx for constant vectors.
optimize_constant_pool hashes vector masks by native_encode_rtx and
merges identically hashed values in the constant pool. Afterwards the
optimized values are written in output_constant_pool_2.
However, native_encode_rtx and output_constant_pool_2 disagree in their
encoding of vector masks: native_encode_rtx does not pad with zeroes
while output_constant_pool_2 implicitly does.
In RVV's shuffle-evenodd-run.c there are two masks
(a) "0101" for V4BI
(b) "01010101" for V8BI and
that have the same representation/encoding ("1010101") in native_encode_rtx.
output_constant_pool_2 uses "101" for (a) and "1010101" for (b).
Now, optimize_constant_pool might happen to merge both masks using
(a) as representative. Then, output_constant_pool_2 will output "1010"
which is only valid for the second mask as the implicit zero padding
doesn't agree with (b).
(b)'s "1010101" works for both masks as a V4BI load will ignore the last four
padding bits.
This patch makes output_constant_pool_2 use native_encode_rtx so both
functions will agree on an encoding and output the correct constant.
PR target/118036
gcc/ChangeLog:
* varasm.cc (output_constant_pool_2): Use native_encode_rtx for
building the memory image of a const vector mask.
Several months ago changes were made to the vectorizer which mucked up several
of the scan tests. All but one of the cases in pr115375 have since been fixed.
The remaining failure seems to be primarily a debugging dump issue -- we're
still selecting the same lmul values. This patch adjusts the dump scan
appropriately.
Jeff Law [Mon, 30 Dec 2024 23:14:29 +0000 (16:14 -0700)]
[PR testsuite/114182] Fix minor testsuite issue when double == float
This is a minor testsuite adjustment
attr-complex-method-2.c selects between two scan-tree-dump clauses based on
avr, !avr. But what they really should be checking is "large_double" that way
it works for avr, h8, rl78 and any other target which makes doubles the same
size as floats.
attr-complex-method.c should be doing the same thing.
After this change avr passes attr-complex-method.c and the rl78 and h8 ports
will pass both tests. Other targets in my tester are unaffected.
PR testsuite/114182
gcc/testsuite/
* gcc.c-torture/compile/attr-complex-method.c: Use
"large_double" to select between scan outputs.
* gcc.c-torture/compile/attr-complex-method-2.c: Similarly.
Jeff Law [Mon, 30 Dec 2024 20:51:55 +0000 (13:51 -0700)]
[RISC-V][PR target/106544] Avoid ICEs due to bogus asms
This is a fix for a bug Andrew P filed a while back where essentially a poorly
crafted asm statement could trigger a ICE during assembly output. Various
cases will use INTVAL (op) without verifying the operand is a CONST_INT node
first.
The usual way to handle this is via output_operand_lossage, which this patch
implements.
I focused primarily on the CONST_INT cases, there could well be other problems
in this space, if so they should get distinct bugs with testcases.
Tested in my tester on rv32 and rv64. Waiting for pre-commit testing before
moving forward.
PR target/106544
gcc/
* config/riscv/riscv.cc (riscv_print_operand): Issue an error for
invalid operands rather than invalidly accessing INTVAL of an
object that is not a CONST_INT. Fix one error string for 'N'.
gcc/testsuite
* gcc.target/riscv/pr106544.c: New test.
Steven G. Kargl [Sun, 29 Dec 2024 22:19:18 +0000 (14:19 -0800)]
Fortran: Implement f_c_string function.
Fortran 2023 has added the new intrinsic function F_C_STRING to
convert fortran strings of default character kind to a null
terminated C string.
Contributions from Steve Kargl, Harald Anlauf, FX Coudert, Mikael Morin,
and Jerry DeLisle.
PR fortran/117643
gcc/fortran/ChangeLog:
* check.cc (gfc_check_f_c_string): Check arguments of f_c_string().
* gfortran.h (enum gfc_isym_id): New symbol GFC_ISYM_F_C_STRING.
* intrinsic.cc (add_functions): Add the ISO C Binding routine f_c_string().
Wrap nearby long line to less than 80 characters.
* intrinsic.h (gfc_check_f_c_string): Prototype for gfc_check_f_c_string().
* iso-c-binding.def (NAMED_FUNCTION): Declare for ISO C Binding
routine f_c_string().
* primary.cc (gfc_match_rvalue): Fix comment that has been untrue since 2011.
Add ISOCBINDING_F_C_STRING to conditional.
* trans-intrinsic.cc (conv_trim): Specialized version of trim() for
f_c_string().
(gfc_conv_intrinsic_function): Use GFC_ISYM_F_C_STRING to trigger in-lining.
gcc/testsuite/ChangeLog:
* gfortran.dg/f_c_string1.f90: New test.
* gfortran.dg/f_c_string2.f90: New test.
Jeff Law [Mon, 30 Dec 2024 14:40:07 +0000 (07:40 -0700)]
[RISC-V][PR target/118122] Fix modes in recently added risc-v pattern
The new pattern to optimize certain code sequences on RISC-V played things a
bit fast and loose with modes -- some operands were using the ALLI iterator
while the scratch used X and the split codegen used X.
Naturally under the "right" circumstances this would trigger an ICE due to
mismatched modes. This patch uses X consistently in that pattern. It also
fixes some formatting nits.
Tested in my tester, but waiting on the pre-commit verdict before moving
forward.
PR target/118122
gcc/
* config/riscv/riscv.md (lui_constraint<X:mode>_and_to_or): Use
X iterator rather than ANYI consistently. Fix formatting.
This patch adds mf8 variants of what I'll loosely call the existing
"data movement" intrinsics, including the recent FEAT_LUT ones.
I think this completes the FP8 intrinsic definitions.
The new intrinsics are defined entirely in the compiler. This should
make it easy to move the existing non-mf8 variants into the compiler
as well, but that's too invasive for stage 3 and so is left to GCC 16.
I wondered about trying to reduce the cut-&-paste in the .def file,
but in the end decided against it. I have a plan for specifying this
information in a different format, but again that would need to wait
until GCC 16.
The patch includes some support for gimple folding. I initially
tested the patch without it, so that all the rtl expansion code
was exercised.
vlut.c fails for all types with big-endian ILP32, but that's
for a later patch.
gcc/
* config/aarch64/aarch64.md (UNSPEC_BSL, UNSPEC_COMBINE, UNSPEC_DUP)
(UNSPEC_DUP_LANE, UNSPEC_GET_LANE, UNSPEC_LD1_DUP, UNSPEC_LD1x2)
(UNSPEC_LD1x3, UNSPEC_LD1x4, UNSPEC_SET_LANE, UNSPEC_ST1_LANE)
(USNEPC_ST1x2, UNSPEC_ST1x3, UNSPEC_ST1x4, UNSPEC_VCREATE)
(UNSPEC_VEC_COPY): New unspecs.
* config/aarch64/iterators.md (UNSPEC_TBL): Likewise.
* config/aarch64/aarch64-simd-pragma-builtins.def: Add definitions
of the mf8 data movement intrinsics.
* config/aarch64/aarch64-protos.h
(aarch64_advsimd_vector_array_mode): Declare.
* config/aarch64/aarch64.cc
(aarch64_advsimd_vector_array_mode): Make public.
* config/aarch64/aarch64-builtins.h (qualifier_const_pointer): New
aarch64_type_qualifiers member.
* config/aarch64/aarch64-builtins.cc (AARCH64_SIMD_VGET_LOW_BUILTINS)
(AARCH64_SIMD_VGET_HIGH_BUILTINS): Add mf8 variants.
(aarch64_int_or_fp_type): Handle qualifier_modal_float.
(aarch64_num_lanes): New function.
(binary_two_lanes, load, load_lane, store, store_lane): New signatures.
(unary_lane): Likewise.
(simd_type::nunits): New member function.
(simd_types): Add pointer types.
(aarch64_fntype): Handle the new signatures.
(require_immediate_lane_index): Use aarch64_num_lanes.
(aarch64_pragma_builtins_checker::check): Handle the new intrinsics.
(aarch64_convert_address): (aarch64_dereference_pointer):
(aarch64_canonicalize_lane, aarch64_convert_to_lane_mask)
(aarch64_pack_into_v128s, aarch64_expand_permute_pair)
(aarch64_expand_tbl_tbx): New functions.
(aarch64_expand_pragma_builtin): Handle the new intrinsics.
(aarch64_force_gimple_val, aarch64_copy_vops, aarch64_fold_to_val)
(aarch64_dereference, aarch64_get_lane_bit_index, aarch64_get_lane)
(aarch64_set_lane, aarch64_fold_combine, aarch64_fold_load)
(aarch64_fold_store, aarch64_ext_index, aarch64_rev_index)
(aarch64_trn_index, aarch64_uzp_index, aarch64_zip_index)
(aarch64_fold_permute): New functions, some split out from
aarch64_general_gimple_fold_builtin.
(aarch64_gimple_fold_pragma_builtin): New function.
(aarch64_general_gimple_fold_builtin): Use the new functions above.
* config/aarch64/aarch64-simd.md (aarch64_dup_lane<mode>)
(aarch64_dup_lane_<vswap_width_name><mode>): Add "@" to name.
(aarch64_simd_vec_set<mode>): Likewise.
(*aarch64_simd_vec_copy_lane_<vswap_width_name><mode>): Likewise.
(aarch64_simd_bsl<mode>): Likewise.
(aarch64_combine<mode>): Likewise.
(aarch64_cm<optab><mode><vczle><vczbe>): Likewise.
(aarch64_simd_ld2r<vstruct_elt>): Likewise.
(aarch64_vec_load_lanes<mode>_lane<vstruct_elt>): Likewise.
(aarch64_simd_ld3r<vstruct_elt>): Likewise.
(aarch64_simd_ld4r<vstruct_elt>): Likewise.
(aarch64_ld1x3<vstruct_elt>): Likewise.
(aarch64_ld1x4<vstruct_elt>): Likewise.
(aarch64_st1x2<vstruct_elt>): Likewise.
(aarch64_st1x3<vstruct_elt>): Likewise.
(aarch64_st1x4<vstruct_elt>): Likewise.
(aarch64_ld<nregs><vstruct_elt>): Likewise.
(aarch64_ld1<VALL_F16: Likewise.mode>): Likewise.
(aarch64_ld1x2<vstruct_elt>): Likewise.
(aarch64_ld<nregs>_lane<vstruct_elt>): Likewise.
(aarch64_<PERMUTE: Likewise.perm_insn><mode><vczle><vczbe>): Likewise.
(aarch64_ext<mode>): Likewise.
(aarch64_rev<REVERSE: Likewise.rev_op><mode><vczle><vczbe>): Likewise.
(aarch64_st<nregs><vstruct_elt>): Likewise.
(aarch64_st<nregs>_lane<vstruct_elt>): Likewise.
(aarch64_st1<VALL_F16: Likewise.mode>): Likewise.
This patch tries to regularise the definitions of the new pragma
simd types. Not all of the new types are currently used, but they
will be by later patches.
gcc/
* config/aarch64/aarch64-builtins.cc (simd_types): Use one macro
invocation for each element type.
In a later patch, I need to add "@" to a pattern that uses subst
attributes. This combination is problematic for two reasons:
(1) define_substs are applied and filtered at a later stage than the
handling of "@" patterns, so that the handling of "@" patterns
doesn't know which subst variants are valid and which will later be
dropped. Just adding a "@" therefore triggers a build error due to
references to non-existent patterns.
(2) Currently, the code will treat a single "@" pattern as contributing
to a single set of overloaded functions. These overloaded functions
will have an integer argument for every subst attribute. For example,
the vczle and vczbe in:
are subst attributes, and so currently we'd try to generate a
single set of overloads that take four arguments: one for rev_op,
one for the mode, one for vczle, and one for vczbe. The gen_*
and maybe_gen_* functions will also have one rtx argument for
each operand in the original pattern.
This model doesn't really make sense for define_substs, since
define_substs are allowed to add extra operands to an instruction.
The number of rtx operands to the generators would then be
incorrect.
I think a more sensible way of handling define_substs would be to
apply them first (and thus expand things like <vczle> and <vczbe>
above) and then apply "@". However, that's a relatively invasive
change and not suitable for stage 3.
This patch instead skips over subst attributes and restricts "@"
overload handling to the cases where no define_subst is applied.
I looked through all uses of "@" names in target code and there
seemed to be only one current use of "@" with define_substs,
in x86 vector code. The current behaviour seemed to be unwanted there,
and the x86 code was having to work around it.
gcc/
* read-rtl.cc (md_reader::handle_overloaded_name): Don't add
arguments for uses of subst attributes.
(apply_iterators): Only add instructions to an overloaded helper
if they use the default subst iterator values.
* doc/md.texi: Update documentation accordingly.
* config/i386/i386-expand.cc (expand_vec_perm_broadcast_1): Update
accordingly.
kelefth [Mon, 16 Dec 2024 13:36:59 +0000 (14:36 +0100)]
avoid-store-forwarding: fix reg init on load-eliminiation [PR117835]
During the initialization of the base register for the zero-offset
store, in the case that we are eliminating the load, we used a
paradoxical subreg assuming that we don't care about the higher bits
of the register. This led to writing wrong values when we were not
updating the whole register.
This patch fixes the issue by zero-extending the value stored in the
base register instead of using a paradoxical subreg.
* avoid-store-forwarding.cc
(store_forwarding_analyzer::process_store_forwarding):
Zero-extend the value stored in the base register instead of
using a paradoxical subreg.
MMIX: Correct handling of C23 (...) functions, PR117618
This commit fixes a MMIX C23 (...)-handling bug; failing
gcc.dg/c23-stdarg-[46789].c execution tests. But, this
isn't about a missing "|| arg.type != NULL_TREE" in the
PORT_setup_incoming_varargs function like most other
PR114175 port bugs exposed by the gcc.dg/c23-stdarg-6.c
.. -9.c tests; the MMIX port passes struct-return-values in
a register. But, the bug is somewhat similar.
This bug seems like it was added already in r13-3549-g4fe34cdcc80ac2, by incorrectly handling
TYPE_NO_NAMED_ARGS_STDARG_P-functions ((...)-functions);
counting them as having one parameter instead of none. That
"+ 1" below is a kind-of hidden function_arg_advance call,
which shouldn't happen for (...)-functions.
PR target/117618
* config/mmix/mmix.cc (mmix_setup_incoming_varargs):
Correct handling of C23 (...)-functions.
Lewis Hyatt [Thu, 26 Dec 2024 15:58:57 +0000 (10:58 -0500)]
tree-optimization: Fix ICE in tree-parloops.cc reduction_phi() [PR118205]
Prior to r15-6001, reduction_phi() could be called with the PHI parameter
not actually being a gphi*. The search through reduction_list would fail and
return NULL. r15-6001 added a requirement that PHI actually be a gphi*, but
did not add a check for this. The PR shows an example where the check is
needed; fix by explicitly returning NULL in this case.
gcc/ChangeLog:
PR tree-optimization/118205
* tree-parloops.cc (reduction_phi): Return NULL if PHI parameter is
not a phi node.
gcc/testsuite/ChangeLog:
PR tree-optimization/118205
* c-c++-common/pr118205.c: New test.
So for this bug we have what appears to me to just be a bogus pattern.
Essentially the pattern tries to detect cases where we have an SI mode value
and we can use the Zbs instructions to manipulate a bit. Conceptually that's
great.
The problem is the pattern assumes that SI objects are sign extended. It uses a
test to try and filter out a problematical case (subregs), but that simply
won't work with late-combine since the subreg will be stripped away and we have
no way of knowing if the SI value was already sign extended to 64 bits or not.
You might think we could look for a way to salvage the pattern and make it only
usable prior to register allocation. I pondered that extensively, but
ultimately concluded that with the introduction of ext-dce it wasn't safe.
So this just removes the problematical pattern. Thankfully there aren't any
regressions in the testsuite. Even the test designed to test this pattern's
applicability still generates the desired code.
Changes since v1:
- Adjust testcase so that it works for rv32 and rv64.
- Adjust PR number in subject line.
PR target/116715
gcc/
* config/riscv/bitmanip.md: Drop bogus pattern.
gcc/testsuite
* gcc.target/riscv/pr116715.c: New test.
Jeff Law [Sun, 29 Dec 2024 15:27:30 +0000 (08:27 -0700)]
[PR target/116720] Fix test for valid mempair operands
So this BZ is a case where we incorrectly indicated that the operand array was
suitable for the t-head load/store pair instructions.
In particular there's a test which checks alignment, but that happens *before*
we know if the operands are going to be reversed. So the routine reported the
operands are suitable.
At a later point the operands have been reversed into the proper order and we
realize the alignment test should have failed, resulting in the unrecognized
insn.
This fixes the code by moving the reversal check earlier and actually swapping
the local variables with the operands. That in turn allows for simpler testing
of alignments, ordering, etc.
I've tested this on rv32 and rv64 in my tester. I don't offhand know if the
patch from Filip that's been causing headaches for the RISC-V port has been
reverted/fixed. So there's a nonzero chance the pre-commit CI tester will
fail. I'll keep an eye on it and act appropriately.
PR target/116720
gcc/
* config/riscv/thead.cc (th_mempair_operands_p): Test for
aligned memory after swapping operands. Simplify test for
first memory access as well.
gcc/testsuite/
* gcc.target/riscv/pr116720.c: New test.
Gerald Pfeifer [Sun, 29 Dec 2024 13:44:50 +0000 (21:44 +0800)]
libstdc++: Delete leftover from Profile Mode removal
Commit 544be2beb1fa in 2019 remove Profile Mode and associated docs
including the XML version of profile_mode_diagnostics.html. Somehow
the latter survived until now. Simply delete it as well.
libstdc++-v3/testsuite/.../year_month_day/3.cc, 4.cc: Cut down for simulators
These two long-running tests happened to fail for me when
run in parallel (nprocs - 1) compared to a serial run, for
target mmix on my laptop. The runtime is 3m40s for 3.cc
before this change, and 0.9s afterwards.
* testsuite/std/time/year_month_day/3.cc (test01): Add ifdeffery to
limit the tested dates. For simulators, pass start and end dates
limiting the tested range to 100000 days, centered on days (0).
* testsuite/std/time/year_month_day/4.cc: Ditto.
Nathaniel Shead [Sat, 21 Dec 2024 12:42:28 +0000 (23:42 +1100)]
c++/modules: Fallback to ftruncate if posix_fallocate fails [PR115008]
Depending on the libc and filesystem, in cases where posix_fallocate
cannot do an efficient preallocation it may return EINVAL. In such a
case we should fall back to ftruncate instead.
Apparently, depending on the system the use of posix_fallocate can have
a noticeable speedup over ftruncate in general (depending on the system)
so it probably isn't worth it to use ftruncate in all cases.
PR c++/100358
PR c++/115008
gcc/cp/ChangeLog:
* module.cc (elf_out::create_mapping): Fallback to ftruncate if
posix_fallocate fails.
Nathaniel Shead [Sat, 21 Dec 2024 14:18:16 +0000 (01:18 +1100)]
c++: Don't treat lambda typedef as lambda declaration [PR106221]
I noticed that in a couple of places we sometimes treat any TYPE_DECL of
lambda type as defining a lambda, which isn't always true since C++20:
in `using T = decltype([]{})`, T is not a lambda-declaration.
PR c++/106221
PR c++/110680
gcc/cp/ChangeLog:
* pt.cc (check_default_tmpl_args): Check this is actually a
lambda declaration and not just a typedef.
(push_template_decl): Likewise.
Jakub Jelinek [Sat, 28 Dec 2024 14:42:56 +0000 (15:42 +0100)]
gimple-fold: Fix up fold_array_ctor_reference RAW_DATA_CST handling [PR118207]
The following testcases ICE because fold_array_ctor_reference in the
RAW_DATA_CST handling just return build_int_cst without actually checking
that if type is non-NULL, TREE_TYPE (val) is uselessly convertible to it.
By falling through the code after it without *suboff += we get everything
we need, the two if conditionals will never be true (we've already
checked that size == BITS_PER_UNIT and so can't be 0, and val will be
INTEGER_CST), but it will do the important fold_ctor_reference call
which will deal with type incompatibilities.
2024-12-28 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/118207
* gimple-fold.cc (fold_array_ctor_reference): For RAW_DATA_CST,
just set val to build_int_cst and fall through to the normal
element handling code instead of returning build_int_cst right away.
Jiahao Xu [Wed, 18 Dec 2024 07:45:17 +0000 (15:45 +0800)]
LoongArch: Support immediate_operand for vec_cmp
We can't vectorize the code into instructions like vslti.w that compare
with immediate_operand, because we miss immediate_operand support for
integer comparisons.
Fix timevar.cc build on systems that don't have CLOCK_MONOTONIC
2024-12-26 John David Anglin <danglin@gcc.gnu.org>
gcc/ChangeLog:
PR target/118050
* timevar.cc (get_time): Only use CLOCK_MONOTONIC if
'_POSIX_TIMERS > 0 && defined(_POSIX_MONOTONIC_CLOCK)'.
Otherise, use CLOCK_REALTIME.
Alpha: Also use tree information to get base block alignment
We hardly ever emit code using machine instructions for aligned memory
accesses for block move and clear operation and the reason for this
appears to be that suboptimal alignment is often passed by the caller
and then we only try to find a better alignment by checking pseudo
register pointer alignment information, and from observation it's most
often only set for stack frame references.
This code originates from before Tree SSA days and we can do better
nowadays, by looking up the original tree node associated with a MEM
RTL, so implement this approach, factoring out repeating code from
`alpha_expand_block_move' and `alpha_expand_block_clear' to a new
function.
In some cases howewer tree information is not available while pointer
alignment is, such as with the case concerned with PR target/115459,
where we have:
showing no tree information and the alignment of 8 only for `orig_src',
while indeed REGNO_POINTER_ALIGN returns 128 for pseudo 65. So retain
the old approach and return the largest alignment determined and its
associated offset.
Add test cases accordingly and remove XFAILs from memclr-a2-o1-c9-ptr.c
now that it does get aligned code produced now.
gcc/
* config/alpha/alpha.cc
(alpha_get_mem_rtx_alignment_and_offset): New function.
(alpha_expand_block_move, alpha_expand_block_clear): Use it for
alignment retrieval.
gcc/testsuite/
* gcc.target/alpha/memclr-a2-o1-c9-ptr.c: Remove XFAILs.
* gcc.target/alpha/memcpy-di-aligned.c: New file.
* gcc.target/alpha/memcpy-di-unaligned.c: New file.
* gcc.target/alpha/memcpy-di-unaligned-dst.c: New file.
* gcc.target/alpha/memcpy-di-unaligned-src.c: New file.
Alpha: Fix offset adjustment in unaligned access helpers
Correct the offset adjustment made in the multi-word unaligned access
helpers such that it is actually used by the unaligned load and store
instructions, fixing a bug introduced with commit 1eb356b98df2 ("alpha
gprel optimizations")[1] back in 2001, which replaced address changes
made directly according to the argument of the MEM expression passed
with one made according to an address previously extracted from said MEM
expression. The address is however incorrectly extracted from said MEM
before an adjustment has been made to it for the offset supplied.
This bug is usually covered by the fact that our block move and clear
operations are hardly ever provided with correct block alignment data
and we also usually fail to fetch that information from the MEM supplied
(although PR target/115459 shows it does happen sometimes). Instead the
bit alignment of 8 is usually conservatively used, meaning that a zero
offset is passed to `alpha_expand_unaligned_store_words' and then code
has been written such that neither `alpha_expand_unaligned_load_words'
nor `alpha_expand_unaligned_store_words' cannot ever be called with
nonzero offset from `alpha_expand_block_move'.
The only situation where `alpha_expand_unaligned_store_words' can be
called with nonzero offset is from `alpha_expand_block_clear' with a BWX
target for a misaligned block that has been embedded in a data object of
a higher alignment such that there is a small unaligned prefix our code
decides to handle so as to align further stores.
For instance it happens when a block clear is called for a block of 9
bytes embedded at offset 1 in a structure aligned to a 2-byte word, as
illustrated by the test case included. Now this test case does not work
without the change that comes next applied, because the backend cannot
see the word alignment of the struct and uses the bit alignment of 8
instead.
Should this change be swapped with the next one incorrect code such as:
would be produced, where the unadjusted offsets of 1/8 can be seen with
the LDQ_U/STQ_U operations along with byte masks calculated accordingly
rather than the expected offsets of 2/9. As a result the byte at the
offset of 9 fails to get cleared. In these circumstances this would
also show as execution failures with the memclr.c test:
FAIL: gcc.c-torture/execute/memclr.c -O1 execution test
FAIL: gcc.c-torture/execute/memclr.c -Os execution test
-- not at `-O0' though, as the higher alignment cannot be retrieved in
that case, and then not at `-O2' or higher optimization levels either,
because then we choose to open-code this block clear instead:
ldbu $1,0($16)
stw $31,8($16)
stq $1,0($16)
avoiding the bug in `alpha_expand_unaligned_store_words'.
I am leaving the pattern match test case XFAIL-ed here for documentation
purposes and it will be un-XFAIL-ed along with the fix to retrieve the
correct alignment. The run test is of course never expected to fail.
gcc/
* config/alpha/alpha.cc (alpha_expand_unaligned_load_words):
Move address extraction until after the MEM referred has been
adjusted for the offset supplied.
(alpha_expand_unaligned_store_words): Likewise.
gcc/testsuite/
* gcc.target/alpha/memclr-a2-o1-c9-ptr.c: New file.
* gcc.target/alpha/memclr-a2-o1-c9-run.c: New file.
Alpha: Adjust MEM alignment for block clear [PR115459]
By inference it appears to me that the same fix for PR target/115459
needs to be applied to the block clear operation that has been done for
block move, as implemented by commit ccfe71518039 ("[alpha] adjust MEM
alignment for block move [PR115459]").
gcc/
PR target/115459
* config/alpha/alpha.cc (alpha_expand_block_clear): Adjust MEM
to match inferred alignment.
Alpha: Remove code duplication in block clear trailer
Remove code duplication in the part of `alpha_expand_block_clear' that
handles any aligned trailing part of the block, observing that the two
legs of code only differ by the machine mode and that we already take
the same approach with handling any unaligned prefix earlier on. No
functional change, just code shuffling.
gcc/
* config/alpha/alpha.cc (alpha_expand_block_clear): Fold two
legs of a conditional together.
Alpha: Permit constant zero source for "insvmisaligndi"
Eliminate a redundant bitwise inclusive OR operation on the insertion of
constant zero into a bit-field, improving code produced at `-O2' from an
output sequence such as:
for a quadword unaligned store operation. As shown in the example this
only triggers for the high-part store (and therefore only for 2-byte,
4-byte, and 8-byte stores), because `insXl' insns are fully expressed in
terms of RTL and therefore the insertion of zero is eliminated in later
RTL passes, however corresponding `insXh' insns are unspecs only, making
them impossible to see through.
We can get this optimal right from expand though, given that our handler
for "insvmisaligndi", i.e. `alpha_expand_unaligned_store', has explicit
provisions for `const0_rtx' source.
gcc/
* config/alpha/alpha.md (insvmisaligndi): Use "reg_or_0_operand"
rather than "register_operand" for operand 3.
gcc/testsuite/
* gcc.target/alpha/stlx0.c: New file.
* gcc.target/alpha/stqx0.c: New file.
* gcc.target/alpha/stwx0.c: New file.
* gcc.target/alpha/stwx0-bwx.c: New file.
testsuite: Expand coverage for unaligned memory stores
Expand coverage for unaligned memory stores, for the "insvmisalignM"
patterns, for 2-byte, 4-byte, and 8-byte scalars, across byte alignments
of 1, 2, 4 and byte misalignments within from 0 up to 7 (there's some
redundancy there for the sake of simplicity of the test case), making
sure all data is written and no data is changed outside the area meant
to be written.
The test case has turned invaluable in verifying changes to the Alpha
backend, but functionality covered is generic, so I have concluded this
test qualifies for generic verification and does not have to be limited
to the Alpha-specific subset of the testsuite.
gcc/testsuite/
* gcc.c-torture/execute/misalign.c: New file.
testsuite: Expand coverage for `__builtin_memset' with 0
Expand coverage for `__builtin_memset' for the special case of clearing
a block, primarily for "setmemM" block set pattern, though with smaller
sizes open-coded sequences may be produced instead.
This verifies block sizes in bytes from 1 to 64 across byte alignments
of 1, 2, 4, 8 and byte misalignments within from 0 up to 7 (there's some
redundancy there for the sake of simplicity of the test case), making
sure all the intended area is cleared and no data is changed outside it.
These choice of the ranges for the parameters has come from the Alpha
backend, whose "setmemM" pattern has various corner cases related to
base alignment and the misalignment within.
The test case has turned invaluable in verifying changes to the Alpha
backend, but functionality covered is generic, so I have concluded this
test qualifies for generic verification and does not have to be limited
to the Alpha-specific subset of the testsuite.
Just as with `__builtin_memcpy' tests this code turned out to require
quite a lot of time to compile, although a bit less than the former.
Example compilation times with reasonably fast POWER9@2.166GHz at `-O2'
optimization and GCC built at `-O2' for various targets:
Alpha/testsuite: Run target testing over all the usual optimization levels
Use `gcc-dg-runtest' test driver rather than `dg-runtest' to run the
Alpha testsuite as several targets already do. Add `-Og -g' and `-Oz'
as well via ADDITIONAL_TORTURE_OPTIONS to expand coverage. Adjust test
options across individual test cases accordingly where required.
Discard base-2.c, cix-2.c, and max-2.c test cases as they merely are
optimization variants of base-1.c, cix-1.c, and max-1.c respectively,
run at `-O2' rather than the default level (`-O0'), now covered by the
framework with the latter ones in a generic way.
The hook changes the allocno class to either FP_REGS or GR_REGS depending on
the mode of the register. This results in better register allocation overall,
fewer spills and reduced codesize - particularly in SPEC2017 lbm.
gcc/ChangeLog:
* config/loongarch/loongarch.cc
(loongarch_ira_change_pseudo_allocno_class): New function.
(TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS): Define macro.
Lewis Hyatt [Tue, 22 Oct 2024 19:23:40 +0000 (15:23 -0400)]
libcpp: Fix overly large buffer allocation
It seems that tokens_buff_new() has always been allocating the virtual
location buffer 4 times larger than intended, and now that location_t is
64-bit, it is 8 times larger. Fixed.
libcpp/ChangeLog:
* macro.cc (tokens_buff_new): Fix length argument to XNEWVEC.
testsuite/gcc.dg/memcmp-1.c: Cut down a factor of 7 for simulators
Running tests in parallel on my 4.5y+ old laptop made this
test time out: the test itself runs in 9m20s, the timeout
being 10 minutes with the 2x factor. That's a bit too close.
This commit does to the base test a similar change as was
done for gcc.dg/torture/inline-mem-cpy-1.c in commit r14-8188-g6eca0d23b7ea84; or IOW cut it down a factor of 7
(r14-8188 was by a factor of 11).
* gcc.dg/memcmp-1.c: Pass -DRUN_FRACTION=7 when testing in a simulator.
libgfortran: Fix build for targets with int32_t=long int
Without this, after r15-6415-g586477d67bf2e3, you'll see,
for targets where int32_t is a typedef of long int (beware
of artificially broken lines):
/x/gcc/libgfortran/caf/single.c: In function '_gfortran_caf_get_by_ct':
/x/gcc/libgfortran/caf/single.c:2943:56: error: passing argument 2 of '\
(accessor_hash_table + (sizetype)((unsigned int)getter_index * 12))->ac\
cessor' from incompatible pointer type [-Wincompatible-pointer-types]
2943 | accessor_hash_table[getter_index].accessor (dst_ptr, &free_bu\
ffer, src_ptr,
| ^~~~~~~~\
~~~~
| |
| int *
/x/gcc/libgfortran/caf/single.c:2943:56: note: expected 'int32_t *' {ak\
a 'long int *'} but argument is of type 'int *'
libgfortran:
* caf/single.c (_gfortran_caf_get_by_ct): Correct type of free_buffer
to int32_t.
Harald Anlauf [Mon, 23 Dec 2024 16:56:46 +0000 (17:56 +0100)]
Fortran: fix NULL without MOLD argument to scalar DT pointer dummy [PR118179]
Commit r15-6408 overlooked the case of passing NULL without MOLD argument
to a derived type pointer dummy argument without specified intent. Since
it is prohibited to modify the dummy argument, we treat it as if intent(in)
were specified and suppress copying back of the pointer address.
PR fortran/118179
gcc/fortran/ChangeLog:
* trans-expr.cc (conv_null_actual): Suppress copying back of
pointer address for unspecified intent.
gcc/testsuite/ChangeLog:
* gfortran.dg/null_actual_7.f90: Extend testcase to also cover
scalar variants with pointer or allocatable dummy with or without
specified intent.
Simon Martin [Mon, 23 Dec 2024 12:28:31 +0000 (13:28 +0100)]
libcc1: Fix tags generation target
'make tags' currently fails for libcc1 with this:
*** No rule to make target `marshall-c.hh', needed by `tags-am'. Stop.
The problem is that while marshall-c.hh has been removed via r12-454-g25d1a6ecdc443f, it's still part of the libcc1_la_SOURCES
variable, hence the 'tags' target has a dependency on it.
This patch simply removes the marshall_c_source variable, that should be
empty.
libcc1/ChangeLog:
* Makefile.am: Remove reference to deleted marshall-c.h.
* Makefile.in: Regenerate.
Paul Thomas [Mon, 23 Dec 2024 15:32:40 +0000 (15:32 +0000)]
Fortran: Bugs found in class_transformational_1/2.f90[PR116254/118059].
2024-12-23 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran/ChangeLog
PR fortran/116254
* trans-array.cc (gfc_trans_create_temp_array): Make sure that
transformational intrinsics of class objects that change rank,
most particularly spread, go through the correct code path. Re-
factor so that changes to the dtype are done on the temporary
before the class data of the result points to it.
PR fortran/118059
* trans-expr.cc (arrayfunc_assign_needs_temporary): Character
array function expressions assigned to an unlimited polymorphic
variable require a temporary.
Recently two test cases for PR118149 have been added.
While pr118149-2.c works well for AArch64, pr118149.c fails
because the expected optimization in forwprop4 cannot be applied
as SLP vectorization does not happen.
This patch fixes this issue by disabling the check on AArch64.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr118149.c: Disable for AArch64.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Wilken Gottwalt [Sun, 8 Dec 2024 19:46:16 +0000 (19:46 +0000)]
gm2: fix bad programming practice warning
Fix identifier names to be too similar to Modula-2 keywords and causing
warnings coming from Modula-2's own libraries.
m2/m2log/InOut.mod:51:18: note: In implementation module ‘InOut’:
either the identifier has the same name as a keyword or alternatively a
keyword has the wrong case (‘IN’ and ‘in’)
51 | in, out: File ;
m2/m2log/InOut.mod:51:18: note: the symbol name ‘in’ is legal as an
identifier, however as such it might cause confusion and is considered
bad programming practice
gcc/gm2:
* gm2-libs-log/InOut.mod: Fix bad identifier warning.
testsuite: arm: Check for short circuit instructions [PR103298]
Instead of checking that a certain transformation is not used by
counting the number of return instructions and the number of BEQ
instructions, check that none of CMP, MOV, ORR and AND instructions are
suffixed with EQ or NE.
Also removed size check as it's very unstable (depends on optimization
in use).
gcc/testsuite/ChangeLog:
PR testsuite/103298
* gcc.target/arm/pr43920-2.c: Change to assembler pattern
"(cmp|mov|orr|and)(eq|ne)" for the check. Remove size check.
Fortran: Replace getting of coarray data with accessor-based version. [PR107635]
Getting coarray data from remote images was slow, inefficient and did
not work for object files that where not compiled with coarray support
for derived types with allocatable/pointer components. The old approach
emulated accessing data through a whole structure ref, which was error
prone for corner cases. Furthermore was did it have a runtime
complexity of O(N), where N is the number of allocatable/pointer
components and descriptors involved. Each of those needed communication
twice. The new approach creates a routine for each access into a
coarray object putting all required operations there. Looking a
tree-dump one will see those small routines. But this time it is just
compiled fortran with all the knowledge of the compiler of bounds and so
on. New paradigms will be available out of the box. Furthermore is the
complexity of the communication reduced to be O(1). E.g. the mpi
implementation sends one message for the parameters of the access and
one message back with the results without caring about the number of
allocatable/pointer/descriptor components in the access.
Identification of access routines is done be adding them to a hash map,
where the hash is the same on all images. Translating the hash to an
index, which is the same on all images again, allows for fast calls of
the access routines. Resolving the hash to an index is cached at
runtime, preventing additional hash map lookups. A hashmap was use
because not all processor OS combinations may use the same address for
the access routine.
gcc/fortran/ChangeLog:
PR fortran/107635
* gfortran.h (gfc_add_caf_accessor): New function.
* gfortran.texi: Document new API routines.
* resolve.cc (get_arrayspec_from_expr): Synthesize the arrayspec
resulting from an expression, i.e. not only the rank, but also
the bounds.
(remove_coarray_from_derived_type): Remove coarray ref from a
derived type to access it in access routine.
(convert_coarray_class_to_derived_type): Same but for classes.
The result is a derived type.
(split_expr_at_caf_ref): Split an expression at the coarray
reference to move the reference after the coarray ref into the
access routine.
(check_add_new_component): Helper to add variables as
components to derived type transfered to the access routine.
(create_get_parameter_type): Create the derived type to transfer
addressing data to the access routine.
(create_get_callback): Create the access routine.
(add_caf_get_intrinsic): Use access routine instead of old
caf_get.
* trans-decl.cc (gfc_build_builtin_function_decls): Register new
API routines.
(gfc_create_module_variable): Use renamed flag.
(gfc_emit_parameter_debug_info):
(struct caf_accessor): Linked list of hash-access routine pairs.
(gfc_add_caf_accessor): Add a hash-access routine pair to above
linked list.
(create_caf_accessor_register): Add all registered hash-access
routine pairs to the current caf_init.
(generate_coarray_init): Use routine above.
(gfc_generate_module_vars): Use renamed flag.
(generate_local_decl): Same.
(gfc_generate_function_code): Same.
(gfc_process_block_locals): Same.
* trans-intrinsic.cc (conv_shape_to_cst): Build the product of a
shape.
(gfc_conv_intrinsic_caf_get): Create call to access routine.
(conv_caf_send): Adapt to caf_get using less arguments.
(gfc_conv_intrinsic_function): Same.
* trans.cc (gfc_trans_force_lval): Helper to ensure that an
expression can be used as an lvalue-ref.
* trans.h (gfc_trans_force_lval): See above.
libgfortran/ChangeLog:
* caf/libcaf.h (_gfortran_caf_register_accessor): New function
to register access routines at runtime.
(_gfortran_caf_register_accessors_finish): New function to
finish registration of access routine and sort hash map.
(_gfortran_caf_get_remote_function_index): New function to
convert an hash to an index.
(_gfortran_caf_get_by_ct): New function to get data from a
remote image using the access routine given by an index.
* caf/single.c (struct accessor_hash_t): Hashmap type.
(_gfortran_caf_send): Fixed formatting.
(_gfortran_caf_register_accessor): Register a hash accessor
routine.
(hash_compare): Compare two hashes for sort() and bsearch().
(_gfortran_caf_register_accessors_finish): Sort the hashmap to
allow bsearch()'s quick lookup.
(_gfortran_caf_get_remote_function_index): Map a hash to an
index.
(_gfortran_caf_get_by_ct): Get data from a remote image using
the index provided by get_remote_function_index().
gcc/testsuite/ChangeLog:
* gfortran.dg/coarray_atomic_5.f90: Adapted to look for
get_by_ct.
* gfortran.dg/coarray_lib_comm_1.f90: Same.
* gfortran.dg/coarray_stat_function.f90: Same.
Fortran: Remove adding and removing of caf_get. [PR107635]
Preparatory work for PR107635.
During resolve prevent adding caf_get calls for expressions on the
left-hand-side of an assignment and removing them later on again.
Furthermore has the caf_token in a component become a pointer to
the component and not the backend_decl of the caf-component.
In some cases the caf_token was added as last component in a derived
type and not as the next one following the component that it was
needed to be associated to.
gcc/fortran/ChangeLog:
PR fortran/107635
* gfortran.h (gfc_comp_caf_token): Convenient macro for
accessing caf_token's tree.
* resolve.cc (gfc_resolve_ref): Backup caf_lhs when resolving
expr in array_ref.
(remove_caf_get_intrinsic): Removed.
(resolve_variable): Set flag caf_lhs when resolving lhs of
assignment to prevent insertion of caf_get.
(resolve_lock_unlock_event): Same, but the lhs is the parameter.
(resolve_ordinary_assign): Move conversion to caf_send to
resolve_codes.
(resolve_codes): Adress caf_get and caf_send here.
(resolve_fl_derived0): Set component's caf_token when token is
necessary.
* trans-array.cc (gfc_conv_array_parameter): Get a coarray for
expression that have a corank.
(structure_alloc_comps): Use macro to get caf_token's tree.
(gfc_alloc_allocatable_for_assignment): Same.
* trans-expr.cc (gfc_get_ultimate_alloc_ptr_comps_caf_token):
Same.
(gfc_trans_structure_assign): Same.
* trans-intrinsic.cc (conv_expr_ref_to_caf_ref): Same.
(has_ref_after_cafref): New function to figure that after a
reference of a coarray another reference is present.
(conv_caf_send): Get rhs from correct place, when caf_get is
not removed.
* trans-types.cc (gfc_get_derived_type): Get caf_token from
component and no longer guessing.
Arsen Arsenović [Thu, 1 Aug 2024 15:38:15 +0000 (17:38 +0200)]
warn-access: ignore template parameters when matching operator new/delete [PR109224]
Template parameters on a member operator new cannot affect its member
status nor whether it is a singleton or array operator new, hence, we
can ignore it for purposes of matching. Similar logic applies to the
placement operator delete.
In the PR (and a lot of idiomatic coroutine code generally), operator
new is templated in order to be able to inspect (some of) the arguments
passed to the coroutine, to make allocation-related decisions. However,
the coroutine implementation will not call a placement delete form, so
it cannot get templated. As a result, when demangling, we have an extra
template DEMANGLE_COMPONENT_TEMPLATE around the actual operator new, but
not operator delete. This terminates new_delete_mismatch_p early.
PR middle-end/109224 - Wmismatched-new-delete false positive with a templated operator new (common with coroutines)
gcc/ChangeLog:
PR middle-end/109224
* gimple-ssa-warn-access.cc (new_delete_mismatch_p): Strip
DEMANGLE_COMPONENT_TEMPLATE from the operator new and operator
after demangling.
gcc/testsuite/ChangeLog:
PR middle-end/109224
* g++.dg/warn/Wmismatched-new-delete-9.C: New test.
Harald Anlauf [Sat, 14 Dec 2024 19:26:47 +0000 (20:26 +0100)]
Fortran: fix passing of NULL() to assumed-rank, derived type dummy [PR104819]
PR fortran/104819
gcc/fortran/ChangeLog:
* interface.cc (compare_parameter): For the rank check, NULL()
inherits the rank of a provided MOLD argument.
(gfc_compare_actual_formal): Adjust check of NULL() actual argument
against formal to accept F2008 enhancements (allocatable dummy).
NULL() with MOLD argument retains a pointer/allocatable attribute.
* trans-expr.cc (conv_null_actual): Implement passing NULL() to
derived-type dummy with pointer/allocatable attribute, and ensure
that the actual rank is passed to an assumed-rank dummy.
(gfc_conv_procedure_call): Use it.
Jeff Law [Sat, 21 Dec 2024 15:33:36 +0000 (08:33 -0700)]
[RISC-V][PR middle-end/118084] Fix brev based reflection code
The fuzzer tripped over a risc-v target issue in the expansion of CRCs.
In particular we want to use brev instruction to improve the reflection
code.
In the case where the item to be reflected is smaller than a word we
would end up triggering an ICE due to mode mismatching since the
expansion code asks for the operation in word_mode.
I was briefly confused by the multiple calls into this code, but we have
to reflect multiple values and those calls may be reflecting different
sized items. So seeing one in SI, then another in QI is sensible.
The fix is pretty simple. In theory the item being reflected should
always be word_size or smaller. So an assertion is added to verify
that. If the item's size is smaller than a word, we can use a
paradoxical subreg. The logical right shift after the brev should zero
out any extraneous bits.
It's unclear why we're passing a pointer to an RTX in this code. I left
that as-is, but we can simplify the code a little bit by doing the
dereference early and using the dereferenced value.
Pan Li [Tue, 10 Dec 2024 06:27:53 +0000 (14:27 +0800)]
Match: Refactor the signed SAT_ADD match patterns [NFC]
This patch would like to refactor the all signed SAT_ADD patterns,
aka:
* Extract type check outside.
* Re-arrange the related match pattern forms together.
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.
gcc/ChangeLog:
* match.pd: Refactor sorts of signed SAT_ADD match patterns.
Signed-off-by: Pan Li <pan2.li@intel.com> Signed-off-by: Pan Li <pan2.li@intel.com>
Mark Harmstone [Fri, 20 Dec 2024 02:29:21 +0000 (02:29 +0000)]
Fix compilation error in vmsdbgout_begin_block on VMS targets
Commit 4ed189854eae ("Add block parameter to begin_block debug hook") changed
the definition of the begin_block function pointer to add another parameter,
but I missed a call in vmsdbgout_begin_block.
Sandra Loosemore [Fri, 20 Dec 2024 16:27:14 +0000 (16:27 +0000)]
Fortran: Fix hyphenation errors in the manual
When looking through the gfortran manual, I noted some problems with
hyphens being used where they're not correct or necessary,
e.g. "non-standard" vs "nonstandard", "null-pointer" vs "null pointer"
(as a noun), etc. I've made a pass through the documentation to
correct at least some of those uses.
gcc/fortran/ChangeLog
* gfortran.texi: Get rid of some unnecessary hyphens throughout
the file.
* invoke.texi: Likewise.
Sandra Loosemore [Fri, 20 Dec 2024 05:08:15 +0000 (05:08 +0000)]
Fortran: Use the present tense for the manual.
The present tense is preferred for expressing facts or enduring
behavior. Thus we should say "option X does Y" instead of "option X
will do Y", reserving the future tense for things that happen at some
later time (such as in a future release of GCC, or at run time as
explicitly contrasted with compile time).
This set of edits is largely mechanical substitution of phrasing
involving "will". I also fixed a few more markup problems noted while
editing nearby text and fixed a few instances of awkward wording.
gcc/fortran/ChangeLog
* gfortran.texi: Use the present tense throughout; fix some
markup issues and awkward wording.
* invoke.texi: Likewise.
Sandra Loosemore [Thu, 19 Dec 2024 23:49:57 +0000 (23:49 +0000)]
Fortran: Fixes for markup, typos, and indexing in manual
While working on something else I noticed there were numerous places
in the GNU Fortran manual with incorrect/missing Texinfo markup. I
made a pass through about the first third of the manual (not yet the
coarray API or intrinsics documentation) to fix at least some of those
issues, plus some typos and missing @cindex entries.
There shouldn't be any semantic changes to the documentation in this patch.
gcc/fortran/ChangeLog
* gfortran.texi: Fix markup, typos, and indexing throughout the
file.
* invoke.texi: Likewise.
Sandra Loosemore [Thu, 19 Dec 2024 00:43:11 +0000 (00:43 +0000)]
Fortran: Clean up -funderscoring and -fsecond-underscore docs [PR51820]
This is a long-standing documentation bug in the Fortran manual,
initially reported in 2012 as PR51820, with a quick fix applied later
for PR109216. The patch here incorporates more of the discussion from
the original issue.
gcc/fortran/ChangeLog
PR fortran/51820
PR fortran/89632
PR fortran/109216
* invoke.texi (Code Gen Options): Further cleanups of the discussion
of what -funderscoring and -fsecond-underscore do.
Alexandre Oliva [Fri, 20 Dec 2024 21:02:01 +0000 (18:02 -0300)]
strub: accept indirection of volatile pointer types [PR118007]
We don't want to indirect pointers in strub wrappers, because it
generally isn't profitable, but if the argument is volatile, then we
must use indirection to preserve access patterns, so amend the
assertion check.
Alexandre Oliva [Fri, 20 Dec 2024 21:01:53 +0000 (18:01 -0300)]
avoid trying to set block in barriers [PR113506]
When we emit a sequence before a preexisting insn and naming a BB to
store in the insns, we will attempt to store the BB even in barriers
present in the sequence.
Barriers don't expect blocks, and rtl checking catches the problem.
When emitting after a preexisting insn, we skip the block setting in
barriers. Change the before emitter to do so as well.
for gcc/ChangeLog
PR middle-end/113506
* emit-rtl.cc (add_insn_before): Don't set the block of a
barrier.