Jakub Jelinek [Thu, 24 Apr 2025 21:44:28 +0000 (23:44 +0200)]
s390: Allow 5+ argument tail-calls in some special cases [PR119873]
protobuf (and therefore firefox too) currently doesn't build on s390*-linux.
The problem is that it uses [[clang::musttail]] attribute heavily, and in
llvm (IMHO llvm bug) [[clang::musttail]] calls with 5+ arguments on
s390*-linux are silently accepted and result in a normal non-tail call.
In GCC we just reject those because the target hook refuses to tail call it
(IMHO the right behavior).
Now, the reason why that happens is as s390_function_ok_for_sibcall attempts
to explain, the 5th argument (assuming normal <= wordsize integer or pointer
arguments, nothing that needs 2+ registers) is passed in %r6 which is not
call clobbered, so we can't do tail call when we'd have to change content
of that register and then caller would assume %r6 content didn't change and
use it again.
In the protobuf case though, the 5th argument is always passed through
from the caller to the musttail callee unmodified, so one can actually
emit just jg tail_called_function or perhaps tweak some registers but
keep %r6 untouched, and in that case I think it is just fine to tail call
it (at least unless the stack slots used for 6+ argument can't be modified
by the callee in the ABI and nothing checks for that).
So, the following patch checks for this special case, where the argument
which uses %r6 is passed in a single register and it is passed default
definition of SSA_NAME of a PARM_DECL with the same DECL_INCOMING_RTL.
It won't really work at -O0 but should work for -O1 and above, at least when
one doesn't really try to modify the parameter conditionally and hope it will
be optimized away in the end.
2025-04-24 Jakub Jelinek <jakub@redhat.com>
Stefan Schulze Frielinghaus <stefansf@gcc.gnu.org>
PR target/119873
* config/s390/s390.cc (s390_call_saved_register_used): Don't return
true if default definition of PARM_DECL SSA_NAME of the same register
is passed in call saved register.
(s390_function_ok_for_sibcall): Adjust comment.
* gcc.target/s390/pr119873-1.c: New test.
* gcc.target/s390/pr119873-2.c: New test.
* gcc.target/s390/pr119873-3.c: New test.
* gcc.target/s390/pr119873-4.c: New test.
Gaius Mulley [Thu, 24 Apr 2025 21:09:19 +0000 (22:09 +0100)]
PR modula2/115276: libgm2 wraptime.cc field access all return -1.
This patch provides autoconf tests for each field used in wraptime.cc
referencing struct tm and struct timeval.
libgm2/ChangeLog:
PR modula2/115276
* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac (AC_STRUCT_TIMEZONE): Add.
(AC_CHECK_MEMBER): Test for struct tm.tm_year.
(AC_CHECK_MEMBER): Test for struct tm.tm_mon.
(AC_CHECK_MEMBER): Test for struct tm.tm_mday.
(AC_CHECK_MEMBER): Test for struct tm.tm_hour.
(AC_CHECK_MEMBER): Test for struct tm.tm_min.
(AC_CHECK_MEMBER): Test for struct tm.tm_sec.
(AC_CHECK_MEMBER): Test for struct tm.tm_year.
(AC_CHECK_MEMBER): Test for struct tm.tm_yday.
(AC_CHECK_MEMBER): Test for struct tm.tm_wday.
(AC_CHECK_MEMBER): Test for struct tm.tm_isdst.
(AC_CHECK_MEMBER): Test for struct timeval.tv_sec.
(AC_CHECK_MEMBER): Test for struct timeval.tv_sec.
(AC_CHECK_MEMBER): Test for struct timeval.tv_usec.
* libm2iso/wraptime.cc (InitTimeval): Guard against lack
struct timeval and malloc.
(InitTimezone): Guard against lack of struct tm.tm_zone
and malloc.
(KillTimezone): Ditto.
(InitTimeval): Guard against lack of struct timeval
and malloc.
(KillTimeval): Guard against lack of malloc.
(settimeofday): Guard against lack of struct tm.tm_zone.
(GetFractions): Guard against lack of struct timeval.
(localtime_r): Ditto.
(GetYear): Guard against lack of struct tm.
(GetMonth): Ditto.
(GetDay): Ditto.
(GetHour): Ditto.
(GetMinute): Ditto.
(GetSecond): Ditto.
(GetSummerTime): Ditto.
(GetDST): Guards against lack of struct timezone.
(SetTimezone): Ditto.
(SetTimeval): Guard against lack of struct tm.
Robert Dubner [Thu, 24 Apr 2025 20:26:58 +0000 (16:26 -0400)]
cobol: Repair some exception processing logic.
This patch changes the exception processing logic for the calculation of
reference modifications and table subscripts to be more in accordance with
ISO specifications.
It also adjusts the processing of RETURN-CODE when calling routines that
have no CALL ... RETURNING phrase.
libgomp/testsuite: Fix hip_header_nvidia check, add workaround to test
This is all about using the AMD's HIP header files with
__HIP_PLATFORM_NVIDIA__ defined, i.e. HIP with Nvidia/CUDA; in that case,
HIP is a thin layer on top of CUDA.
First, the check_effective_target_gomp_hip_header_nvidia check failed;
to fix it, -Wno-deprecated-declarations was added - and likewise to the
two affected testcases that actually used the HIP headers on Nvidia.
Doing so, the HIP tested was successful but the HIP-BLAS one showed two
issues:
* One seems to be related to include search paths as the HIP header uses
#include "library_types.h" to include that CUDA header. Seemingly, it
tried to included (again) the HIP header hip/library_types.h, not the
CUDA one. I guess, some tweaking of -isystem vs. -I could have
prevented this, but the simpler workaround was to just explicitly
include the CUDA one before the HIP header files.
* Once done, everything compiled but linking failed as the association
between three HIP-BLAS functions and their CUDA-BLAS ones did not
work. Solution: Just add three #define for mapping them.
libgomp/ChangeLog:
* testsuite/lib/libgomp.exp
(check_effective_target_gomp_hip_header_nvidia): Compile with
"-Wno-deprecated-declarations".
* testsuite/libgomp.c/interop-hip-nvidia-full.c: Likewise.
* testsuite/libgomp.c/interop-hipblas-nvidia-full.c: Likewise.
* testsuite/libgomp.c/interop-hipblas.h: Add workarounds
when using the HIP headers with __HIP_PLATFORM_NVIDIA__.
François Dumont [Thu, 10 Apr 2025 18:58:11 +0000 (20:58 +0200)]
libstdc++: Add std::deque<>::shrink_to_fit test
The existing test is currently testing std::vector. Adapt it for std::deque.
libstdc++-v3/ChangeLog:
* testsuite/util/replacement_memory_operators.h: Adapt for -fno-exceptions
context.
* testsuite/23_containers/deque/capacity/shrink_to_fit.cc: Adapt test
to check std::deque shrink_to_fit method.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Reviewed-by: Tomasz Kaminski <tkaminsk@redhat.com>
aarch64: Fix CFA offsets in non-initial stack probes [PR119610]
PR119610 is about incorrect CFI output for a stack probe when that
probe is not the initial allocation. The main aarch64 stack probe
function, aarch64_allocate_and_probe_stack_space, implicitly assumed
that the incoming stack pointer pointed to the top of the frame,
and thus held the CFA.
aarch64_save_callee_saves and aarch64_restore_callee_saves use a
parameter called bytes_below_sp to track how far the stack pointer
is above the base of the static frame. This patch does the same
thing for aarch64_allocate_and_probe_stack_space.
Also, I noticed that the SVE path was attaching the first CFA note
to the wrong instruction: it was attaching the note to the calculation
of the stack size, rather than to the r11<-sp copy.
gcc/
PR target/119610
* config/aarch64/aarch64.cc (aarch64_allocate_and_probe_stack_space):
Add a bytes_below_sp parameter and use it to calculate the CFA
offsets. Attach the first SVE CFA note to the move into the
associated temporary register.
(aarch64_allocate_and_probe_stack_space): Update calls accordingly.
Start out with bytes_per_sp set to the frame size and decrement
it after each allocation.
gcc/testsuite/
PR target/119610
* g++.dg/torture/pr119610.C: New test.
* g++.target/aarch64/sve/pr119610-sve.C: Likewise.
Jakub Jelinek [Thu, 24 Apr 2025 13:29:50 +0000 (15:29 +0200)]
c: Allow $@` in GNU23/GNU2Y raw string delimiters [PR110343]
Aaron mentioned in the PR that late in C23 N3124 was adopted and
$@` are now part of basic character set. The paper has been implemented
in GCC from what I can see, but we should allow for GNU23/2Y $@` in
raw string delimiters as well, like they are allowed for C++26, because
the delimiters can contain anything from basic character set but space,
()\, tab, form-feed, newline and backspace.
2025-04-24 Jakub Jelinek <jakub@redhat.com>
PR c++/110343
* lex.cc (lex_raw_string): For C allow $@` in raw string delimiters
if CPP_OPTION (pfile, low_ucns) i.e. for C23 and later.
Add checks for nowait/depend and for checks that the returned
CUDA, CUDA_DRIVER and HIP interop objects actually work.
While the CUDA/CUDA_DRIVER ones are only for Nvidia GPUs, HIP
works on both AMD and Nvidia GPUs; on Nvidia GPUs, it is a
very thin wrapper around CUDA.
For Fortran, only a HIP test has been added - using hipfort.
While libgomp.c-c++-common/interop-2.c always works - even without
GPU - and checks for depend / nowait, all others require that
runtime libraries are found at link (and execution) time:
For Nvidia GPUs, libcuda + libcudart or libcublas,
For AMD GPUs, libamdhip64 or libhipblas.
The header files and hipfort modules do not need to be present as a
fallback has been implemented, but if they are, they get used.
Due to the combinations, the basic 1x C/C++, 4x C and 1x Fortran tests
yield 1x C/C++, 14x C and 4 Fortran run-test files.
libgomp/ChangeLog:
* testsuite/lib/libgomp.exp (check_effective_target_openacc_cublas,
check_effective_target_openacc_cudart): Update description as
the check requires more.
(check_effective_target_openacc_libcuda,
check_effective_target_openacc_libcublas,
check_effective_target_openacc_libcudart,
check_effective_target_gomp_hip_header_amd,
check_effective_target_gomp_hip_header_nvidia,
check_effective_target_gomp_hipfort_module,
check_effective_target_gomp_libamdhip64,
check_effective_target_gomp_libhipblas): New.
* testsuite/libgomp.c-c++-common/interop-2.c: New test.
* testsuite/libgomp.c/interop-cublas-full.c: New test.
* testsuite/libgomp.c/interop-cublas-libonly.c: New test.
* testsuite/libgomp.c/interop-cuda-full.c: New test.
* testsuite/libgomp.c/interop-cuda-libonly.c: New test.
* testsuite/libgomp.c/interop-hip-amd-full.c: New test.
* testsuite/libgomp.c/interop-hip-amd-no-hip-header.c: New test.
* testsuite/libgomp.c/interop-hip-nvidia-full.c: New test.
* testsuite/libgomp.c/interop-hip-nvidia-no-headers.c: New test.
* testsuite/libgomp.c/interop-hip-nvidia-no-hip-header.c: New test.
* testsuite/libgomp.c/interop-hip.h: New test.
* testsuite/libgomp.c/interop-hipblas-amd-full.c: New test.
* testsuite/libgomp.c/interop-hipblas-amd-no-hip-header.c: New test.
* testsuite/libgomp.c/interop-hipblas-nvidia-full.c: New test.
* testsuite/libgomp.c/interop-hipblas-nvidia-no-headers.c: New test.
* testsuite/libgomp.c/interop-hipblas-nvidia-no-hip-header.c: New test.
* testsuite/libgomp.c/interop-hipblas.h: New test.
* testsuite/libgomp.fortran/interop-hip-amd-full.F90: New test.
* testsuite/libgomp.fortran/interop-hip-amd-no-module.F90: New test.
* testsuite/libgomp.fortran/interop-hip-nvidia-full.F90: New test.
* testsuite/libgomp.fortran/interop-hip-nvidia-no-module.F90: New test.
* testsuite/libgomp.fortran/interop-hip.h: New test.
opts.cc Simplify handling of explicit -flto-partition= and -fipa-reorder-for-locality
The handling of an explicit -flto-partition= and -fipa-reorder-for-locality
should be simpler. No need to have a new default option. We can use opts_set
to check if -flto-partition is explicitly set and use that information in the
error handling.
Remove -flto-partition=default and update accordingly.
Bootstrapped and tested on aarch64-none-linux-gnu.
Gaius Mulley [Thu, 24 Apr 2025 10:15:18 +0000 (11:15 +0100)]
PR modula2/119915: Sprintf1 repeats the entire format string if it starts with a directive
This bugfix is for FormatStrings to ensure that in the case of %x, %u the
procedure function PerformFormatString uses Copy rather than Slice to
avoid the case on an upper bound of zero in Slice. Oddly the %d case
had the correct code.
gcc/m2/ChangeLog:
PR modula2/119915
* gm2-libs/FormatStrings.mod (PerformFormatString): Handle
the %u and %x format specifiers in a similar way to the %d
specifier. Avoid using Slice and use Copy instead.
gcc/testsuite/ChangeLog:
PR modula2/119915
* gm2/pimlib/run/pass/format2.mod: New test.
/* size: 80, cachelines: 2, members: 7 */
/* sum members: 76 */
/* sum bitfield members: 10 bits, bit holes: 1, sum bit holes: 22 bits */
/* last cacheline: 16 bytes */
};
struct dw_attr_struct {
enum dwarf_attribute dw_attr; /* 0 4 */
/* XXX 4 bytes hole, try to pack */
struct dw_val_node dw_attr_val; /* 8 32 */
/* size: 40, cachelines: 1, members: 2 */
/* sum members: 36, holes: 1, sum holes: 4 */
/* last cacheline: 40 bytes */
};
The following patch is an (not very clean admittedly) attempt to decrease
size of dw_loc_descr_node from 80 bytes to 72 and (more importantly)
dw_attr_struct from 40 bytes to 32 by moving the dw_attr member from
dw_attr_struct into dw_attr_val's padding and similarly move
dw_loc_opc/dtprel/frame_offset_rel members into dw_loc_oprnd1 padding
and dw_loc_addr into dw_loc_oprnd2 padding.
All we need to ensure is that nothing tries to copy whole dw_val_node
structs unless it is copied as part of whole dw_loc_descr_node or
dw_attr_struct copy.
To verify that wasn't the case, I've temporarily added a deleted copy ctor
to dw_val_node and then looked at all the errors/warnings caused by that,
and those were just from memcpy/memmove or structure assignments of whole
dw_loc_descr_node/dw_attr_struct.
2025-04-24 Jakub Jelinek <jakub@redhat.com>
PR debug/119711
* dwarf2out.h (struct dw_val_node): Add u member.
(struct dw_loc_descr_node): Remove dw_loc_opc, dtprel,
frame_offset_rel and dw_loc_addr members.
(dw_loc_opc, dw_loc_dtprel, dw_loc_frame_offset_rel, dw_loc_addr):
Define.
(struct dw_attr_struct): Remove dw_attr member.
(dw_attr): Define.
* dwarf2out.cc (loc_descr_equal_p_1): Use dw_loc_dtprel instead of
dtprel.
(output_loc_operands, new_addr_loc_descr, loc_checksum,
loc_checksum_ordered): Likewise.
(resolve_args_picking_1): Use dw_loc_frame_offset_rel instead of
frame_offset_rel.
(loc_list_from_tree_1): Likewise.
(resolve_addr_in_expr): Use dw_loc_dtprel instead of dtprel.
(copy_deref_exprloc): Copy val_class, val_entry and v members
instead of whole dw_loc_oprnd1 and dw_loc_oprnd2.
(optimize_string_length): Copy val_class, val_entry and v members
instead of whole dw_attr_val.
(hash_loc_operands): Use dw_loc_dtprel instead of dtprel.
(compare_loc_operands, compare_locs): Likewise.
Gaius Mulley [Thu, 24 Apr 2025 01:39:36 +0000 (02:39 +0100)]
PR modula2/119914 No error message generated when passing a Ztype to an unbounded array
This patch detects constants ZType, RType, CType being passed to unbounded
arrays and generates an error message highlighting the formal and
actual parameters in error.
gcc/m2/ChangeLog:
PR modula2/119914
* gm2-compiler/M2Check.mod (checkConstMeta): Add check for
Ztype, Rtype and Ctype and unbounded arrays.
(IsZRCType): New procedure function.
(isZRC): Add comment.
* gm2-compiler/M2Quads.mod:
* gm2-compiler/M2Range.mod (gdbinit): New procedure.
(BreakWhenRangeCreated): Ditto.
(CheckBreak): Ditto.
(InitRange): Call CheckBreak.
(Init): Add gdbhook and initialize interactive watch point.
* gm2-compiler/SymbolTable.def (GetNthParamAnyClosest): New
procedure function.
* gm2-compiler/SymbolTable.mod (BreakSym): Remove constant.
(BreakSym): Add Variable.
(stop): Remove.
(gdbhook): New procedure.
(BreakWhenSymCreated): Ditto.
(CheckBreak): Ditto.
(NewSym): Call CheckBreak.
(Init): Add gdbhook and initialize interactive watch point.
(MakeProcedure): Replace guarded call to stop with CheckBreak.
(GetNthParamChoice): New procedure function.
(GetNthParamOrdered): Ditto.
(GetNthParamAnyClosest): Ditto.
(GetOuterModuleScope): Ditto.
gcc/testsuite/ChangeLog:
PR modula2/119914
* gm2/pim/fail/constintarraybyte.mod: New test.
Jan Hubicka [Wed, 23 Apr 2025 16:39:14 +0000 (18:39 +0200)]
Enable ip-cp cloning over non-hot edges
Currently enabling profile feedback regresses x264 and exchange. In both cases the root of the
issue is that ipa-cp cost model thinks cloning is not relevant when feedback is available
while it clones without feedback.
Consider:
__attribute__ ((used))
int a[1000];
__attribute__ ((noinline))
void
test2(int sz)
{
for (int i = 0; i < sz; i++)
a[i]++;
asm volatile (""::"m"(a));
}
__attribute__ ((noinline))
void
test1 (int sz)
{
for (int i = 0; i < 1000; i++)
test2(sz);
}
int main()
{
test1(1000);
return 0;
}
Here we want to clone call both test1 and test2 and specialize for 1000, but
ipa-cp will not do that, since it will skip call main->test1 as not hot since
it is called just once both with or without profile feedback.
In this simple testcase even without profile feedback we will track that main
is called once.
I think the testcase shows that hotness of call is not that relevant when
deciding whether we want to propagate constants across it. ipa-cp with IPA
profile can compute overall estimate of time saved (which is existing time
benefit computing time saved per invociation of the function multiplied by
number of executions) and see if result is big enough. An easy check is to
simply call maybe_hot_p on the resulting count.
So this patch makes ipa-cp to consider all calls sites except those known to be
unlikely executed (i.e. run 0 times in train run or known to lead to someting
bad) as interesting, which makes ipa-cp to propagate across them, find cloning
candidates and feed them into good_clonning_oppurtunity.
For this I added cs_interesting_for_ipcp_p which also attempts to do right
thing with partial training.
Now good_clonning_oppurtunity will currently return false, since it will figure
out that the call edge is not very frequent.
It already kind of knows that frequency of call instruction istself is not too
important, but instead of computing overall time saved, it tries to compare it
with param_ipa_cp_profile_count_base percentage of counts of call edges. I
think this is not very relevant since estimated time saved per call can be
large. So I dropped this logic and replaced it with simple use of overall
saved time.
Since ipa-cp is not dealing well with the cases where it hits the allowed unit
growth limit, we probably want to be more careful, so I keep existing metric
with this change.
So now we get:
Evaluating opportunities for test1/3.
- considering value 1000 for param #0 sz (caller_count: 1)
good_cloning_opportunity_p (time: 1, size: 8, count_sum: 1 (precise), overall time saved: 1 (adjusted)) -> evaluation: 0.12, threshold: 500
not cloning: time saved is not hot
good_cloning_opportunity_p (time: 129001, size: 20, count_sum: 1 (precise), overall time saved: 129001 (adjusted)) -> evaluation: 6450.05, threshold: 500
First call to good_cloning_oppurtunity considers the case where only test1 is
clonned. In this case time saved is 1 (for passing the value around) and since
it is called just once (count_sum) overall time saved is 1 which is not
considered hot and we also get very low evaulation score.
In the second call we consider cloning chain test1->test2. In this case time
saved is large (12901) since test2 is invoked many times and it is used to
controll the loop. We still know that the count is 1 but overall time is
129001 which is already considered relevant and we clone.
I also try to do something sensible in case we have calls both with
and without IPA profile (which can happen for comdats where profile got missing
or with LTO if some units were not trained).
Instead of checking whether sum of calls with known profile is nonzero, I keep
track if there are other calls and if so, also try the local heuristics that
is used without profile feedback.
The patch improves SPECint with -Ofast -fprofile-use by approx 1% by speeding
up x264 from 99.3s to 91.3s (9%) and exchange from 99.7s to 95.5s (3.3%).
We still get better x264 runtime without profile (86.4s for x264 and 93.8 for exchange).
The main problem I see is that ipa-cp has the global limit for growth of 10%
but does not consider the oppurtunities in priority order. Consequently if the
limit is hit, randomly some clone oppurtunities are dropped in favour of
others.
I dumped unit size changes with -flto -Ofast build of SPEC2017. Without patch I get:
Small units can grow up to 16000 instructions and other units are
large. So there is only one 156% growth hititng limits which is exchange
that has recursive clonning that goes specially.
With profile feedback ipacp basically shuts itself off:
So here we get 114% and 127 growth in x264 (two differen tbinaries)
56% growht in Deepsjeng, 61% growth in Exchange which all are above
10% cutoff.
Bootstrapped/regtested x86_64-linux.
gcc/ChangeLog:
* ipa-cp.cc (base_count): Remove.
(struct caller_statistics): Rename n_hot_calls to n_interesting_calls;
add called_without_ipa_profile.
(init_caller_stats): Update.
(cs_interesting_for_ipcp_p): New function.
(gather_caller_stats): collect n_interesting_calls and
called_without_profile.
(ipcp_cloning_candidate_p): Use n_interesting-calls rather then hot.
(good_cloning_opportunity_p): Rewrite heuristics when IPA profile is
present
(estimate_local_effects): Update.
(value_topo_info::propagate_effects): Update.
(compare_edge_profile_counts): Remove.
(ipcp_propagate_stage): Do not collect base_count.
(get_info_about_necessary_edges): Record whether function is called
without profile.
(decide_about_value): Update.
(ipa_cp_cc_finalize): Do not initialie base_count.
* profile-count.cc (profile_count::operator*): New.
(profile_count::operator*=): New.
* profile-count.h (profile_count::operator*): Declare
(profile_count::operator*=): Declare.
* params.opt: Remove ipa-cp-profile-count-base.
* doc/invoke.texi: Likewise.
Jan Hubicka [Wed, 23 Apr 2025 15:04:32 +0000 (17:04 +0200)]
Cost truth_value exprs in i386 vectorizer costs.
this patch implements costing of truth_value exprs. I.e.
a = b < c;
Those seems to be now the most common operations that goes to the addss path
except for in->fp and fp->int conversions.
For integer we use setcc, for FP there is CMccSS and variants which sets the
destination register a s a mast (i.e. -1 on true and 0 on false). Technically
these needs res&1 to get into 1 on true, 0 on false, but looking on examples
where this is used, it is common that the resulting code is optimized avoiding
need for this (except for cases wehre result is directly saved to memory).
For this reason I am accounting only one sse_op (CMccSS) itself.
Christophe Lyon [Fri, 14 Mar 2025 15:04:29 +0000 (15:04 +0000)]
testsuite: aarch64: arm: Enable vld1x?.c and vst1x?.c on arm [PR71233]
r14-7202-gc8ec3e1327cb1e added vld1xN and vst1xN intrinsics and some
tests on arm, but didn't enable some existing tests.
Since these tests are shared with aarch64, this patch removes the
'dg-skip-if "unimplemented" { arm*-*-* }' directives and relies on the
advsimd-intrinsics.exp driver to define the appropriate flags and
dg-do-what action. (A previous patch removed 'dg-do run', and this
patch removes 'dg-options "-O3"' which would override the options
computed by the test driver)
float16 intrinsics require the neon-fp16 FPU, which is possibly
enabled by advsimd-intrinsics.exp, so we include them unconditionally
on aarch64 or if fp16 is enabled on arm.
poly64 intrinsics would require crypto-neon-fp-armv8: the patch
enables the corresponding tests on aarch64 only, since for arm they
are already covered by other tests in gcc.target/arm/simd/. For some
reason, poly64 tests where missing from x2 and x3 tests, so the patch
adds them as needed.
Tested on aarch64-linux-gnu (no change), arm-linux-gnueabihf (the
additional tests are executed) and various flavors of arm-none-eabi
(the additional tests are compiled-only on M-profile, executed on
A-profile).
libstdc++: fix possible undefined atomic lock-free type aliases in module std
When building for 'i386-*' targets, all basic types are 'sometimes lock-free'
and thus std::atomic_signed_lock_free and std::atomic_unsigned_lock_free are
not declared. In the header <atomic>, they are placed in preprocessor
condition __cpp_lib_atomic_lock_free_type_aliases. In module std, they should
be the same.
libstdc++-v3/ChangeLog:
* src/c++23/std.cc.in (atomic_signed_lock_free): Guard with
preprocessor check for __cpp_lib_atomic_lock_free_type_aliases.
(atomic_unsigned_lock_free): Likewise.
liuhongt [Mon, 31 Mar 2025 03:15:41 +0000 (20:15 -0700)]
Accept allones or 0 operand for vcond_mask op1.
Since ix86_expand_sse_movcc will simplify them into a simple vmov, vpand
or vpandn.
gcc/ChangeLog:
* config/i386/predicates.md (vector_or_0_or_1s_operand): New predicate.
(nonimm_or_0_or_1s_operand): Ditto.
* config/i386/sse.md (vcond_mask_<mode><sseintvecmodelower>):
Extend the predicate of operands1 to accept 0 or allones
operands.
(vcond_mask_<mode><sseintvecmodelower>): Ditto.
(vcond_mask_v1tiv1ti): Ditto.
(vcond_mask_<mode><sseintvecmodelower>): Ditto.
* config/i386/i386.md (mov<mode>cc): Ditto for operands[2] and
operands[3].
* config/i386/i386-expand.cc (ix86_expand_sse_fp_minmax):
Force immediate_operand to register.
gcc/testsuite/ChangeLog:
* gcc.target/i386/blendv-to-maxmin.c: New test.
* gcc.target/i386/blendv-to-pand.c: New test.
Jan Hubicka [Tue, 22 Apr 2025 21:47:14 +0000 (23:47 +0200)]
Fix vectorizer costs of COND_EXPR, MIN_EXPR, MAX_EXPR, ABS_EXPR, ABSU_EXPR
this patch adds special cases for vectorizer costs in COND_EXPR, MIN_EXPR,
MAX_EXPR, ABS_EXPR and ABSU_EXPR. We previously costed ABS_EXPR and ABSU_EXPR
but it was only correct for FP variant (wehre it corresponds to andss clearing
sign bit). Integer abs/absu is open coded as conditinal move for SSE2 and
SSE3 instroduced an instruction.
MIN_EXPR/MAX_EXPR compiles to minss/maxss for FP and accroding to Agner Fog
tables they costs same as sse_op on all targets. Integer translated to single
instruction since SSE3.
COND_EXPR translated to open-coded conditional move for SSE2, SSE4.1 simplified
the sequence and AVX512 introduced masked registers.
gcc/ChangeLog:
* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): Add special cases
for COND_EXPR; make MIN_EXPR, MAX_EXPR, ABS_EXPR and ABSU_EXPR more realistic.
Jakub Jelinek [Tue, 22 Apr 2025 19:27:28 +0000 (21:27 +0200)]
rs6000: Ignore OPTION_MASK_SAVE_TOC_INDIRECT differences in inlining decisions [PR119327]
The following testcase FAILs because the always_inline function can't
be inlined.
The rs6000 backend has similarly to other targets a hook which rejects
inlining which would bring in new ISAs which aren't there in the caller.
And this hook rejects this because of OPTION_MASK_SAVE_TOC_INDIRECT
differences.
This flag is set if explicitly requested or by default depending on
whether the current function looks hot (or at least not cold):
if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0
&& flag_shrink_wrap_separate
&& optimize_function_for_speed_p (cfun))
rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;
The target nodes that are being compared here are actually the default
target node (which was created when cfun was NULL) vs. one that was
created for the always_inline function when it wasn't NULL, so one
doesn't have it, the other does.
In any case, this flag feels like a tuning decision rather than hard
ISA requirement and I see no problems why we couldn't inline
even explicit -msave-toc-indirect function into -mno-save-toc-indirect
or vice versa.
We already ignore OPTION_MASK_P{8,10}_FUSION which are also more
like tuning flags.
2025-04-22 Jakub Jelinek <jakub@redhat.com>
PR target/119327
* config/rs6000/rs6000.cc (rs6000_can_inline_p): Ignore also
OPTION_MASK_SAVE_TOC_INDIRECT differences.
This non-standard optimization breaks real-world code that expects the
result of std::projected to always (be a class type and) have a value_type
member, which isn't true for e.g. I=int*, so revert it for now.
Spencer Abson [Thu, 20 Mar 2025 12:18:57 +0000 (12:18 +0000)]
Induction vectorizer: prevent ICE for scalable types
We currently check that the target suppports PLUS_EXPR and MINUS_EXPR
with step_vectype (a fix for pr103523). However, vectorizable_induction
can emit a vectorized MULT_EXPR when calculating the step of each IV for
SLP, and both MULT_EXPR/FLOAT_EXPR when calculating VEC_INIT for float
inductions.
gcc/ChangeLog:
* tree-vect-loop.cc (vectorizable_induction): Add target support
checks for vectorized MULT_EXPR and FLOAT_EXPR where necessary for
scalable types.
Prefer target_supports_op_p over directly_supports_p for these tree
codes.
(vectorizable_nonlinear_induction): Fix a doc comment while I'm
here.
Spencer Abson [Thu, 23 Jan 2025 19:48:49 +0000 (19:48 +0000)]
AArch64: Define the spaceship optab [PR117013]
This expansion ensures that exactly one comparison is emitted for
spacesip-like sequences on floating-point operands, including when
the result of such sequences are compared against members of
std::<some_ordering>::<some_value>.
For both integer and floating-point types, we optimize for the case
in which the result of a spaceship-like operation is written to a GPR.
The PR highlights this issue for floating-point operands, but we also
make an improvement for integers, preferring:
cmp w0, w1
cset w1, gt
csinv w0, w1, wzr, ge
over:
cmp w0, w1
mov w0, 1
csinv w0, w0, wzr, ge
csel w0, w0, wzr, ne
to compute:
auto test(int a, int b) { return a <=> b;}
gcc/ChangeLog:
PR target/117013
* config/aarch64/aarch64-protos.h (aarch64_expand_fp_spaceship):
Declare optab expander function for floating-point types.
* config/aarch64/aarch64.cc (aarch64_expand_fp_spaceship):
Define optab expansion for floating-point types (new function).
* config/aarch64/aarch64.md (spaceship<mode>4):
Add define_expands for spaceship<mode>4 on integer and
floating-point types.
gcc/testsuite/ChangeLog:
PR target/117013
* g++.target/aarch64/spaceship_1.C: New test.
* g++.target/aarch64/spaceship_2.C: New test.
* g++.target/aarch64/spaceship_3.C: New test.
aarch64: Update FP8 dependencies for -mcpu=olympus
We had not noticed that after g:299a8e2dc667e795991bc439d2cad5ea5bd379e2 the
FP8FMA and FP8DOT4 features aren't implied by FP8FMA. The intent is for
-mcpu=olympus to support all of them.
Fix the definition to include the relevant sub-features explicitly.
cobol: Restrict COBOL to supported Linux arches [PR119217]
The COBOL frontend is currently built on all x86_64 and aarch64 hosts
although the code contains some Linux/glibc specifics that break the build
e.g. on Solaris/amd64.
* match.cc (match_exit_cycle): Allow to exit team block.
(gfc_match_end_team): Create end_team node also without
parameter list.
* trans-intrinsic.cc (conv_stat_and_team): Team and team_number
only need to be a single pointer.
* trans-stmt.cc (trans_associate_var): Create a mapping coarray
token for coarray associations or it is not addressed correctly.
* trans.h (enum gfc_coarray_regtype): Add mapping mode to
coarray register.
libgfortran/ChangeLog:
* caf/libcaf.h: Add mapping mode to coarray's register.
* caf/single.c (_gfortran_caf_register): Create a token sharing
another token's memory.
(check_team): Check team parameters to coindexed expressions are
valid.
gcc/testsuite/ChangeLog:
* gfortran.dg/coarray/coindexed_3.f08: Add minimal test for
get_team().
* gfortran.dg/team_change_2.f90: Add test for change team with
label and exiting out of it.
* gfortran.dg/team_end_2.f90: Check parsing to labeled team
blocks is correct now.
* gfortran.dg/team_end_3.f90: Check that end_team call is
generated for labeled end_teams, too.
* gfortran.dg/coarray/coindexed_5.f90: New test.
Fortran: Add teams support in image_index and num_images for F2018
This more or less completes the set of functions that are affected by
teams.
gcc/fortran/ChangeLog:
* check.cc (gfc_check_image_index): Check for team or
team_number correctnes.
(gfc_check_num_images): Same.
* gfortran.texi: Update documentation on num_images' API
function.
* intrinsic.cc (add_functions): Update signature of image_index
and num_images. Both can take either a team handle or number.
* intrinsic.h (gfc_check_num_images): Update signature to take
either team or team_number.
(gfc_check_image_index): Can take coarray, subscripts and team
or team number now.
(gfc_simplify_image_index): Same.
(gfc_simplify_num_images): Same.
(gfc_resolve_image_index): Same.
* intrinsic.texi: Update documentation of num_images() Fortran
function.
* iresolve.cc (gfc_resolve_image_index): Update signature.
* simplify.cc (gfc_simplify_num_images): Update signature and
remove undocumented failed argument.
(gfc_simplify_image_index): Add team or team number argument.
* trans-intrinsic.cc (conv_stat_and_team): Because being
optional teams need to be a pointer to the opaque pointer.
(conv_caf_sendget): Correct call; was two arguments short.
(trans_image_index): Support team or team_number.
(trans_num_images): Same.
(conv_intrinsic_cobound): Adapt to changed signature of
num_images in call.
* trans-stmt.cc (gfc_trans_sync): Same.
This_image() no longer has a distance formal argument, but a team one.
The source of the distance argument could not be identified, i.e.
whether it came from a TS or standard draft. To implement only the
standard it is removed. Besides being defined, it was not used anyway.
PR fortran/87326
gcc/fortran/ChangeLog:
* check.cc (gfc_check_this_image): Check the three different
parameter lists possible for this_image and sort them correctly.
* gfortran.texi: Update documentation on this_image's API.
* intrinsic.cc (add_functions): Update this_image's signature.
(check_specific): Add specific check for this_image.
* intrinsic.h (gfc_check_this_image): Change to flexible
argument list.
* intrinsic.texi: Update documentation on this_image().
* iresolve.cc (gfc_resolve_this_image): Resolve the different
arguments.
* simplify.cc (gfc_simplify_this_image): Simplify the simplify
routine.
* trans-decl.cc (gfc_build_builtin_function_decls): Update
signature of this_image.
* trans-expr.cc (gfc_caf_get_image_index): Use correct signature
of this_image.
* trans-intrinsic.cc (trans_this_image): Adapt to correct
signature.
libgfortran/ChangeLog:
* caf/libcaf.h (_gfortran_caf_this_image): Correct prototype.
* caf/single.c (struct caf_single_team): Add new_index of image.
(_gfortran_caf_this_image): Return the image index in the given team.
(_gfortran_caf_form_team): Set new_index in team structure.
gcc/testsuite/ChangeLog:
* gfortran.dg/coarray_10.f90: Update error messages.
* gfortran.dg/coarray_lib_this_image_1.f90: Same.
* gfortran.dg/coarray_lib_this_image_2.f90: Same.
* gfortran.dg/coarray_this_image_1.f90: Add more tests and
remove incorrect ones.
* gfortran.dg/coarray_this_image_2.f90: Test more features.
* gfortran.dg/coarray_this_image_3.f90: New test.
* check.cc (team_type_check): Check a type for being team_type
from the iso_fortran_env module.
(gfc_check_image_status): Use team_type check.
(gfc_check_get_team): Check for level argument.
(gfc_check_team_number): Use team_type check.
* expr.cc (gfc_check_assign): Add treatment for returning
team_type in caf-single mode.
* gfortran.texi: Add/Update documentation for get_team and
team_number API functions.
* intrinsic.cc (add_functions): Update get_team signature.
* intrinsic.h (gfc_resolve_get_team): Add prototype.
* intrinsic.texi: Add/Update documentation for get_team and
team_number Fortran functions.
* iresolve.cc (gfc_resolve_get_team): Resolve return type to be
of type team_type.
* iso-fortran-env.def: Update STAT_LOCK constants. They have
nothing to do with files. Add level constants for get_team.
* libgfortran.h: Add level and unlock_stat constants.
* simplify.cc (gfc_simplify_get_team): Simply to correct return
type team_type.
* trans-decl.cc (gfc_build_builtin_function_decls): Update
get_team and image_status API prototypes to correct signatures.
* trans-intrinsic.cc (conv_intrinsic_image_status): Translate
second parameter correctly.
(conv_intrinsic_team_number): Translate optional single team
argument correctly.
(gfc_conv_intrinsic_function): Add translation of get_team.
libgfortran/ChangeLog:
* caf/libcaf.h: Add constants for get_team's level argument and
update stat values for failed images.
(_gfortran_caf_team_number): Add prototype.
(_gfortran_caf_get_team): Same.
* caf/single.c (_gfortran_caf_team_number): Get the given team's
team number.
(_gfortran_caf_get_team): Get the current team or the team given
by level when the argument is present.
gcc/testsuite/ChangeLog:
* gfortran.dg/coarray/image_status_1.f08: Correct check for
team_type.
* gfortran.dg/pr102458.f90: Adapt to multiple errors.
* gfortran.dg/coarray/get_team_1.f90: New test.
* gfortran.dg/team_get_1.f90: New test.
* gfortran.dg/team_number_1.f90: Correct Fortran syntax.
* coarray.cc (split_expr_at_caf_ref): Treat polymorphic types
correctly. Ensure resolve of expression after coindex.
(create_allocated_callback): Fix parameter of allocated function
for coarrays.
(coindexed_expr_callback): Improve detection of coarrays in
allocated function.
* decl.cc (gfc_match_end): Add team block matching.
* dump-parse-tree.cc (show_code_node): Dump change team block as
such.
* frontend-passes.cc (gfc_code_walker): Recognice team block.
* gfortran.texi: Add documentation for team api functions.
* intrinsic.texi: Add documentation about team_type in
iso_fortran_env module.
* iso-fortran-env.def (team_type): Use helper to get pointer
kind.
* match.cc (gfc_match_associate): Factor out matching of
association list, because it is used in change team as well.
(check_coarray_assoc): Ensure, that the association is to a
coarray.
(match_association_list): Match a list of association either in
associate or in change team.
(gfc_match_form_team): Match form team correctly include
new_index.
(gfc_match_change_team): Match change team with association
list.
(gfc_match_end_team): Match end team including stat and errmsg.
(gfc_match_return): Prevent return from team block.
* parse.cc (decode_statement): Sort team block.
(next_statement): Same.
(check_statement_label): Same.
(accept_statement): Same.
(verify_st_order): Same.
(parse_associate): Renamed to move_associates_to_block...
(move_associates_to_block): ... to enable reuse for change team.
(parse_change_team): Parse it as block.
(parse_executable): Same.
* parse.h (enum gfc_compile_state): Add team block as compiler
state.
* resolve.cc (resolve_scalar_argument): New function to resolve
an argument to a statement as a scalar.
(resolve_form_team): Resolve its members.
(resolve_change_team): Same.
(resolve_branch): Prevent branch from jumping out of team block.
(check_team): Removed.
* trans-decl.cc (gfc_build_builtin_function_decls): Add stat and
errmsg to team API functions and update their arguments.
* trans-expr.cc (gfc_trans_subcomponent_assign): Also null the
token when moving memory or an allocated() will not detect a
free.
* trans-intrinsic.cc (gfc_conv_intrinsic_caf_is_present_remote):
Adapt to signature change no longer a pointer-pointer.
* trans-stmt.cc (gfc_trans_form_team): Translate a form team
including new_index.
(gfc_trans_change_team): Translate a change team as a block.
libgfortran/ChangeLog:
* caf/libcaf.h: Remove commented block.
(_gfortran_caf_form_team): Allow for all relevant arguments.
(_gfortran_caf_change_team): Same.
(_gfortran_caf_end_team): Same.
(_gfortran_caf_sync_team): Same.
* caf/single.c (struct caf_single_team): Team handling
structures.
(_gfortran_caf_init): Initialize initial team.
(free_team_list): Free all teams and the memory they hold.
(_gfortran_caf_finalize): Free initial and sibling teams.
(_gfortran_caf_register): Add memory registered to current team.
(_gfortran_caf_deregister): Unregister memory from current team.
(_gfortran_caf_is_present_on_remote): Check token's memptr for
llocation. May have been deallocated by an end team.
(_gfortran_caf_form_team): Push a new team stub to the list.
(_gfortran_caf_change_team): Push a formed team on top of the
ctive teams stack.
(_gfortran_caf_end_team): End the active team, free all memory
allocated during its livespan.
(_gfortran_caf_sync_team): Take stat and errmsg into account.
gcc/testsuite/ChangeLog:
* gfortran.dg/team_change_2.f90: New test.
* gfortran.dg/team_change_3.f90: New test.
* gfortran.dg/team_end_2.f90: New test.
* gfortran.dg/team_end_3.f90: New test.
* gfortran.dg/team_form_2.f90: New test.
* gfortran.dg/team_form_3.f90: New test.
* gfortran.dg/team_sync_2.f90: New test.
Fortran: Unify handling of STAT= and ERRMSG= optional arguments [PR87939]
In preparing F2018 Teams handling improvements, unify handling of STAT=
and ERRMSG= optional arguments. Handling of stat and errmsg in most
teams statements is corrected in the next patch.
Implement stat and errmsg for move_alloc () to comply with F2018.
PR fortran/87939
gcc/fortran/ChangeLog:
* check.cc (gfc_check_move_alloc): Add stat and errmsg to
move_alloc.
* dump-parse-tree.cc (show_sync_stat): New helper function.
(show_code_node): Use show_sync_stat to print stat and errmsg.
* gfortran.h (struct sync_stat): New struct to unify stat and
errmsg handling.
* intrinsic.cc (add_subroutines): Correct signature of
move_alloc.
* intrinsic.h (gfc_check_move_alloc): Correct signature of
check_move_alloc.
* match.cc (match_named_arg): Match an optional argument to a
statement.
(match_stat_errmsg): Match a stat= or errmsg= named argument.
(gfc_match_critical): Use match_stat_errmsg to match the named
arguments.
(gfc_match_sync_team): Same.
* resolve.cc (resolve_team_argument): Resolve an expr to have
type TEAM_TYPE from iso_fortran_env.
(resolve_scalar_variable_as_arg): Resolve an argument as a
scalar type.
(resolve_sync_stat): Resolve stat and errmsg expressions.
(resolve_sync_team): Resolve a sync team statement using
sync_stat helper.
(resolve_end_team): Same.
(resolve_critical): Same.
* trans-decl.cc (gfc_build_builtin_function_decls): Correct
sync_team signature.
* trans-intrinsic.cc (conv_intrinsic_move_alloc): Store stat
an errmsg optional arguments in helper struct and use helper
to translate.
* trans-stmt.cc (trans_exit): Implement DRY pattern for
generating an _exit().
(gfc_trans_sync_stat): Translate stat and errmsg contents.
(gfc_trans_end_team): Use helper to translate stat and errmsg.
(gfc_trans_sync_team): Same.
(gfc_trans_critical): Same.
* trans-stmt.h (gfc_trans_sync_stat): New function.
* trans.cc (gfc_deallocate_with_status): Parameterize check at
runtime to allow unallocated (co-)array when freeing a
structure.
(gfc_deallocate_scalar_with_status): Same and also add errmsg.
* trans.h (gfc_deallocate_with_status): Signature changes.
(gfc_deallocate_scalar_with_status): Same.
libgfortran/ChangeLog:
* caf/single.c (_gfortran_caf_lock): Correct stat value, if
lock is already locked by current image.
(_gfortran_caf_unlock): Correct stat value, if lock is not
locked.
gcc/testsuite/ChangeLog:
* gfortran.dg/coarray_critical_2.f90: New test.
* gfortran.dg/coarray_critical_3.f90: New test.
* gfortran.dg/team_sync_1.f90: New test.
* gfortran.dg/move_alloc_11.f90: New test.
Support -mcpu=xt-c908, xt-c908v, xt-c910, xt-c910v2, xt-c920, xt-c920v2
for Xuantie series cpu.
ref:https://www.xrvm.cn/community/download?id=4224248662731067392
without fmv_cost, vector_unaligned_access, use_divmod_expansion, overlap_op_by_pieces, fill the tune info with generic ooo for further modification.
* gcc.target/riscv/mcpu-xt-c908.c: test -mcpu=xt-c908.
* gcc.target/riscv/mcpu-xt-c910.c: test -mcpu=xt-c910.
* gcc.target/riscv/mcpu-xt-c920v2.c: test -mcpu=xt-c920v2.
* gcc.target/riscv/mcpu-xt-c908v.c: test -mcpu=xt-c908v.
* gcc.target/riscv/mcpu-xt-c910v2.c: test -mcpu=xt-c910v2.
* gcc.target/riscv/mcpu-xt-c920.c: test -mcpu=xt-c920.
After commit r15-8947-g8ed2d5d219e999, which added new tests using
gcov, the CI noticed failures because it was calling 'gcov' instead of
$target-gcov.
This is because the CI scripts override GXX_UNDER_TEST, but still run
the testsuite in-tree, and gcc-transform-out-of-tree only depends on
TESTING_IN_BUILD_TREE but the definition of GCOV uses GCC_UNDER_TEST,
GXX_UNDER_TEST or GDC_UNDER_TEST (assuming their default values when
TESTING_IN_BUILD_TREE).
To handle such a case, this patch adds support for a new variable,
GCOV_UNDER_TEST, which overrides the current behavior if defined, and
has an effect similar to overriding GCC_UNDER_TEST etc...
Unfortunately, the change needs to be duplicated in several places,
which use either GCC_UNDER_TEST, GXX_UNDER_TEST or GDC_UNDER_TEST.
Tested g++.dg/gcov/gcov.exp and now g++.dg/gcov/gcov-22.C passes
(calling <installdir>/bin/$target-gcov as instructed by the CI
scripts). No change observed on gcc.misc-tests/gcov.exp, and I could
not test gdc.dg/gcov.exp and gnat.dg/gcov/gcov.exp.
Document locality partitioning params in invoke.texi
Filip Kastl pointed out that contrib/check-params-in-docs.py complains
about params not documented in invoke.texi, so this patch adds the short
explanation from params.opt for these to the invoke.texi section.
Thanks for the reminder.
testsuite: Use sigsetjmp in gcc.misc-tests/gcov-31.c
The gcc.misc-tests/gcov-31.c test FAILs on Solaris and Darwin:
FAIL: gcc.misc-tests/gcov-31.c (test for excess errors)
Excess errors:
/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.misc-tests/gcov-31.c:23:5:
error: implicit declaration of function '__sigsetjmp'; did you mean
'sigsetjmp'? [-Wimplicit-function-declaration]
__sigsetjmp is a Linux/glibc implementation detail. Other tests just
use sigsetjmp directly, so this patch follows suit.
Tested on i386-pc-solaris2.11, sparc-sun-solaris2.11,
x86_64-pc-linux-gnu, and x86_64-apple-darwin24.4.0.
c++/modules: Find non-exported reachable decls when instantiating friend classes [PR119863]
In r15-9029-geb26b667518c95, we started checking for conflicting
declarations with any reachable decl attached to the same originating
module. This exposed the issue in the PR, where we would always create
a new type even if a matching type existed in the original module.
This patch reworks lookup_imported_hidden_friend to handle this case
better, by first checking for any reachable decl in the attached module
before looking in the mergeable decl slots.
PR c++/119863
gcc/cp/ChangeLog:
* name-lookup.cc (get_mergeable_namespace_binding): Remove
no-longer-used function.
(lookup_imported_hidden_friend): Also look for hidden imported
decls in an attached decl's module.
gcc/testsuite/ChangeLog:
* g++.dg/modules/tpl-friend-18_a.C: New test.
* g++.dg/modules/tpl-friend-18_b.C: New test.
* g++.dg/modules/tpl-friend-18_c.C: New test.
Andrew Pinski [Tue, 22 Apr 2025 03:52:38 +0000 (20:52 -0700)]
Skip g++.dg/eh/pr119507.C on arm eabi
arm eabi emits the exception table using the handlerdata directive
and does not use a comdat section for comdat functions. So this
testcase should be skipped for arm eabi.
Pushed as obvious after a quick test.
gcc/testsuite/ChangeLog:
* g++.dg/eh/pr119507.C: Skip for arm eabi.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
[riscv] vec_dup immediate constants in pred_broadcast expand [PR118182]
pr118182-2.c fails on gcc-14 because it lacks the late_combine passes,
particularly the one that runs after register allocation.
Even in the trunk, the predicate broadcast for the add reduction is
expanded and register-allocated as _zvfh, taking up an unneeded scalar
register to hold the constant to be vec_duplicated.
It is the late combine pass after register allocation that substitutes
this unneeded scalar register into the vec_duplicate, resolving to the
_zero or _imm insns.
It's easy enough and more efficient to expand pred_broadcast to the
insns that take the already-duplicated vector constant, when the
operands satisfy the predicates of the _zero or _imm insns.
for gcc/ChangeLog
PR target/118182
* config/riscv/vector.md (@pred_broadcast<mode>): Expand to
_zero and _imm variants without vec_duplicate.
Jason Merrill [Tue, 11 Mar 2025 15:16:26 +0000 (11:16 -0400)]
c++: reorder constexpr checks
My proposed change to stop setting TREE_STATIC on constexpr heap
pseudo-variables led to a diagnostic regression because we would get the
generic "not constant" diagnostic before the "allocated storage" diagnostic.
So let's move the generic verify_constant down a bit.
Jason Merrill [Sun, 20 Apr 2025 16:31:35 +0000 (12:31 -0400)]
c++: new size folding [PR118775]
r15-7893 added a workaround for a case where we weren't registering (long)&a
as invalid in a constant-expression, because build_new_1 had folded away the
CONVERT_EXPR that we rely on to diagnose that problem. In general we want
to defer most folding until cp_fold_function, so let's fold less here. We
mainly want to expose constant size so we can treat it differently, and we
already did any constexpr evaluation when initializing cst_outer_nelts, so
fold_to_constant seems like the right choice.
Jason Merrill [Wed, 5 Mar 2025 21:42:32 +0000 (16:42 -0500)]
c++: static constexpr strictness [PR99456]
r11-7740 limited constexpr rejection of conversion from pointer to integer
to manifestly constant-evaluated contexts; it should instead check whether
we're in strict mode.
The comment for that commit noted that making this change regressed other
tests, which turned out to be because maybe_constant_init_1 was not being
properly strict for variables declared constexpr/constinit.
PR c++/99456
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_constant_expression): Check strict
instead of manifestly_const_eval.
(maybe_constant_init_1): Be strict for static constexpr vars.
Jan Hubicka [Mon, 21 Apr 2025 18:16:50 +0000 (20:16 +0200)]
Fix cost of vectorized double->float conversion
In previous patch I miscomputed costs of cvtpd2pf instruction
which mistakely gets accounted as 2 (VEC_PACK_TRUNC_EXPR).
Vectorizer can produce both, but when producing VEC_PACK_TRUNC_EXPR
it use promote_demote patch. This patch thus simplifies
handling of NOP_EXPR since in that case we should always be producing
only one instruction.
PR target/119879
* config/i386/i386.cc (fp_conversion_stmt_cost): Inline to ...
(ix86_vector_costs::add_stmt_cost): ... here; fix handling of NOP_EXPR.
Andrew Bennett [Mon, 21 Apr 2025 15:56:47 +0000 (09:56 -0600)]
[PATCH 21/61] Testsuite: Modify the gcc.dg/memcpy-4.c test
From: Andrew Bennett <andrew.bennett@imgtec.com>
Firstly, remove the MIPS specific bit of the test.
Secondly, create a MIPS specific version in the gcc.target/mips.
This will only execute for a MIPS ISA less than R6.
Andrew Pinski [Sat, 29 Mar 2025 00:25:56 +0000 (17:25 -0700)]
except: Don't use the cached value of the gcc_except_table section for comdat functions [PR119507]
This has been broken since GCC started to put the comdat functions' gcc_except_table into their
own section; r0-118218-g3e6011cfebedfb. What would happen is after a non-comdat function is processed,
the cached value would always be used even for comdat function. Instead we should create a new
section for comdat functions.
OK? Bootstrapped and tested on x86_64-linux-gnu.
PR middle-end/119507
gcc/ChangeLog:
* except.cc (switch_to_exception_section): Don't use the cached section if
the current function is in comdat.
gcc/testsuite/ChangeLog:
* g++.dg/eh/pr119507.C: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Thu, 9 Jan 2025 20:53:27 +0000 (12:53 -0800)]
Add assert to array_slice::begin/end
So while debugging PR 118320, I found it was useful to have
an assert inside array_slice::begin/end that the array slice isvalid
rather than getting an segfault. This adds an assert that is only
enabled for checking.
OK? Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* vec.h (array_slice::begin): Assert that the
slice is valid.
(array_slice::end): Likewise.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
/vol/gcc/src/hg/master/cobol/libgcobol/intrinsic.cc: In function ‘void
__gg__formatted_current_date(cblc_field_t*, cblc_field_t*, std::size_t,
std::size_t)’:
/vol/gcc/src/hg/master/cobol/libgcobol/intrinsic.cc:1480:6: error: ‘struct
std::tm’ has no member named ‘tm_zone’; did you mean ‘tm_mon’?
1480 | tm.tm_zone = "GMT";
| ^~~~~~~
| tm_mon
struct tm.tm_zone is new in POSIX.1-2024, thus cannot be assumed to be
present universally.
This patch checks for its presence and guards the use accordingly.
Bootstrapped without regressions on amd64-pc-solaris2.11,
sparcv9-sun-solaris2.11, and x86_64-pc-solaris2.11.
Thomas Schwinge [Sat, 19 Apr 2025 13:49:34 +0000 (15:49 +0200)]
Disable parallel testing for 'rust/compile/nr2/compile.exp' [PR119508]
..., using the standard idiom. This '*.exp' file doesn't adhere to the
parallel testing protocol as defined in 'gcc/testsuite/lib/gcc-defs.exp'.
This also restores proper behavior for '*.exp' files executing after (!) this
one, which erroneously caused hundreds or even thousands of individual test
cases get duplicated vs. skipped, randomly, depending on the '-jN' level.
Kaiweng's patch to stop freeing riscv_arch_string was correct, but incomplete
as there's another path that was freeing that node, which is just plain wrong
for a node allocated by the GC system.
This patch removes that call to free() which fixes the test. I've spun it in
my tester and will obviously wait for the pre-commit system to render a verdict
before moving forward.
PR target/119865
gcc/
* config/riscv/riscv.cc (parse_features_for_version): Do not
explicitly free the architecture string.
Jeff Law [Sat, 19 Apr 2025 18:30:42 +0000 (12:30 -0600)]
[RISC-V][PR target/118410] Improve code generation for some logical ops
I'm posting this on behalf of Shreya Munnangi who is working as an intern with
me. I've got her digging into prerequisites for removing mvconst_internal and
would prefer she focus on that rather than our patch process at this time.
--
We can use the orn, xnor, andn instructions on RISC-V to improve the code
generated logical operations when one operand is a constant C where
synthesizing ~C is cheaper than synthesizing C.
This is going to be an N -> N - 1 splitter rather than a define_insn_and_split.
A define_insn_and_split can obviously work, but has multiple undesirable
effects in general.
As a result of implementing as a simple define_split we're not supporting AND
at this time. We need to clean up the mvconst_internal situation first after
which supporting AND is trivial.
This has been tested in Ventana's CI system as well as my tester. Obviously
we'll wait for the pre-commit tester to run before moving forward.
PR target/118410
gcc/
* config/riscv/bitmanip.md (logical with constant argument): New
splitter for cases where synthesizing ~C is cheaper than synthesizing
the original constant C.
gcc/testsuite/
* gcc.target/riscv/pr118410-1.c: New test.
* gcc.target/riscv/pr118410-2.c: Likewise.
Jan Hubicka [Sat, 19 Apr 2025 16:51:27 +0000 (18:51 +0200)]
Add tables for SSE fp conversion costs
as disucssed, I will proceed adding costs for common SSE operations which are
currently globbed into addss cost, so we do not need to set it incorrectly for
znver5. Looking through the stats, there are quite few missing cases, so I am
starting with those that I think are more common. I plan to do it in smaller
steps so individual changes gets benchmarked by LNT and also can be bisected
to.
This patch adds costs for various SSE and AVX FP->FP conversions (extensions and
truncations). Looking through Agner Fog's tables, these are bit assymetric so I
added cost for CVTSS2SD which is also used for CVTSD2SS, CVTPS2PD and CVTPD2PS,
cost for 256bit VCVTPS2PS (also used for oposite direction) and cost for 512bit
one.
I plan to add int->int conversions next and then int->fp & fp->int which are
more tricky since they may bundle inter-unit move.
I also noticed that size tables are wrong for all SSE instructions so I updated
them. With some love I think vectorization can work as size optimization, too,
but we need more work on that.
Those values I can find in Agner Fog tables are taken from there, other are guesses
(especially for yongfeng_cost and shijidadao_cost).
gcc/ChangeLog:
* config/i386/i386.cc (vec_fp_conversion_cost): New function.
(ix86_rtx_costs): Use it for SSE/AVX FP conversoins.
(ix86_builtin_vectorization_cost): Fix indentation;
and use vec_fp_conversion_cost in vec_promote_demote.
(fp_conversion_stmt_cost): New function.
(ix86_vector_costs::add_stmt_cost): Use it to cost NOP_EXPR
and vec_promote_demote.
* config/i386/i386.h (struct processor_costs):
* config/i386/x86-tune-costs.h (struct processor_costs):
Andrew Pinski [Sat, 19 Apr 2025 03:28:40 +0000 (20:28 -0700)]
Fix pr118947-1.c and pr78408-3.c on targets where 32 bytes memcpy uses a vector
The problem here is on targets where a 32byte memcpy will use an integral (vector) type
to do the copy and the code will be optimized a different way than expected. This changes
the testcase instead to use a size of 1025 to make sure there is no target that will use an
integral (vector) type for the memcpy and be optimized via the method that was just added.
Pushed as obvious after a test run.
gcc/testsuite/ChangeLog:
* gcc.dg/pr118947-1.c: Use 1025 as the size of the buf.
* gcc.dg/pr78408-3.c: Likewise.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Mon, 20 Jan 2025 23:24:39 +0000 (15:24 -0800)]
combine: Better split point for `(and (not X))` [PR111949]
In a similar way find_split_point handles `a+b*C`, this adds
the split point for `~a & b`. This allows for better instruction
selection when the target has this instruction (aarch64, arm and x86_64
are examples which have this).
Built and tested for aarch64-linux-gnu.
PR rtl-optimization/111949
gcc/ChangeLog:
* combine.cc (find_split_point): Add a split point
for `(and (not X) Y)` if not in the outer set already.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/bic-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Jason Merrill [Mon, 3 Feb 2025 15:35:33 +0000 (10:35 -0500)]
c++: minor EXPR_STMT cleanup
I think it was around PR118574 that I noticed a few cases where we were
unnecessarily wrapping a statement tree in a further EXPR_STMT. Let's avoid
that and also use finish_expr_stmt in a few places in the coroutines code
that were building EXPR_STMT directly.
gcc/cp/ChangeLog:
* coroutines.cc (coro_build_expr_stmt)
(coro_build_cvt_void_expr_stmt): Remove.
(build_actor_fn): Use finish_expr_stmt.
* semantics.cc (finish_expr_stmt): Avoid wrapping statement in
EXPR_STMT.
(finish_stmt_expr_expr): Add comment.
Alpha: Fix base block alignment calculation regression
In determination of base block alignment we only examine a COMPONENT_REF
tree node at hand without ever checking if its ultimate alignment has
been reduced by the combined offset going back to the outermost object.
Consequently cases have been observed where quadword accesses have been
produced for a memory location referring a nested struct member only
aligned to the longword boundary, causing emulation to trigger.
Address this issue by recursing into COMPONENT_REF tree nodes until the
outermost one has been reached, which is supposed to be a MEM_REF one,
accumulating the offset as we go, fixing a commit e0dae4da4c45 ("Alpha:
Also use tree information to get base block alignment") regression.
Bail out and refrain from using tree information for alignment if we end
up at something different or we are unable to calculate the offset at
any point.
gcc/
* config/alpha/alpha.cc
(alpha_get_mem_rtx_alignment_and_offset): Recurse into
COMPONENT_REF nodes.
gcc/testsuite/
* gcc.target/alpha/memcpy-nested-offset-long.c: New file.
* gcc.target/alpha/memcpy-nested-offset-quad.c: New file.
François Dumont [Tue, 8 Apr 2025 17:35:28 +0000 (19:35 +0200)]
libstdc++: Add _GLIBCXX_DEBUG checks on unordered container local_iterator
Complete tests on _GLIBCXX_DEBUG checks in include/debug/safe_local_iterator.h.
Fix several tests not testing the container corresponding to their location in the
testsuite directory.
libstdc++-v3/ChangeLog:
* testsuite/util/debug/unordered_checks.h (fill_container): New helper method.
(use_erased_local_iterator, invalid_local_iterator_pre_increment)
(invalid_local_iterator_post_increment, invalid_local_iterator_compare)
(invalid_local_iterator_range): Use latter.
(fill_and_get_local_iterator): New, use fill_container.
(use_invalid_local_iterator): Use latter.
(invalid_local_iterator_arrow_operator): New test function.
(invalid_local_iterator_copy_instantiation): New test function.
(invalid_local_iterator_move_instantiation): New test function.
(invalid_local_iterator_copy_assignment): New test function.
(invalid_local_iterator_move_assignment): New test function.
(invalid_local_iterator_const_conversion): New test function.
* testsuite/23_containers/unordered_map/debug/invalid_local_iterator_arrow_operator_neg.cc:
New test case.
* testsuite/23_containers/unordered_map/debug/invalid_local_iterator_const_conversion_neg.cc:
New test case.
* testsuite/23_containers/unordered_map/debug/invalid_local_iterator_copy_assignment_neg.cc:
New test case.
* testsuite/23_containers/unordered_map/debug/invalid_local_iterator_copy_construction_neg.cc:
New test case.
* testsuite/23_containers/unordered_map/debug/invalid_local_iterator_move_assignment_neg.cc:
New test case.
* testsuite/23_containers/unordered_map/debug/invalid_local_iterator_move_construction_neg.cc:
New test case.
* testsuite/23_containers/unordered_map/debug/max_load_factor_neg.cc: Test unordered_map.
* testsuite/23_containers/unordered_multimap/debug/begin2_neg.cc: Test unordered_multimap.
* testsuite/23_containers/unordered_multimap/debug/bucket_size_neg.cc: Likewise.
* testsuite/23_containers/unordered_multimap/debug/cbegin_neg.cc: Likewise.
* testsuite/23_containers/unordered_multimap/debug/cend_neg.cc: Likewise.
* testsuite/23_containers/unordered_multimap/debug/end1_neg.cc: Likewise.
* testsuite/23_containers/unordered_multimap/debug/end2_neg.cc: Likewise.
* testsuite/23_containers/unordered_multimap/debug/invalid_local_iterator_arrow_operator_neg.cc:
New test case.
* testsuite/23_containers/unordered_multimap/debug/invalid_local_iterator_const_conversion_neg.cc:
New test case.
* testsuite/23_containers/unordered_multimap/debug/invalid_local_iterator_copy_assignment_neg.cc:
New test case.
* testsuite/23_containers/unordered_multimap/debug/invalid_local_iterator_copy_construction_neg.cc:
New test case.
* testsuite/23_containers/unordered_multimap/debug/invalid_local_iterator_move_assignment_neg.cc:
New test case.
* testsuite/23_containers/unordered_multimap/debug/invalid_local_iterator_move_construction_neg.cc:
New test case.
* testsuite/23_containers/unordered_multimap/debug/max_load_factor_neg.cc:
Test unordered_multimap.
* testsuite/23_containers/unordered_multiset/debug/invalid_local_iterator_arrow_operator_neg.cc:
New test case.
* testsuite/23_containers/unordered_multiset/debug/invalid_local_iterator_const_conversion_neg.cc:
New test case.
* testsuite/23_containers/unordered_multiset/debug/invalid_local_iterator_copy_assignment_neg.cc:
New test case.
* testsuite/23_containers/unordered_multiset/debug/invalid_local_iterator_copy_construction_neg.cc:
New test case.
* testsuite/23_containers/unordered_multiset/debug/invalid_local_iterator_move_assignment_neg.cc:
New test case.
* testsuite/23_containers/unordered_multiset/debug/invalid_local_iterator_move_construction_neg.cc:
New test case.
* testsuite/23_containers/unordered_set/debug/invalid_local_iterator_arrow_operator_neg.cc:
New test case.
* testsuite/23_containers/unordered_set/debug/invalid_local_iterator_const_conversion_neg.cc:
New test case.
* testsuite/23_containers/unordered_set/debug/invalid_local_iterator_copy_assignment_neg.cc:
New test case.
* testsuite/23_containers/unordered_set/debug/invalid_local_iterator_copy_construction_neg.cc:
New test case.
* testsuite/23_containers/unordered_set/debug/invalid_local_iterator_move_assignment_neg.cc:
New test case.
* testsuite/23_containers/unordered_set/debug/invalid_local_iterator_move_construction_neg.cc:
New test case.
Jeff Law [Fri, 18 Apr 2025 18:19:30 +0000 (12:19 -0600)]
[RISC-V] Fix missed bext discovery
RISC-V has the ability to extract a single bit out of a register from a fixed
or variable position.
While looking at 502.gcc a little while ago I realize that we failed to use
bext inside bitmap_bit_p for its return value.
The core "problem" is that the RISC-V does not define SHIFT_COUNT_TRUNCATED
(for good reasons). As a result the target is largely responsible for handling
elimination of shift count/bit position masking.
There's a follow-up patch I've been working on with an intern to improve
detection of bext in more cases. This one stands independently though and is
probably the most important of the missed cases.
Will push to the trunk assuming pre-commit testing is green. It's already been
through my tester as well as Ventana's internal testing.
gcc
* config/riscv/bitmanip.md (*bext<mode>_mask_pos): New pattern
for extracting a single bit at masked bit position.
Andrew Pinski [Tue, 8 Apr 2025 00:57:07 +0000 (17:57 -0700)]
DSE: Trim stores of 0 like triming stores of {} [PR87901]
This is the second part of the PR which comes from transformation
of memset into either stores of 0 (via an integral type) or stores
of {}. We already handle stores of `{}`, this just extends that to
handle of the constant 0 and treat it similarly.
PR tree-optimization/87901
gcc/ChangeLog:
* tree-ssa-dse.cc (maybe_trim_constructor_store): Add was_integer_cst argument.
Check for was_integer_cst instead of `{}` when was_integer_cst is true.
(maybe_trim_partially_dead_store): Handle INTEGER_CST stores of 0 as stores of `{}`.
Udpate call to maybe_trim_constructor_store for CONSTRUCTOR.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/ssa-dse-53.c: New test.
* gcc.dg/tree-ssa/ssa-dse-54.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Tue, 8 Apr 2025 00:06:17 +0000 (17:06 -0700)]
DSE: Support triming of some more memset [PR87901]
DSE has support for trimming memset (and memset like) statements.
In this case we have `MEM <unsigned char[17]> [(char * {ref-all})&z] = {};` in
the IR and when we go to trim it, we call build_fold_addr_expr which leaves around
a cast from one pointer type to another. This is due to build_fold_addr_expr
being generic but in gimple you don't need these casts.
PR tree-optimization/87901
gcc/ChangeLog:
* tree-ssa-dse.cc (maybe_trim_constructor_store): Strip over useless type
conversions after taking the address of the MEM_REF.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/ssa-dse-52.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Fri, 14 Feb 2025 04:23:48 +0000 (20:23 -0800)]
gimple: Canonical order for invariants [PR118902]
So unlike constants, address invariants are currently put first if
used with a SSA NAME.
It would be better if address invariants are consistent with constants
and this patch changes that.
gcc.dg/tree-ssa/pr118902-1.c is an example where this canonicalization
can help. In it if `p` variable was a global variable, FRE (VN) would have figured
it out that `a` could never be equal to `&p` inside the loop. But without the
canonicalization we end up with `&p == a.0_1` which VN does try to handle for conditional
VN.
Bootstrapped and tested on x86_64.
PR tree-optimization/118902
gcc/ChangeLog:
* fold-const.cc (tree_swap_operands_p): Place invariants in the first operand
if not used with constants.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr118902-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>