Haochen Jiang [Wed, 16 Oct 2024 07:40:12 +0000 (15:40 +0800)]
testsuite: Add -march=x86-64-v3 to AVX10 testcases to slience warning for GCC built with AVX512 arch
Currently, when build GCC with config --with-arch=native on AVX512
machines, if we run AVX10.2 testcases, we will get vector size warnings.
It is expected but annoying. Simply add -march=x86-64-v3 to override
--with-arch=native to slience all the warnings.
LRA substitute all scratches with new pseudos, so we have:
(insn 484 483 485 72 (parallel [
(set (reg/v:SI 143 [ __q1 ])
(plus:SI (reg/v:SI 143 [ __q1 ])
(const_int -2 [0xfffffffffffffffe])))
(clobber (reg:QI 619))
]) "/mnt/d/avr-lra/udivmoddi.c":163:405 discrim 5 186 {addsi3}
(expr_list:REG_UNUSED (reg:QI 619)
(nil)))
Pseudo 619 is a special scratch register generated by LRA which is marked in `scratch_bitmap' and can be tested by call `ira_former_scratch_p(regno)'.
In dump file (udivmoddi.c.317r.reload) we have:
Creating newreg=619
Removing SCRATCH to p619 in insn #484 (nop 3)
rescanning insn with uid = 484.
After that LRA tries to spill (reg:QI 619)
It's a bug because (reg:QI 619) is an output scratch register which is already something like spill register.
Fragment from udivmoddi.c.317r.reload:
Choosing alt 2 in insn 484: (0) r (1) 0 (2) nYnn (3) &d {addsi3}
Creating newreg=728 from oldreg=619, assigning class LD_REGS to r728
IMHO: the bug is in lra-constraints.cc in function `get_reload_reg'
fragment of `get_reload_reg':
if (type == OP_OUT)
{
/* Output reload registers tend to start out with a conservative
choice of register class. Usually this is ALL_REGS, although
a target might narrow it (for performance reasons) through
targetm.preferred_reload_class. It's therefore quite common
for a reload instruction to require a more restrictive class
than the class that was originally assigned to the reload register.
In these situations, it's more efficient to refine the choice
of register class rather than create a second reload register.
This also helps to avoid cycling for registers that are only
used by reload instructions. */
if (REG_P (original)
&& (int) REGNO (original) >= new_regno_start
&& INSN_UID (curr_insn) >= new_insn_uid_start
__________________________________^^
&& in_class_p (original, rclass, &new_class, true))
{
unsigned int regno = REGNO (original);
if (lra_dump_file != NULL)
{
fprintf (lra_dump_file, " Reuse r%d for output ", regno);
dump_value_slim (lra_dump_file, original, 1);
}
This condition incorrectly limits register reuse to ONLY newly generated instructions.
i.e. LRA can reuse registers only from insns generated by himself.
IMHO:It's wrong.
Scratch registers generated by LRA also have to be reused.
The patch is very simple.
On x86_64, it bootstraps+regtests fine.
Exposing a variable in a module and referencing it in a submodule made
the compiler ICE, because the external variable was not sorted into the
correct module. In fact the module name was not set where the variable
got built.
gcc/fortran/ChangeLog:
PR fortran/80235
* trans-decl.cc (gfc_build_qualified_array): Make sure the array
is associated to the correct module and being marked as extern.
gcc/testsuite/ChangeLog:
* gfortran.dg/coarray/add_sources/submodule_1_sub.f90: New test.
* gfortran.dg/coarray/submodule_1.f90: New test.
Richard Biener [Wed, 16 Oct 2024 09:37:31 +0000 (11:37 +0200)]
Fix gcc.dg/vect/vect-early-break_39.c FAIL with forced SLP
The testcases shows single-element interleaving of size three
being exempted from permutation lowering via heuristics
(see also PR116973). But it wasn't supposed to apply to
non-power-of-two sizes so this amends the check to ensure
the sub-group is aligned even when the number of lanes is one.
* tree-vect-slp.cc (vect_lower_load_permutations): Avoid
exempting non-power-of-two group sizes from lowering.
Jakub Jelinek [Thu, 17 Oct 2024 05:01:44 +0000 (07:01 +0200)]
c, libcpp: Partially implement C2Y N3353 paper [PR117028]
The following patch partially implements the N3353 paper.
In particular, it adds support for the delimited escape sequences
(\u{123}, \x{123}, \o{123}) which were added already for C++23,
all I had to do is split the delimited escape sequence guarding from
named universal character escape sequence guards
(\N{LATIN CAPITAL LETTER C WITH CARON}), which C++23 has but C2Y doesn't
and emit different diagnostics for C from C++ for the delimited escape
sequences.
And it adds support for the new style of octal literals, 0o137 or 0O1777.
I have so far added that just for C and not C++, because I have no idea
whether C++ will want to handle it similarly.
What the patch doesn't do is any kind of diagnostics for obsoletion of
\137 or 0137, as discussed in the PR, I think it is way too early for that.
Perhaps some non-default warning later on.
2024-10-17 Jakub Jelinek <jakub@redhat.com>
PR c/117028
libcpp/
* include/cpplib.h (struct cpp_options): Add named_uc_escape_seqs,
octal_constants and cpp_warn_c23_c2y_compat members.
(enum cpp_warning_reason): Add CPP_W_C23_C2Y_COMPAT enumerator.
* init.cc (struct lang_flags): Add named_uc_escape_seqs and
octal_constants bit-fields.
(lang_defaults): Add initializers for them into the table.
(cpp_set_lang): Initialize named_uc_escape_seqs and octal_constants.
(cpp_create_reader): Initialize cpp_warn_c23_c2y_compat to -1.
* charset.cc (_cpp_valid_ucn): Test
CPP_OPTION (pfile, named_uc_escape_seqs) rather than
CPP_OPTION (pfile, delimited_escape_seqs) in \N{} related tests.
Change wording of C cpp_pedwarning for \u{} and emit
-Wc23-c2y-compat warning for it too if needed. Formatting fixes.
(convert_hex): Change wording of C cpp_pedwarning for \u{} and emit
-Wc23-c2y-compat warning for it too if needed.
(convert_oct): Likewise.
* expr.cc (cpp_classify_number): Handle C2Y 0o or 0O prefixed
octal constants.
(cpp_interpret_integer): Likewise.
gcc/c-family/
* c.opt (Wc23-c2y-compat): Add CPP and CppReason parameters.
* c-opts.cc (set_std_c2y): Use CLK_STDC2Y or CLK_GNUC2Y rather
than CLK_STDC23 and CLK_GNUC23. Formatting fix.
* c-lex.cc (interpret_integer): Handle C2Y 0o or 0O prefixed
and wb/WB/uwb/UWB suffixed octal constants.
gcc/testsuite/
* gcc.dg/bitint-112.c: New test.
* gcc.dg/c23-digit-separators-1.c: Add _Static_assert for
valid binary constant with digit separator.
* gcc.dg/c23-octal-constants-1.c: New test.
* gcc.dg/c23-octal-constants-2.c: New test.
* gcc.dg/c2y-digit-separators-1.c: New test.
* gcc.dg/c2y-digit-separators-2.c: New test.
* gcc.dg/c2y-octal-constants-1.c: New test.
* gcc.dg/c2y-octal-constants-2.c: New test.
* gcc.dg/c2y-octal-constants-3.c: New test.
* gcc.dg/cpp/c23-delimited-escape-seq-1.c: New test.
* gcc.dg/cpp/c23-delimited-escape-seq-2.c: New test.
* gcc.dg/cpp/c2y-delimited-escape-seq-1.c: New test.
* gcc.dg/cpp/c2y-delimited-escape-seq-2.c: New test.
* gcc.dg/cpp/c2y-delimited-escape-seq-3.c: New test.
* gcc.dg/cpp/c2y-delimited-escape-seq-4.c: New test.
* gcc.dg/octal-constants-1.c: New test.
* gcc.dg/octal-constants-2.c: New test.
* gcc.dg/octal-constants-3.c: New test.
* gcc.dg/octal-constants-4.c: New test.
* gcc.dg/system-octal-constants-1.c: New test.
* gcc.dg/system-octal-constants-1.h: New file.
Jakub Jelinek [Thu, 17 Oct 2024 04:59:31 +0000 (06:59 +0200)]
c: Fix up speed up compilation of large char array initializers when not using #embed [PR117177]
Apparently my
c: Speed up compilation of large char array initializers when not using #embed
patch broke building glibc.
The issue is that when using CPP_EMBED, we are guaranteed by the
preprocessor that there is CPP_NUMBER CPP_COMMA before it and
CPP_COMMA CPP_NUMBER after it (or CPP_COMMA CPP_EMBED), so RAW_DATA_CST
never ends up at the end of arrays of unknown length.
Now, the c_parser_initval optimization attempted to preserve that property
rather than changing everything that e.g. inferes array number of elements
from the initializer etc. to deal with RAW_DATA_CST at the end, but
it didn't take into account the possibility that there could be
CPP_COMMA followed by CPP_CLOSE_BRACE (where the CPP_COMMA is redundant).
As we are peaking already at 4 tokens in that code, peeking more would
require using raw tokens and that seems to be expensive doing it for
every pair of tokens due to vec_free done when we are out of raw tokens.
So, the following patch instead determines the case where we want
another INTEGER_CST element after it after consuming the tokens, and just
arranges for another process_init_element.
2024-10-17 Jakub Jelinek <jakub@redhat.com>
PR c/117177
gcc/c/
* c-parser.cc (c_parser_initval): Instead of doing
orig_len == INT_MAX checks before consuming tokens to set
last = 1, check it after consuming it and if not followed
by CPP_COMMA CPP_NUMBER, call process_init_element once
more with the last CPP_NUMBER.
gcc/testsuite/
* c-c++-common/init-4.c: New test.
liuhongt [Tue, 8 Oct 2024 08:18:31 +0000 (16:18 +0800)]
Don't lower vpcmpu to pcmpgt since the latter is for signed comparison.
r15-1737-gb06a108f0fbffe lower AVX512 kmask comparison to AVX2 ones,
but wrong lowered unsigned comparison to signed ones, for unsigned
comparison, only EQ/NEQ can be lowered.
The commit fix that.
gcc/ChangeLog:
PR target/116940
* config/i386/sse.md (*avx2_pcmp<mode>3_7): Change
UNSPEC_PCMP_ITER to UNSPEC_PCMP.
(*avx2_pcmp<mode>3_8): New pre_reload
define_insn_and_splitter.
For masked FMA, there're 2 forms of RTL representation
1) (vec_merge (fma: op2 op1 op3) op1) mask)
2) (vec_merge (fma: op1 op2 op3) op1) mask)
It's because op1 op2 are communatative in RTL(the second op1 is
written as (match_dup 1))
we once tried to replace (match_dup 1)
with (match_operand:VFH_AVX512VL 5 "nonimmediate_operand" "0,0")), but
trigger an ICE in reload(reload can handle at most one operand with
"0" constraint).
So the patch do the canonicalizaton for the backend part.
Here op1 has constraint "0", and the scecond op1 is (match_dup 1),
we once tried to replace it with (match_operand:M 5
"nonimmediate_operand" "0")) to enable more flexibility for pattern
match and recog, but it triggered an ICE in reload(reload can handle
at most one perand with "0" constraint).
So we need either add 2 patterns in the backend or just do the
canonicalization in the middle-end.
Cui, Lili [Thu, 17 Oct 2024 00:50:38 +0000 (08:50 +0800)]
Support andn_optab for x86
Add new andn pattern to match the new optab added by r15-1890-gf379596e0ba99d. Only enable 64bit, 128bit and
256bit vector ANDN, X86-64 has mask mov instruction when
avx512 is enabled.
This patch makes the non-predicated vdupq_n MVE intrinsics use
vec_duplicate rather than an unspec. This enables the compiler to
generate better code sequences (for instance using vmov when
possible).
The patch renames the existing mve_vdup<mode> pattern into
@mve_vdupq_n<mode>, and removes the now useless
@mve_<mve_insn>q_n_f<mode> and @mve_<mve_insn>q_n_<supf><mode> ones.
As a side-effect, it needs to update the mve_unpredicated_insn
predicates in @mve_<mve_insn>q_m_n_<supf><mode> and
@mve_<mve_insn>q_m_n_f<mode>.
Using vec_duplicates means the compiler is now able to use vmov in the
tests with an immediate argument in vdupq_n_[su]{8,16,32}.c:
vmov.i8 q0,#0x1
However, this is only possible when the immediate has a suitable value
(MVE encoding constraints, see imm_for_neon_mov_operand predicate).
Provided we adjust the cost computations in arm_rtx_costs_internal(),
when the immediate does not meet the vmov constraints, we now generate:
mov r0, #imm
vdup.xx q0,r0
or
ldr r0, .L4
vdup.32 q0,r0
in the f32 case (with 1.1 as immediate).
Without the cost adjustment, we would generate:
vldr.64 d0, .L4
vldr.64 d1, .L4+8
and an associated literal pool entry.
Regarding the testsuite updates:
--------------------------------
* The signed versions of vdupq_* tests lack a version with an
immediate argument. This patch adds them, similar to what we already
have for vdupq_n_u*.c tests.
* Code generation for different immediate values is checked with the
new tests this patch introduces. Note there's no need for s8/u8 tests
because 8-bit immediates always comply wth imm_for_neon_mov_operand.
* We can remove xfail from vcmp*f tests since we now generate:
movw r3, #15462
vcmp.f16 eq, q0, r3
instead of the previous:
vldr.64 d6, .L5
vldr.64 d7, .L5+8
vcmp.f16 eq, q0, q3
Tested on arm-linux-gnueabihf and arm-none-eabi with no regression.
2024-07-02 Jolen Li <jolen.li@arm.com>
Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/arm-mve-builtins-base.cc (vdupq_impl): New class.
(vdupq): Use new implementation.
* config/arm/arm.cc (arm_rtx_costs_internal): Handle HFmode
for COST_DOUBLE. Update costing for CONST_VECTOR.
* config/arm/arm_mve_builtins.def: Merge vdupq_n_f, vdupq_n_s
and vdupq_n_u into vdupq_n.
* config/arm/mve.md (mve_vdup<mode>): Rename into ...
(@mve_vdup_n<mode>): ... this.
(@mve_<mve_insn>q_n_f<mode>): Delete.
(@mve_<mve_insn>q_n_<supf><mode>): Delete..
(@mve_<mve_insn>q_m_n_<supf><mode>): Update mve_unpredicated_insn
attribute.
(@mve_<mve_insn>q_m_n_f<mode>): Likewise.
This patch fixes a bug where the mode iterator for mve_vdup<mode>
should be MVE_VLD_ST instead of MVE_vecs: V2DI and V2DF (thus vdup.64)
are not supported by MVE.
2024-07-02 Jolen Li <jolen.li@arm.com>
Christophe Lyon <christophe.lyon@arm.com>
David Malcolm [Wed, 16 Oct 2024 17:10:11 +0000 (13:10 -0400)]
diagnostics: capture backtraces in SARIF notifications [PR116602]
This patch makes the SARIF output's crash handler attempt to capture
a backtrace in JSON form within the notification's property bag. The
precise format of the property is subject to change, but, for example,
in one of the test cases I got output like this:
The backtrace code is based on that in diagnostic.cc.
gcc/ChangeLog:
PR other/116602
* diagnostic-format-sarif.cc: Include "demangle.h" and
"backtrace.h".
(sarif_invocation::add_notification_for_ice): Add "backtrace"
param and pass it to ctor.
(sarif_ice_notification::sarif_ice_notification): Add "backtrace"
param and add it to property bag.
(bt_stop): New, taken from diagnostic.cc.
(struct bt_closure): New.
(bt_callback): New, adapted from diagnostic.cc.
(sarif_builder::make_stack_from_backtrace): New.
(sarif_builder::on_report_diagnostic): Attempt to get backtrace
and pass it to add_notification_for_ice.
gcc/ChangeLog:
PR other/116613
* diagnostic-format-sarif.cc
(sarif_builder::on_report_diagnostic): Move the fnotice here from
sarif_ice_handler.
(sarif_ice_handler): Delete.
(diagnostic_output_format_init_sarif): Drop setting of ice handler
callback.
* diagnostic.cc (diagnostic_context::initialize): Likewise.
(diagnostic_context::action_after_output): Rather than call
m_ice_handler_cb, instead call finish on this context.
* diagnostic.h (ice_handler_callback_t): Delete typedef.
(diagnostic_context::set_ice_handler_callback): Delete.
(diagnostic_context::m_ice_handler_cb): Delete.
gcc/testsuite/ChangeLog:
PR other/116613
* gcc.dg/plugin/diagnostic_plugin_xhtml_format.c: Update for
removal of ICE callback.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Joseph Myers [Wed, 16 Oct 2024 16:46:49 +0000 (16:46 +0000)]
testsuite: Prepare for -std=gnu23 default
Now that C23 support is essentially feature-complete, I'd like to
switch the default language version for C compilation to -std=gnu23.
This requires updating a large number of testcases that fail with the
new language version if left unchanged. In this patch, update most of
the tests for which there is a safe change that works both before and
after the update to default language version - typically adding the
option -std=gnu17 or -Wno-old-style-definition to the tests. (There
are also a few tests where I'd like to investigate further why they
fail with -std=gnu23, or where I think such failures show an actual
bug to fix before changing the default language version, or where it
seems more appropriate to make a testcase change that would result in
failures in the absence of the language version change rather than
just adding an option that does nothing with the gnu17 default.)
The libffi test fixes have also been submitted upstream:
<https://github.com/libffi/libffi/pull/861>.
Most of the failures requiring such changes are for one of two
reasons:
* Unprototyped function declarations with () (meaning the same as
(void) in C23 mode) for a function then called with arguments.
* Old-style function definitions, which warn by default in C23 mode,
so resulting in test failures for the unexpected warnings.
Other reasons for failures include:
* Tests with their own definitions of bool, true and false.
* Tests of diagnostics (often with -pedantic) in cases where C23 has
changed semantics, such as:
- tag compatibility for structs;
- enum values out of range of int;
- handing of qualified array types;
- decimal floating types formerly needing -pedantic diagnostics, but
being standard in C23.
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
libffi/
* testsuite/libffi.call/va_struct2.c (test_fn): Cast n to void.
* testsuite/libffi.call/va_struct3.c (test_fn): Likewise.
Backported from <https://github.com/libffi/libffi/pull/861>.
Jakub Jelinek [Wed, 16 Oct 2024 15:46:06 +0000 (17:46 +0200)]
c: Add some checking asserts to named loops handling code
Jonathan mentioned an unnamed static analyzer reported issue in
c_finish_bc_name.
It is actually a false positive, because the construction of the
loop_names vector guarantees that the last element of the vector
(if the vector is non-empty) always has either
C_DECL_LOOP_NAME (l) or C_DECL_SWITCH_NAME (l) (or both) flags
set, so c will be always non-NULL after the if at the start of the
loops.
The following patch is an attempt to help those static analyzers
(though dunno if it actually helps), by adding a checking assert.
2024-10-16 Jakub Jelinek <jakub@redhat.com>
* c-decl.cc (c_get_loop_names): Add checking assert that
c is non-NULL in the loop.
(c_finish_bc_name): Likewise.
Jakub Jelinek [Wed, 16 Oct 2024 15:45:19 +0000 (17:45 +0200)]
c: Fix up uninitialized next.original_type use in #embed optimization
Jonathan pointed me at a diagnostic from an unnamed static analyzer
which found that next.original_type isn't initialized for the CPP_EMBED
case when it is parsed in a comma expression, yet
expr.original_type = next.original_type;
is done a few lines later and the expr is returned.
2024-10-16 Jakub Jelinek <jakub@redhat.com>
* c-parser.cc (c_parser_expression): Initialize next.original_type
to integer_type_node for the CPP_EMBED case.
Tobias Burnus [Wed, 16 Oct 2024 14:15:40 +0000 (16:15 +0200)]
Add libgomp.oacc-fortran/acc_on_device-1-4.f
Kind of undoes r15-4315-g9f549d216c9716 by adding the original testcase back;
namely, adding acc_on_device-1-3.f as acc_on_device-1-4.f with
-fno-builtin-acc_on_device removed.
libgomp/ChangeLog:
* testsuite/libgomp.oacc-fortran/acc_on_device-1-4.f: New test;
same as acc_on_device-1-3.f but using the builtin function.
Jakub Jelinek [Wed, 16 Oct 2024 12:44:32 +0000 (14:44 +0200)]
Ternary operator formatting fixes
While working on PR117028 C2Y changes, I've noticed weird ternary
operator formatting (operand1 ? operand2: operand3).
The usual formatting is operand1 ? operand2 : operand3
where we have around 18000+ cases of that (counting only what fits
on one line) and
indent -nbad -bap -nbc -bbo -bl -bli2 -bls -ncdb -nce -cp1 -cs -di2 -ndj \
-nfc1 -nfca -hnl -i2 -ip5 -lp -pcs -psl -nsc -nsob
documented in
https://www.gnu.org/prep/standards/html_node/Formatting.html#Formatting
does the same.
Some code was even trying to save space as much as possible and used
operand1?operand2:operand3 or
operand1 ? operand2:operand3
Today I've grepped for such cases (the grep was '?.*[^ ]:' and I had to
skim through various false positives with that where the : matched e.g.
stuff inside of strings, or *.md pattern macros or :: scope) and the
following patch is a fix for what I found.
Richard Biener [Wed, 16 Oct 2024 08:09:36 +0000 (10:09 +0200)]
Enhance gather fallback for PR65518 with SLP
With SLP forced we fail to use gather for PR65518 on RISC-V as expected
because we're failing due to not effective peeling for gaps. The
following appropriately moves the memory_access_type adjustment before
doing all the overrun checking since using VMAT_ELEMENTWISE means
there's no overrun.
* tree-vect-stmts.cc (get_group_load_store_type): Move
VMAT_ELEMENTWISE fallback for single-element interleaving
of too large groups before overrun checking.
Richard Biener [Wed, 9 Oct 2024 12:38:48 +0000 (14:38 +0200)]
Remove SLP_INSTANCE_UNROLLING_FACTOR, compute VF in vect_make_slp_decision
The following prepares us for SLP instances with a non-uniform number
of lanes. We already have this with load permutation lowering, but
we managed to keep that within the constraints of the per SLP instance
computed VF based on its max_nunits (with a vector type fixed for
each node) and the instance group size which is the number of lanes
in the SLP instance root. But in the case where arbitrary splitting
and merging SLP nodes at non-power-of-two lane boundaries is allowed
this simple calculation based on the outgoing group size falls apart.
The following, instead of computing a VF during SLP instance
discovery, computes it at vect_make_slp_decision time by walking
the SLP graph and looking at each SLP node in isolation. We do
track max_nunits per node which could be a VF per node instead or
forgo with both completely (though for BB vectorization we need
to communicate a VF > 1 requirement upward, or compute that after
the fact). In the end we'd like to delay vector type assignment
and only compute a minimum VF here, allowing vector types to
grow when the actual VF is bigger.
There's slight complication with permutes of externs / constants
as those get their vector type (and thus max_nunits) assigned late.
While we force them to have the same vector type as the result at
the moment their number of lanes can differ. So those get handled
explicitly there right now to up the VF as needed - the alternative
is to fail vectorization, I have an addition to
vect_maybe_update_slp_op_vectype that would FAIL if the set
vector type isn't within the constraints of the VF.
* tree-vectorizer.h (SLP_INSTANCE_UNROLLING_FACTOR): Remove.
(slp_instance::unrolling_factor): Likewise.
* tree-vect-slp.cc (vect_build_slp_instance): Do not set
SLP_INSTANCE_UNROLLING_FACTOR. Remove then dead code.
Compute and set max_nunits from the RHS nodes merged.
(vect_update_slp_vf_for_node): New function.
(vect_make_slp_decision): Use vect_update_slp_vf_for_node
to compute VF recursively.
(vect_build_slp_store_interleaving): Get max_nunits and
properly set that on the permute nodes built.
(vect_analyze_slp): Do not set SLP_INSTANCE_UNROLLING_FACTOR.
Jonathan Wakely [Wed, 16 Oct 2024 08:22:37 +0000 (09:22 +0100)]
libstdc++: Fix Python deprecation warning in printers.py
python/libstdcxx/v6/printers.py:1355: DeprecationWarning: 'count' is passed as positional argument
The Python docs say:
Deprecated since version 3.13: Passing count and flags as positional
arguments is deprecated. In future Python versions they will be
keyword-only parameters.
Using a keyword argument for count only became possible with Python 3.1
so introduce a new function to do the substitution.
libstdc++-v3/ChangeLog:
* python/libstdcxx/v6/printers.py (strip_fundts_namespace): New.
(StdExpAnyPrinter, StdExpOptionalPrinter): Use it.
Jakub Jelinek [Wed, 16 Oct 2024 08:22:44 +0000 (10:22 +0200)]
c: Speed up compilation of large char array initializers when not using #embed
The following patch on attempts to speed up compilation of large char array
initializers when one doesn't use #embed in the source.
My testcase has been
unsigned char a[] = {
#embed "cc1gm2" limit (100000000)
};
and corresponding variant which has the middle line replaced with
dd if=cc1gm bs=100000000 count=1 | xxd -i
With embed 95.3MiB is really fast:
time ./cc1 -quiet -O2 -o test4a.s test4a.c
real 0m0.700s
user 0m0.576s
sys 0m0.123s
Without embed and without this patch it needs around 11GB of RAM and
time ./cc1 -quiet -O2 -o test4b.s test4b.c
real 2m47.230s
user 2m41.548s
sys 0m4.328s
Without embed and with this patch it needs around 3.5GB of RAM and
time ./cc1 -quiet -O2 -o test4b.s2 test4b.c
real 0m25.004s
user 0m23.655s
sys 0m1.308s
Not perfect (but one needs to parse all the numbers, libcpp also creates
strings which are pointed by CPP_NUMBER tokens (that can take up to 4 bytes
per byte), but still almost 7x speed improvement and 3x compile time memory.
One drawback of the patch is that for the larger initializers the precise
locations for -Wconversion warnings are gone when initializing signed char
(or char when it is signed) arrays.
If that is important, perhaps c_maybe_optimize_large_byte_initializer could
tell the caller this is the case and c_parser_initval could emit the
warnings directly when it still knows the location_t and suppress warnings
on the RAW_DATA_CST.
2024-10-16 Jakub Jelinek <jakub@redhat.com>
* c-tree.h (c_maybe_optimize_large_byte_initializer): Declare.
* c-parser.cc (c_parser_initval): Attempt to optimize large char array
initializers into RAW_DATA_CST.
* c-typeck.cc (c_maybe_optimize_large_byte_initializer): New function.
* c-c++-common/init-1.c: New test.
* c-c++-common/init-2.c: New test.
* c-c++-common/init-3.c: New test.
Jakub Jelinek [Wed, 16 Oct 2024 08:20:00 +0000 (10:20 +0200)]
gimplify: Small RAW_DATA_CST gimplification fix
I've noticed the following testcase hangs during gimplification.
While it is gimplifying an assignment from a VAR_DECL .LCNNN to MEM_REF,
because the VAR_DECL is TREE_READONLY, it will happily pick its initializer
and try to gimplify that, which means recursing to the exact same code.
The following patch fixes that by just gimplifying the lhs and building
assignment, because the code decided that it should use copying from
a static var.
2024-10-16 Jakub Jelinek <jakub@redhat.com>
* gimplify.cc (gimplify_init_ctor_eval): For larger RAW_DATA_CST,
just gimplify cref as lvalue and add gimple assignment of rctor
to cref instead of going through gimplification of INIT_EXPR, as
the latter can suffer from infinite recursion.
Jakub Jelinek [Wed, 16 Oct 2024 08:09:49 +0000 (10:09 +0200)]
libcpp, c, middle-end: Optimize initializers using #embed in C
This patch actually optimizes #embed, so far in C.
For a simple testcase (for 494447200 bytes long cc1plus):
cat embed-11.c
unsigned char a[] = {
#embed "cc1plus"
};
time ./xgcc -B ./ -S -std=c23 -O2 embed-11.c
real 0m13.647s
user 0m7.157s
sys 0m2.597s
time ./xgcc -B ./ -c -std=c23 -O2 embed-11.c
real 0m28.649s
user 0m26.653s
sys 0m1.958s
and when configured against binutils with .base64 support
time ./xgcc -B ./ -S -std=c23 -O2 embed-11.c
real 0m4.283s
user 0m2.288s
sys 0m0.859s
time ./xgcc -B ./ -c -std=c23 -O2 embed-11.c
real 0m6.888s
user 0m5.876s
sys 0m1.002s
(all times with --enable-checking=yes,rtl,extra compiler).
Even just
./cc1plus -E -o embed-11.i embed-11.c
(which doesn't have this optimization yet and so preprocesses it as
1.3GB preprocessed file) needed almost 25GB of compile time RAM (but
preprocessed fine).
And compiling that embed-11.i with -std=c23 -O0 by unpatched gcc
I gave up after 400 seconds when it already ate 45GB of RAM and didn't
produce a single byte into embed-11.s yet.
The patch introduces a new CPP_EMBED token which contains raw memory image
virtually representing a sequence of int literals.
To simplify the parsing complexities, the preprocessor guarantees CPP_EMBED
is only emitted if there are 4+ (it actually does that for 64+ right now)
literals in the sequence and emits CPP_NUMBER CPP_COMMA CPP_EMBED CPP_COMMA
CPP_NUMBER tokens (with more CPP_EMBED separated by CPP_COMMA if it is
longer than 2GB, as STRING_CSTs in GCC and also the new RAW_DATA_CST etc.
are limited to INT_MAX elements). The main reason is that the preprocessor
doesn't really know in which context #embed directive appears, there could
be e.g.
{ 25 *
#embed "whatever"
* 2 - 15 }
or similar and dealing with this special case deep in the expression parsing
is undesirable.
With the CPP_NUMBERs around it, I believe in the C FE the only places which
need handling of the CPP_EMBED token are initializer parsing (that is the
only one which adds actual optimizations for it), comma expressions (I
believe nothing really cares whether it is 25,13,95 or
25,13,0,1,2,3,4,5,6,7,8,9,10,13,95 etc., so besides the 2 outer CPP_NUMBER
the parsing just adds one INTEGER_CST to the comma expression, I doubt users
want to be spammed with millions of -Wunused warnings per #embed),
whatever uses c_parser_expr_list (function calls, attribute arguments,
OpenMP sizes clause argument, OpenACC tile clause argument and whatever uses
c_parser_get_builtin_args (mainly for __builtin_shufflevector). Please correct
me if I'm wrong.
The patch introduces a RAW_DATA_CST tree code, which can then be used inside
of array CONSTRUCTOR elt values. In some sense RAW_DATA_CST is similar to
STRING_CST, but right now STRING_CST is used only if the whole array
initializer is that constant, while RAW_DATA_CST at index idx (should be
always INTEGER_CST index, another advantage of the CPP_NUMBER around is that
[30 ... 250] =
#embed "whatever"
really does what it would do with a integer sequence there) stands for
[idx] = RAW_DATA_POINTER (val)[0],
[idx+1] = RAW_DATA_POINTER (val)[1],
...
[idx+RAW_DATA_LENGTH (val)-1] = RAW_DATA_POINTER (val)[RAW_DATA_LENGTH (val)-1].
Another important thing is that unlike STRING_CST which has the data
embedded in it RAW_DATA_CST doesn't own the data, it has RAW_DATA_OWNER
which owns the data (that can be a STRING_CST, e.g. used for PCH or LTO
after reading LTO in) or another RAW_DATA_CST (with NULL RAW_DATA_OWNER,
standing for data owned by libcpp buffers). The advantage is that it can be
cheaply peeled off, or split into multiple smaller pieces, e.g. if one uses
designated initializer to store something into the middle of a 10GB #embed
array, in no case we need to actually copy data around for that.
Right now RAW_DATA_CST is only used in initializers of integral arrays where
the integer type has (host) CHAR_BIT precision, so usually char/signed
char/unsigned char (for C++ later maybe std::byte); in theory we could say
allocate 4 times as big buffer for conversions to int array and depending
on endianity and storage order reversal etc., but I'm not sure if that is
something that will be actually needed in the wild.
And an optimization inside of c-common.cc attempts to undo that CPP_NUMBER
CPP_EMBED CPP_NUMBER division in case one uses #embed the usual way and
doesn't use the boundary literals in weird ways and the values there match
the surrounding bytes in the owner buffer.
For LTO, in order to avoid copying perhaps gigabytes long data around,
the hacks in the streamer out/in cause the data owned by libcpp to be
streamed right into the stream and streamed back as a STRING_CST which
owns the data.
2024-10-16 Jakub Jelinek <jakub@redhat.com>
libcpp/
* include/cpplib.h (TTYPE_TABLE): Add CPP_EMBED token type.
* files.cc (finish_embed): For limit >= 64 and C preprocessing
instead of emitting CPP_NUMBER CPP_COMMA separated sequence for the
whole embed emit it just for the first and last byte and in between
emit a CPP_EMBED token or tokens if too large.
gcc/
* treestruct.def (TS_RAW_DATA_CST): New.
* tree.def (RAW_DATA_CST): New tree code.
* tree-core.h (struct tree_raw_data): New type.
(union tree_node): Add raw_data_cst member.
* tree.h (RAW_DATA_LENGTH, RAW_DATA_POINTER, RAW_DATA_OWNER): Define.
(gt_ggc_mx, gt_pch_nx): Declare overloads for tree_raw_data *.
* tree.cc (tree_node_structure_for_code): Handle RAW_DATA_CST.
(initialize_tree_contains_struct): Handle TS_RAW_DATA_CST.
(tree_code_size): Handle RAW_DATA_CST.
(initializer_zerop): Likewise.
(gt_ggc_mx, gt_pch_nx): Define overloads for tree_raw_data *.
* gimplify.cc (gimplify_init_ctor_eval): Handle RAW_DATA_CST.
* fold-const.cc (operand_compare::operand_equal_p): Handle
RAW_DATA_CST. Formatting fix.
(operand_compare::hash_operand): Handle RAW_DATA_CST.
(native_encode_initializer): Likewise.
(get_array_ctor_element_at_index): Likewise.
(fold): Likewise.
* gimple-fold.cc (fold_array_ctor_reference): Likewise. Formatting
fix.
* varasm.cc (const_hash_1): Handle RAW_DATA_CST.
(initializer_constant_valid_p_1): Likewise.
(array_size_for_constructor): Likewise.
(output_constructor_regular_field): Likewise.
* expr.cc (categorize_ctor_elements_1): Likewise.
(expand_expr_real_1) <case ARRAY_REF>: Punt for RAW_DATA_CST.
* tree-streamer.cc (streamer_check_handled_ts_structures): Mark
TS_RAW_DATA_CST as handled.
* tree-streamer-in.cc (streamer_alloc_tree): Handle RAW_DATA_CST.
(lto_input_ts_raw_data_cst_tree_pointers): New function.
(streamer_read_tree_body): Call it for RAW_DATA_CST.
* tree-streamer-out.cc (write_ts_raw_data_cst_tree_pointers): New
function.
(streamer_write_tree_body): Call it for RAW_DATA_CST.
(streamer_write_tree_header): Handle RAW_DATA_CST.
* lto-streamer-out.cc (DFS::DFS_write_tree_body): Handle RAW_DATA_CST.
* tree-pretty-print.cc (dump_generic_node): Likewise.
gcc/c-family/
* c-ppoutput.cc (token_streamer::stream): Add special code to spell
CPP_EMBED token.
* c-lex.cc (c_lex_with_flags): Handle CPP_EMBED. Formatting fix.
* c-common.cc (c_parse_error): Handle CPP_EMBED.
(braced_list_to_string): Optimize RAW_DATA_CST surrounded by
INTEGER_CSTs which match some bytes before or after RAW_DATA_CST in
its owner.
gcc/c/
* c-parser.cc (c_parser_braced_init): Handle CPP_EMBED.
(c_parser_get_builtin_args): Likewise.
(c_parser_expression): Likewise.
(c_parser_expr_list): Likewise.
* c-typeck.cc (digest_init): Handle RAW_DATA_CST. Formatting fix.
(init_node_successor): New function.
(add_pending_init): Handle RAW_DATA_CST.
(set_nonincremental_init): Formatting fix.
(output_init_element): Handle RAW_DATA_CST. Formatting fixes.
(maybe_split_raw_data): New function.
(process_init_element): Use maybe_split_raw_data. Handle
RAW_DATA_CST.
gcc/testsuite/
* c-c++-common/cpp/embed-20.c: New test.
* c-c++-common/cpp/embed-21.c: New test.
* c-c++-common/cpp/embed-28.c: New test.
* gcc.dg/cpp/embed-8.c: New test.
* gcc.dg/cpp/embed-9.c: New test.
* gcc.dg/cpp/embed-10.c: New test.
* gcc.dg/cpp/embed-11.c: New test.
* gcc.dg/cpp/embed-12.c: New test.
* gcc.dg/cpp/embed-13.c: New test.
* gcc.dg/cpp/embed-14.c: New test.
* gcc.dg/cpp/embed-15.c: New test.
* gcc.dg/cpp/embed-16.c: New test.
* gcc.dg/pch/embed-1.c: New test.
* gcc.dg/pch/embed-1.hs: New test.
* gcc.dg/lto/embed-1_0.c: New test.
* gcc.dg/lto/embed-1_1.c: New test.
Qing Zhao [Tue, 15 Oct 2024 17:55:22 +0000 (17:55 +0000)]
Provide new GCC builtin __builtin_counted_by_ref [PR116016]
With the addition of the 'counted_by' attribute and its wide roll-out
within the Linux kernel, a use case has been found that would be very
nice to have for object allocators: being able to set the counted_by
counter variable without knowing its name.
For example, given:
struct foo {
...
int counter;
...
struct bar array[] __attribute__((counted_by (counter)));
} *p;
The existing Linux object allocators are roughly:
#define MAX(A, B) (A > B) ? (A) : (B)
#define alloc(P, FAM, COUNT) ({ \
__auto_type __p = &(P); \
size_t __size = MAX (sizeof(*P),
__builtin_offsetof (__typeof(*P), FAM)
+ sizeof (*(P->FAM)) * COUNT); \
*__p = kmalloc(__size); \
})
Right now, any addition of a counted_by annotation must also
include an open-coded assignment of the counter variable after
the allocation:
p = alloc(p, array, how_many);
p->counter = how_many;
In order to avoid the tedious and error-prone work of manually adding
the open-coded counted-by intializations everywhere in the Linux
kernel, a new GCC builtin __builtin_counted_by_ref will be very useful
to be added to help the adoption of the counted-by attribute.
-- Built-in Function: TYPE __builtin_counted_by_ref (PTR)
The built-in function '__builtin_counted_by_ref' checks whether the
array object pointed by the pointer PTR has another object
associated with it that represents the number of elements in the
array object through the 'counted_by' attribute (i.e. the
counted-by object). If so, returns a pointer to the corresponding
counted-by object. If such counted-by object does not exist,
returns a null pointer.
This built-in function is only available in C for now.
The argument PTR must be a pointer to an array. The TYPE of the
returned value is a pointer type pointing to the corresponding
type of the counted-by object or a void pointer type in case of a
null pointer being returned.
With this new builtin, the central allocator could be updated to:
And then structs can gain the counted_by attribute without needing
additional open-coded counter assignments for each struct, and
unannotated structs could still use the same allocator.
* c-decl.cc (names_builtin_p): Add RID_BUILTIN_COUNTED_BY_REF.
* c-parser.cc (has_counted_by_object): New routine.
(get_counted_by_ref): New routine.
(c_parser_postfix_expression): Handle New RID_BUILTIN_COUNTED_BY_REF.
* c-tree.h: New routine handle_counted_by_for_component_ref.
* c-typeck.cc (handle_counted_by_for_component_ref): New routine.
(build_component_ref): Call the new routine.
gcc/ChangeLog:
* doc/extend.texi: Add documentation for __builtin_counted_by_ref.
gcc/testsuite/ChangeLog:
* gcc.dg/builtin-counted-by-ref-1.c: New test.
* gcc.dg/builtin-counted-by-ref.c: New test.
Jakub Jelinek [Tue, 15 Oct 2024 18:41:18 +0000 (20:41 +0200)]
c: Implement C2Y N3355 - Named Loops [PR117022]
The following patch implements the C2Y N3355 - Named Loops paper.
I've tried to implement it lazily, rather than proactively e.g. push
labels to a vector just in case the following statement is iteration
statement, switch statement or one of the loop pragmas followed by
iteration statement the patch just notes the last statement in
cur_stmt_list if any before c_parser_label/c_parser_all_labels and
passes it down to the iteration/switch statement parsing routines,
which then search backward for LABEL_EXPRs before they reach the given
stop statement.
The patch then adds one extra argument to
{FOR,WHILE,DO,BREAK,CONTINUE,SWITCH}_STMT, which is set to a canonical
name LABEL_DECL (the last named label before the construct).
If one just refers to the innermost construct with a fancy name,
it is in the end parsed the same as break/continue without an identifier
(i.e. NULL_TREE argument), and if a loop or switch has name(s) but
break/continue to that isn't used, the name is set to NULL_TREE.
At c-gimplify.cc time the name is then pushed into a hash map mapping
it to a pair of labels.
I've implemented it also for ObjC foreach loops (which have break/continue
handled during parsing, not during c-gimplify.cc).
As for OpenMP/OpenACC, the patch right now pretends no OpenMP loop
has a name, until something different is decided in the standard.
As shown in the testcases, most break identifier/continue identifier
cases aren't really useful in OpenMP code, a break identifier or
continue identifier jumping out of an OpenMP region is certainly invalid
(such regions have to be single entry single exit, so escaping it
through goto/break lab/continue lab violates that), similarly break
is disallowed in the innermost OpenMP nested loop, just continue
is allowed, so the only thing that would make sense for OpenMP (second
gomp testcase) would be allowing to give name to the innermost
loop in OpenMP canonical loop nest (except that labels aren't allowed
in the syntax right now in between the loops) and only continue to
that label. For collapse(1) loops that would be a label before
the #pragma or [[omp::directive (parallel for)]] etc. And of course,
what already works fine in the patch is break/continue to non-OpenMP loops
nested in OpenMP loops.
2024-10-12 Jakub Jelinek <jakub@redhat.com>
PR c/117022
gcc/c-family/
* c-common.def (FOR_STMT, WHILE_STMT, DO_STMT, BREAK_STMT,
CONTINUE_STMT, SWITCH_STMT): Add an extra operand, *_NAME
and document it.
* c-common.h (bc_hash_map_t): New typedef.
(struct bc_state): Add bc_hash_map member.
(WHILE_NAME, DO_NAME, FOR_NAME, BREAK_NAME, CONTINUE_NAME,
SWITCH_STMT_NAME): Define.
* c-pretty-print.cc (c_pretty_printer::statement): Print
BREAK_STMT or CONTINUE_STMT operand if any.
* c-gimplify.cc (bc_hash_map): New static variable.
(note_named_bc, release_named_bc): New functions.
(save_bc_state): Save and clear bc_hash_map.
(restore_bc_state): Assert NULL and restore bc_hash_map.
(genericize_c_loop): Add NAME argument, call note_named_bc
and release_named_bc if non-NULL around the body walk.
(genericize_for_stmt, genericize_while_stmt, genericize_do_stmt):
Adjust callers of it.
(genericize_switch_stmt): Rename break_block variable to blab.
Call note_named_bc and release_named_bc if SWITCH_STMT_NAME is
non-NULL around the body walk.
(genericize_continue_stmt): Handle non-NULL CONTINUE_NAME.
(genericize_break_stmt): Handle non-NULL BREAK_NAME.
(c_genericize): Delete and clear bc_hash_map.
gcc/c/
* c-tree.h: Implement C2Y N3355 - Named loops.
(C_DECL_LOOP_NAME, C_DECL_SWITCH_NAME, C_DECL_LOOP_SWITCH_NAME_VALID,
C_DECL_LOOP_SWITCH_NAME_USED, IN_NAMED_STMT): Define.
(c_get_loop_names, c_release_loop_names, c_finish_bc_name): Declare.
(c_start_switch): Add NAME argument.
(c_finish_bc_stmt): Likewise.
* c-lang.h (struct language_function): Add loop_names and
loop_names_hash members.
* c-parser.cc (c_parser_external_declaration,
c_parser_declaration_or_fndef, c_parser_struct_or_union_specifier,
c_parser_parameter_declaration): Adjust c_parser_pragma caller.
(get_before_labels): New function.
(c_parser_compound_statement_nostart): Call get_before_labels when
needed, adjust c_parser_pragma and c_parser_statement_after_labels
callers.
(c_parser_statement): Call get_before_labels first and pass it to
c_parser_statement_after_labels.
(c_parser_bc_name): New function.
(c_parser_statement_after_labels): Add BEFORE_LABELS argument. Pass
it down to c_parser_switch_statement, c_parser_while_statement,
c_parser_do_statement, c_parser_for_statement and c_parser_pragma.
Call c_parser_bc_name for RID_BREAK and RID_CONTINUE and pass it as
another argument to c_finish_bc_stmt.
(c_parser_if_body, c_parser_else_body): Call get_before_labels
early and pass it to c_parser_statement_after_labels.
(c_parser_switch_statement): Add BEFORE_LABELS argument. Call
c_get_loop_names, if named, pass switch_name to c_start_switch,
mark it valid and set IN_NAMED_STMT bit in in_statement before
parsing body, otherwise clear IN_NAMED_STMT bit before that parsing.
Run c_release_loop_names at the end.
(c_parser_while_statement, c_parser_do_statement,
c_parser_for_statement): Add BEFORE_LABELS argument. Call
c_get_loop_names, if named, mark it valid and set IN_NAMED_STMT bit
in in_statement before parsing body, otherwise clear IN_NAMED_STMT
before that parsing, arrange for the loop name if used to be
another *_STMT argument.
(c_parser_objc_class_instance_variables,
c_parser_objc_methodprotolist): Adjust c_parser_pragma callers.
(c_parser_pragma): Add BEFORE_LABELS argument. Pass it down to
c_parser_for_statement, c_parser_while_statement or
c_parser_do_statement.
(c_parser_omp_loop_nest, c_maybe_parse_omp_decl): Adjust
c_parser_pragma callers.
* c-decl.cc (loop_names, loop_names_hash): New static variables.
(add_stmt): Set STATEMENT_LIST_HAS_LABEL after push_stmt_list rather
than before it.
(c_push_function_context): Save and clear loop_names and
loop_names_hash.
(c_pop_function_context): Release or delete, restore and clear
loop_names and loop_names_hash.
(c_get_loop_names, c_release_loop_names, c_finish_bc_name): New
functions.
* c-typeck.cc (c_start_switch): Add SWITCH_NAME argument, pass it down
to build_stmt.
(c_finish_bc_stmt): Add NAME argument. Mark of IN_NAMED_STMT bit
of in_statement in swtiches. Use label for IN_OBJC_FOREACH only if
name is NULL. If name is non-NULL and C_DECL_LOOP_NAME and
C_DECL_SWITCH_NAME are both set, assume outer ObjC foreach and
dig labels from DECL_CHAIN of name. Pass NAME to build_stmt
otherwise.
gcc/cp/
* semantics.cc (begin_while_stmt, begin_do_stmt, begin_for_stmt,
finish_break_stmt, finish_continue_stmt, begin_switch_stmt): Pass
another NULL_TREE to build_stmt calls.
gcc/testsuite/
* gcc.dg/c23-named-loops-1.c: New test.
* gcc.dg/c23-named-loops-5.c: New test.
* gcc.dg/c2y-named-loops-1.c: New test.
* gcc.dg/c2y-named-loops-2.c: New test.
* gcc.dg/c2y-named-loops-4.c: New test.
* gcc.dg/c2y-named-loops-5.c: New test.
* gcc.dg/c2y-named-loops-6.c: New test.
* gcc.dg/c2y-named-loops-7.c: New test.
* gcc.dg/gnu99-named-loops-1.c: New test.
* gcc.dg/gnu99-named-loops-2.c: New test.
* gcc.dg/gnu99-named-loops-3.c: New test.
* gcc.dg/gnu99-named-loops-4.c: New test.
* gcc.dg/gnu2y-named-loops-3.c: New test.
* gcc.dg/gomp/named-loops-1.c: New test.
* gcc.dg/gomp/named-loops-2.c: New test.
* objc.dg/named-loops-1.m: New test.
Jakub Jelinek [Tue, 15 Oct 2024 17:38:46 +0000 (19:38 +0200)]
match.pd: Further fma negation fixes [PR116891]
On Mon, Oct 14, 2024 at 08:53:29AM +0200, Jakub Jelinek wrote:
> > PR middle-end/116891
> > * match.pd ((negate (IFN_FNMS@3 @0 @1 @2)) -> (IFN_FMA @0 @1 @2)):
> > Only enable for !HONOR_SIGN_DEPENDENT_ROUNDING.
>
> Guess it would be nice to have a testcase which FAILs without the patch and
> PASSes with it, but it can be added later.
I've added such a testcase now, and additionally found the fix only fixed
one of the 4 problematic similar cases.
Here is a patch which fixes the others too and adds the testcases.
fma-pr116891.c FAILed without your patch, FAILs with your patch too (but
only due to the bar/baz/qux checks) and PASSes with the patch.
Patrick Palka [Tue, 15 Oct 2024 17:23:17 +0000 (13:23 -0400)]
c++: unifying lvalue vs rvalue (non-forwarding) ref [PR116710]
When unifying two (non-forwarding) reference types, unify immediately
recurses into the referenced type without first comparing rvalueness.
(Note that at this point forwarding references and other reference
parameters have already been stripped to their referenced type by
maybe_adjust_types_for_deduction, so this code path applies only to
nested reference types.)
Patrick Palka [Tue, 15 Oct 2024 17:13:15 +0000 (13:13 -0400)]
c++: checking ICE w/ constexpr if and lambda as def targ [PR117054]
Here we're tripping over the assert in extract_locals_r which enforces
that an extra-args tree appearing inside another extra-args tree doesn't
actually have extra args. This invariant doesn't always hold for lambdas
(which recently gained the extra-args mechanism) but that should be
harmless since cp_walk_subtrees doesn't walk LAMBDA_EXPR_EXTRA_ARGS and
so should be immune to the PR114303 issue for now. So let's just disable
this assert for lambdas.
PR c++/117054
gcc/cp/ChangeLog:
* pt.cc (extract_locals_r): Disable tree_extra_args assert
for LAMBDA_EXPR.
jit.dg/test-error-pr63969-missing-driver.c tries to break PATH and
verify that an error is generated when using an external driver.
However it does this by unsetting PATH, and so the test could
accidentally find the driver if the system supplies a default and the
driver happens to be installed in that path (reported as rhbz#2318021).
Fix the test by instead setting PATH to a bogus value.
gcc/testsuite/ChangeLog:
* jit.dg/test-error-pr63969-missing-driver.c (create_code): When
breaking PATH, use setenv with a bogus value, rather than
unsetenv, in case the system uses a default path that contains
the driver binary.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Uros Bizjak [Tue, 15 Oct 2024 14:51:33 +0000 (16:51 +0200)]
i386: Fix expand_vector_set for VEC_MERGE/VEC_DUPLICATE RTX [PR117116]
Middle end can generate SYMBOL_REF RTX as a value "val" in the call
to expand_vector_set, but SYMBOL_REF RTX is not accepted in
<sse2p4_1>_pinsr<ssemodesuffix> insn pattern, generated via
VEC_MERGE/VEC_DUPLICATE RTX path.
Force the value into a register before VEC_MERGE/VEC_DUPLICATE RTX
is generated if it doesn't satisfy nonimmediate_operand predicate.
PR target/117116
gcc/ChangeLog:
* config/i386/i386-expand.cc (expand_vector_set): Force "val"
into a register before VEC_MERGE/VEC_DUPLICATE RTX is generated
if it doesn't satisfy nonimmediate_operand predicate.
Also we have been verifying dominance information is correct and not needing to invalidate
since ssa branch was merged so this comment has been out of date even before it was merged in.
Andrew Pinski [Sun, 13 Oct 2024 22:57:41 +0000 (15:57 -0700)]
passes: Remove limit on the number of params
Having a limit of 2 params for NEXT_PASS was just done because I didn't think there was
a way to handle arbitrary number of params. But I found that we can handle this
via a static const variable array (constexpr so we know it is true or false at compile time)
and just loop over the array.
Note I keep around NEXT_PASS_WITH_ARG and NEXT_PASS macros instead of always using
NEXT_PASS_WITH_ARGS macro to make sure these cases get optimized for -O0 (stage1).
Tested INSERT_PASS_AFTER/INSERT_PASS_BEFORE manually by changing config/i386/i386-passes.def's
stv lines to have a 2nd argument and checked the resuling pass-instances.def to see the NEXT_PASS_WITH_ARGS
was correctly done.
changes from v1:
* v2: Handle INSERT_PASS_AFTER/INSERT_PASS_BEFORE too.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* gen-pass-instances.awk: Remove the limit of the params.
* pass_manager.h (NEXT_PASS_WITH_ARG2): Rename to ...
(NEXT_PASS_WITH_ARGS): This.
* passes.cc (NEXT_PASS_WITH_ARG2): Rename to ...
(NEXT_PASS_WITH_ARGS): This and support more than 2 params by using
a constexpr array.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Jonathan Wakely [Sun, 13 Oct 2024 21:28:16 +0000 (22:28 +0100)]
libstdc++: Implement LWG 3798 for range adaptors [PR106676]
LWG 3798 modified the iterator_category of the iterator types for
transform_view, join_with_view, zip_transform_view and
adjacent_transform_view, to allow the iterator's reference type to be an
rvalue reference.
libstdc++-v3/ChangeLog:
PR libstdc++/106676
* include/bits/iterator_concepts.h (__cpp17_fwd_iterator): Use
is_reference instead of is_value_reference.
rvalue references.
* include/std/ranges (transform_view:__iter_cat::_S_iter_cat):
Likewise.
(zip_transform_view::__iter_cat::_S_iter_cat): Likewise.
(adjacent_transform_view::__iter_cat::_S_iter_cat): Likewise.
(join_with_view::__iter_cat::_S_iter_cat): Likewise.
* testsuite/std/ranges/adaptors/transform.cc: Check
iterator_category when the transformation function returns an
rvalue reference type.
Richard Biener [Sun, 13 Oct 2024 10:44:04 +0000 (12:44 +0200)]
tree-optimization/116907 - stale BLOCK reference from DECL_VALUE_EXPR
When we remove unused BLOCKs we fail to clean references to them
from DECL_VALUE_EXPRs of variables in other BLOCKs which in the
PR causes LTO streaming to walk into pointers to GGC freed blocks.
There's the question of whether such DECL_VALUE_EXPRs should keep
variables and blocks referenced live (it doesn't seem to do that)
and whether such DECL_VALUE_EXPRs should have survived in the
first place.
PR tree-optimization/116907
* tree-ssa-live.cc (clear_unused_block_pointer_in_block): New
helper.
(clear_unused_block_pointer): Call it.
Tamar Christina [Tue, 15 Oct 2024 10:22:26 +0000 (11:22 +0100)]
AArch64: re-enable memory access costing after SLP change.
While chasing down a costing difference between SLP and non-SLP for memory
access costing I noticed that at some point the SLP and non-SLP costing have
diverged. It used to be we only supported LOAD_LANES in SLP and so the non-SLP
costing was working fine.
But with the change to SLP only we now lost costing.
It looks like the vectorizer for non-SLP stores the VMAT type in
STMT_VINFO_MEMORY_ACCESS_TYPE on the stmt_info, but for SLP it stores it in
SLP_TREE_MEMORY_ACCESS_TYPE which is on the SLP node itself.
While my first attempt of a patch was to just also store the VMAT in the
stmt_info https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665295.html
Richi pointed out that this goes wrong when the same access is used Hybrid.
And so we have to do a backend specific fix. To help out other backends this
also introduces a generic helper function suggested by Richi in that patch
(I hope that's ok.. I didn't want to split out just the helper.)
This successfully restores VMAT based costing in the new SLP only world.
gcc/ChangeLog:
* tree-vectorizer.h (vect_mem_access_type): New.
* config/aarch64/aarch64.cc (aarch64_ld234_st234_vectors): Use it.
(aarch64_detect_vector_stmt_subtype): Likewise.
(aarch64_adjust_stmt_cost): Likewise.
(aarch64_vector_costs::count_ops): Likewise.
(aarch64_vector_costs::add_stmt_cost): Make SLP node named.
Richard Biener [Tue, 15 Oct 2024 07:48:10 +0000 (09:48 +0200)]
middle-end/117137 - expansion issue with vector equality compares
When expanding a COND_EXPR with a vector equality compare as condition
expand_cond_expr_using_cmove fails to properly go the cbranch path.
I failed to massage it's twisted logic so the simple fix is to make
sure to expand a vector condition separately which also generates
the expected code for the testcase:
ptest %xmm0, %xmm0
cmovne %edi, %eax
PR middle-end/117137
* expr.cc (expand_cond_expr_using_cmove): Make sure to
expand vector comparisons separately.
Richard Biener [Tue, 15 Oct 2024 07:22:09 +0000 (09:22 +0200)]
tree-optimization/117147 - bogus re-use of previous ldst_p
The following shows that in vect_build_slp_tree_1 we're eventually
re-using the previous lane set ldst_p flag. Fixed by some
refactoring.
PR tree-optimization/117147
* tree-vect-slp.cc (vect_build_slp_tree_1): Put vars and
initialization of per-lane data into the per-lane processing
loop to avoid re-using previous lane state.
Jennifer Schmitz [Thu, 19 Sep 2024 10:18:05 +0000 (03:18 -0700)]
SVE intrinsics: Fold svmul with constant power-of-2 operand to svlsl
For svmul, if one of the operands is a constant vector with a uniform
power of 2, this patch folds the multiplication to a left-shift by
immediate (svlsl).
Because the shift amount in svlsl is the second operand, the order of the
operands is switched, if the first operand contained the powers of 2. However,
this switching is not valid for some predications: If the predication is
_m and the predicate not ptrue, the result of svlsl might not be the
same as for svmul. Therefore, we do not apply the fold in this case.
The transform is also not applied to constant vectors of 1 (this case is
partially covered by constant folding already and the missing cases will be
addressed by the follow-up patch suggested in
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663275.html).
Tests were added in the existing test harness to check the produced assembly
- when the first or second operand contains the power of 2
- when the second operand is a vector or scalar (_n)
- for _m, _z, _x predication
- for _m with ptrue or non-ptrue
- for intmin for signed integer types
- for the maximum power of 2 for signed and unsigned integer types.
Note that we used 4 as a power of 2, instead of 2, because a recent
patch optimizes left-shifts by 1 to an add instruction. But since we
wanted to highlight the change to an lsl instruction we used a higher
power of 2.
To also check correctness, runtime tests were added.
The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
* config/aarch64/aarch64-sve-builtins-base.cc (svmul_impl::fold):
Implement fold to svlsl for power-of-2 operands.
Jakub Jelinek [Tue, 15 Oct 2024 05:53:56 +0000 (07:53 +0200)]
libcpp: Add -Wtrailing-blanks warning
Trailing blanks is something even git diff diagnoses; while it is a coding
style issue, if it is so common that git diff diagnoses it, I think it could
be useful to various projects to check that at compile time.
Dunno if it should be included in -Wextra, currently it isn't, and due to
tons of trailing whitespace in our sources, haven't enabled it for when
building gcc itself either.
Note, git diff also diagnoses indentation with tab following space, wonder
if we couldn't have trivial warning options where one would simply ask for
checking of indentation with no tabs, just spaces vs. indentation with
tabs followed by spaces (but never tab width or more spaces in the
indentation). I think that would be easy to do also on the libcpp side.
Checking how much something should be exactly indented requires syntax
analysis (at least some limited one) and can consider columns of first token
on line, but what the exact indentation blanks were is something only libcpp
knows.
On Thu, Sep 19, 2024 at 08:17:24AM +0200, Richard Biener wrote:
> Generally I like diagnosing this early. For the above I'd say -Wtrailing-whitespace=
> with a set of things to diagnose (and a sane default - just spaces and tabs - for
> -Wtrailiing-whitespace) would be nice. As for naming possibly follow the
> is{space,blank,cntrl} character classifications? If those are a good
> fit, that is.
The patch currently allows blank (' ' '\t') and space (' ' '\t' '\f' '\v'),
cntrl not yet added, not anything non-ASCII, but in theory could
be added later (though, non-ASCII would be just for inside of comments,
say non-breaking space etc. in the source is otherwise an error).
2024-10-15 Jakub Jelinek <jakub@redhat.com>
libcpp/
* include/cpplib.h (struct cpp_options): Add
cpp_warn_trailing_whitespace member.
(enum cpp_warning_reason): Add CPP_W_TRAILING_WHITESPACE.
* internal.h (struct _cpp_line_note): Document 'W' line note.
* lex.cc (_cpp_clean_line): Add 'W' line note for trailing whitespace
except for trailing whitespace after backslash. Formatting fix.
(_cpp_process_line_notes): Emit -Wtrailing-whitespace diagnostics.
Formatting fixes.
(lex_raw_string): Clear type on 'W' notes.
gcc/
* doc/invoke.texi (Wtrailing-whitespace): Document.
gcc/c-family/
* c.opt (Wtrailing-whitespace=): New option.
(Wtrailing-whitespace): New alias.
* c.opt.urls: Regenerate.
gcc/testsuite/
* c-c++-common/cpp/Wtrailing-whitespace-1.c: New test.
* c-c++-common/cpp/Wtrailing-whitespace-2.c: New test.
* c-c++-common/cpp/Wtrailing-whitespace-3.c: New test.
* c-c++-common/cpp/Wtrailing-whitespace-4.c: New test.
* c-c++-common/cpp/Wtrailing-whitespace-5.c: New test.
* c-c++-common/cpp/Wtrailing-whitespace-6.c: New test.
* c-c++-common/cpp/Wtrailing-whitespace-7.c: New test.
* c-c++-common/cpp/Wtrailing-whitespace-8.c: New test.
* c-c++-common/cpp/Wtrailing-whitespace-9.c: New test.
* c-c++-common/cpp/Wtrailing-whitespace-10.c: New test.
My recent changes to genmatch apparently broke bootstrap on FreeBSD
and Darwin and perhaps others, and also broke $build != $host
builds including canadian cross.
The change was to link in libcommon.a into build/genmatch, so that
we can use pp_format_verbatim. Unfortunately that has various
dependencies in libcommon.a, and more importantly, libcommon.a is
a host library, while build/genmatch carefully links with build/vec.o
etc., build version of libcpp.
So, in order to use pretty-print.o stuff, we'd need to build a build/
version of all those objects and worse ensure there is and we properly
link build version of libintl and/or libiconv when needed (those 2 are
the reasons for FreeBSD/Darwin failures).
The following patch just reverts those changes and instead adds a very
simple variant of gcc_diag style vfprintf, which prints the result
directly into a stream.
We don't need anything fancy, like UTF-8 quotes, colors, URLs, in the
usual case genmatch shouldn't print anything at all.
The patch implements what pretty-print.cc implements, except the fancy
stuff (no colors, no URLs printed, quotes always printed just as
'something', strings even in %qs printed normally rather than trying to
pass through ASCII and valid UTF-8 and use <80><35> style printing for the
rest) and except %@ and %e (neither libcpp nor genmatch.cc use those
currently and they need extra structures etc. which aren't used in libcpp
at all). It handles both "%.*s %d" and "%3$.*2$s %1$d" styles just in case
something got translated (although at least the cross-compiler and stage1
genmatch shouldn't be translating anything, but stage2+ native can).
I've tested it with hacking up most of pretty-print.cc self-tests
to just use warning_at ((location_t) 1, ...) and doing manual verification
of what was printed vs. what was expected (with a few additions for the
%M$ style formats); as it goes into a FILE * directly, I'm afraid self-tests
of this aren't easily possible.
2024-10-15 Jakub Jelinek <jakub@redhat.com>
PR bootstrap/117110
* Makefile.in (generated_files, generated_match_files,
build/genmatch$(build_exeext), LINKER_FOR_BUILD): Revert
2024-10-12 changes.
* genmatch.cc: Don't include pretty-print.h and input.h.
(fatal, ggc_internal_cleared_alloc, ggc_free, line_table,
linemap_client_expand_location_to_spelling_point): Revert
2024-10-12 changes.
(DIAG_ARGMAX): Define.
(diag_integer_with_precision): Define.
(diag_vfprintf): New function.
(diagnostic_cb): Use diag_vfprintf instead of pp_format_verbatim.
(output_line_directive): Revert 2024-10-12 changes.
Pan Li [Tue, 15 Oct 2024 01:19:44 +0000 (09:19 +0800)]
RISC-V: Fix UNRESOLVED testcases for SAT alu vector mode
Some saturation related alu testcases missed additional option
for expand check, which result in some UNRESOLVED issues. This
patch would like to fix it by adding the option back as other
testcases.
The below test are passed for this patch.
* The rv64gcv fully regression test.
It is test only patch and obvious up to a point, will commit it
directly if no comments in next 48H.
David Malcolm [Mon, 14 Oct 2024 23:22:46 +0000 (19:22 -0400)]
diagnostics: fix overload of emit_diagnostic [PR117109]
I accidentally broke "make gcc.pot" in r15-4081 by adding
a member function diagnostic_context::emit_diagnostic with a
gmsgid in a different position to the existing emit_diagnostic
functions, which exgettext's parser can't handle.
Jason Merrill [Tue, 8 Oct 2024 22:26:40 +0000 (18:26 -0400)]
libcpp: avoid extra spaces in module preprocessing
Within the compiler, module keywords "import", "module", and "export" that
are recognized as part of module directives gain an extra trailing space to
distinguish them from other non-keyword uses of those words in the code.
But when dumping preprocessed output, printing those spaces creates a
gratuitous inconsistency with non-modules preprocessing, as revealed by
several of the g++.dg/modules/cpp* tests if modules are enabled by default
in C++20 mode.
libcpp/ChangeLog:
* lex.cc (cpp_output_token): Omit terminal space from name.
gcc/testsuite/ChangeLog:
* g++.dg/modules/cpp-2_c.C: Expect only one space after import.
* g++.dg/modules/cpp-5_c.C
* g++.dg/modules/dep-2.C
* g++.dg/modules/dir-only-2_b.C
* g++.dg/modules/pr99050_b.C
* g++.dg/modules/inc-xlate-1_b.H
* g++.dg/modules/legacy-3_b.H
* g++.dg/modules/legacy-3_c.H: Likewise.
Jonathan Wakely [Sun, 13 Oct 2024 20:47:14 +0000 (21:47 +0100)]
libstdc++: Implement LWG 3564 for ranges::transform_view
The _Iterator<true> type returned by begin() const uses const F& to
transform the elements, so it should use const F& to determine the
iterator's value_type and iterator_category as well.
This was accepted into the WP in July 2022.
libstdc++-v3/ChangeLog:
* include/std/ranges (transform_view:_Iterator): Use const F&
to determine value_type and iterator_category of
_Iterator<true>, as per LWG 3564.
* testsuite/std/ranges/adaptors/transform.cc: Check value_type
and iterator_category.
Jason Merrill [Fri, 11 Oct 2024 18:52:43 +0000 (14:52 -0400)]
c++: address deduction and concepts [CWG2918]
CWG2918 changes deduction from an overload set for the case where multiple
candidates succeed and have the same type; previously this made the overload
set a non-deduced context, now it succeeds since the result is consistent
between the candidates.
This is needed for cases of overloading based on requirements, where we want
to choose the most constrained overload. I also needed to adjust
resolve_address_of_overloaded_function accordingly; we already handled the
comparison for template candidates in most_specialized_instantiation, but
need to also do the comparison for non-template candidates such as member
functions of a class template.
CWG 2918 (proposed)
gcc/cp/ChangeLog:
* cp-tree.h (most_constrained_function): Declare..
* class.cc (resolve_address_of_overloaded_function): Call it.
* pt.cc (get_template_for_ordering): Handle list from
resolve_address_of_overloaded_function.
(most_constrained_function): No longer static.
(resolve_overloaded_unification): Always compare type rather
than decl.
The test case 'libgomp.oacc-fortran/routine-nohost-1.f90' added in 2021
commit a61f6afbee370785cf091fe46e2e022748528307 "OpenACC 'nohost' clause" was
dependend on inlining being enabled, and otherwise ('-fno-inline') failed to
optimize/link:
/tmp/ccb2hsPd.o: In function `MAIN__._omp_fn.0':
routine-nohost-1.f90:(.text+0xf4): undefined reference to `fact_nohost_'
However, as of recent commit 3269a722b7a03613e9c4e2862bc5088c4a17cc11
"Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' __builtin_is_initial_device",
we're now properly handling OpenACC/Fortran 'acc_on_device', and may specify
'-fno-inline', like done in 'libgomp.oacc-c-c++-common/routine-nohost-1.c'.
Jonathan Wakely [Tue, 24 Sep 2024 22:20:56 +0000 (23:20 +0100)]
libstdc++: Populate generic std::time_get's wide %c format [PR117135]
I missed out the __timepunct<wchar_t> specialization for the "generic"
implementation when defining the %c format in r15-4016-gc534e37faccf48.
libstdc++-v3/ChangeLog:
PR libstdc++/117135
* config/locale/generic/time_members.cc
(__timepunct<wchar_t>::_M_initialize_timepunc): Set
_M_date_time_format for C locale. Set %Ex formats to the same
values as the %x formats.
Andrew Pinski [Sun, 13 Oct 2024 18:40:39 +0000 (11:40 -0700)]
dce: Use a base common base class for pass_cd_dce and pass_dce
The classes pass_dce and pass_cd_dce share the same mechansim for their
params and almost the same execute functionality so let's create a new
base class which will be used for these two classes and move the common
code into the same one.
Note update_address_taken_p was updated to be a NSDMI instead of initializing
it explicitly in the constructor.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-dce.cc (tree_ssa_dce): Remove.
(tree_ssa_cd_dce): Remove.
(class pass_dce_base): New class.
(class pass_dce): Use pass_dce_base as the base class.
(class pass_cd_dce): Likewise.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Sun, 13 Oct 2024 18:16:51 +0000 (11:16 -0700)]
dce: add remove_unused_locals conditionally to the todos [PR117096]
This is the updated patch with the suggestion from:
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665217.html
Where we use a second arg/param to set which passes we want to have
the remove_unused_locals on the dce.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
PR tree-optimization/117096
* passes.def: Update some of the dce/cd-cde passes setting
the 2nd arg to true.
Also remove comment about stdarg since dce does it.
* tree-ssa-dce.cc (pass_dce): Add remove_unused_locals_p field.
Update set_pass_param to allow for 2nd param.
Use remove_unused_locals_p in execute to return TODO_remove_unused_locals.
(pass_cd_dce): Likewise.
* tree-stdarg.cc (pass_data_stdarg): Remove TODO_remove_unused_locals.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Sun, 13 Oct 2024 18:09:14 +0000 (11:09 -0700)]
passes: Allow for second param for NEXT_PASS
Right now we currently only support 1 parameter for each pass in NEXT_PASS.
We also don't error out if someone tries to use more than 1.
This adds support for more than one but only to a max of max_number_args
(which is currently 2).
In the next patch, this will be used for DCE, adding a new parameter.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* gen-pass-instances.awk (END): Handle processing
of multiple arguments to NEXT_PASS. Also error out
if using more than max_number_args (2).
* pass_manager.h (NEXT_PASS_WITH_ARG2): New define.
* passes.cc (NEXT_PASS_WITH_ARG2): New define.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
_Pragma("GCC system_header") currently takes effect only partially. It does
succeed in updating the line_map, so that checks like in_system_header_at()
return correctly, but it does not update pfile->buffer->sysp. One result is
that a subsequent #include does not set up the system header state properly
for the newly included file, as pointed out in the PR. Fix by propagating
the new system header state back to the buffer after processing the pragma.
libcpp/ChangeLog:
PR preprocessor/114436
* directives.cc (destringize_and_run): If the _Pragma changed the
buffer system header state (e.g. because it was "GCC
system_header"), propagate that change back to the actual buffer
too.
gcc/testsuite/ChangeLog:
PR preprocessor/114436
* c-c++-common/cpp/pragma-system-header-1.h: New test.
* c-c++-common/cpp/pragma-system-header-2.h: New test.
* c-c++-common/cpp/pragma-system-header.c: New test.
Lewis Hyatt [Fri, 12 Jan 2024 18:26:06 +0000 (13:26 -0500)]
libcpp: Support extended characters for #pragma {push,pop}_macro [PR109704]
The implementation of #pragma push_macro and #pragma pop_macro has to date
made use of an ad-hoc function, _cpp_lex_identifier(), which lexes an
identifier out of a string. When support was added for extended characters
in identifiers ($, UCNs, or UTF-8), that support was added only for the
"normal" way of lexing identifiers out of a cpp_buffer (_cpp_lex_direct) and
not for the ad-hoc way. Consequently, extended identifiers are not usable
with these pragmas.
The logic for lexing identifiers has become more complicated than it was
when _cpp_lex_identifier() was written -- it now handles things like \N{}
escapes in C++, for instance -- and it no longer seems practical to maintain
a redundant code path for lexing identifiers. Address the issue by changing
the implementation of #pragma {push,pop}_macro to lex identifiers in the
expected way, i.e. by pushing a cpp_buffer and lexing the identifier from
there.
The existing implementation has some quirks because of the ad-hoc parsing
logic. For example:
actually does sucessfully restore "X". This is because the key for looking
up the saved macro on the push stack is the original string passed, so the
string passed to pop_macro needs to match it exactly. It is not that easy to
reproduce this logic in the world of extended characters, given that for
example it should be valid to pass a UCN to push_macro, and the
corresponding UTF-8 to pop_macro. Given that this aspect of the existing
behavior seems unintentional and has no tests (and does not match other
implementations), I opted to make the new logic more straightforward. The
string passed needs to lex to one token, which must be a valid identifier,
or else no action is taken and no error is generated. Any diagnostics
encountered during lexing (e.g., due to a UTF-8 character not permitted to
appear in an identifier) are also suppressed.
It could be nice (for GCC 15) to also add a warning if a pop_macro does not
match a previous push_macro.
libcpp/ChangeLog:
PR preprocessor/109704
* include/cpplib.h (class cpp_auto_suppress_diagnostics): New class.
* errors.cc
(cpp_auto_suppress_diagnostics::cpp_auto_suppress_diagnostics): New
function.
(cpp_auto_suppress_diagnostics::~cpp_auto_suppress_diagnostics): New
function.
* charset.cc (noop_diagnostic_cb): Remove.
(cpp_interpret_string_ranges): Refactor diagnostic suppression logic
into new class cpp_auto_suppress_diagnostics.
(count_source_chars): Likewise.
* directives.cc (cpp_pop_definition): Add cpp_hashnode argument.
(lex_identifier_from_string): New static helper function.
(push_pop_macro_common): Refactor common logic from
do_pragma_push_macro and do_pragma_pop_macro; use
lex_identifier_from_string instead of _cpp_lex_identifier.
(do_pragma_push_macro): Reimplement using push_pop_macro_common.
(do_pragma_pop_macro): Likewise.
* internal.h (_cpp_lex_identifier): Remove.
* lex.cc (lex_identifier_intern): Remove.
(_cpp_lex_identifier): Remove.
gcc/testsuite/ChangeLog:
PR preprocessor/109704
* c-c++-common/cpp/pragma-push-pop-utf8.c: New test.
* g++.dg/pch/pushpop-2.C: New test.
* g++.dg/pch/pushpop-2.Hs: New test.
* gcc.dg/pch/pushpop-2.c: New test.
* gcc.dg/pch/pushpop-2.hs: New test.
Allow for class type coarray parameters. [PR77871]
gcc/fortran/ChangeLog:
PR fortran/77871
* trans-expr.cc (gfc_conv_derived_to_class): Assign token when
converting a coarray to class.
(gfc_get_tree_for_caf_expr): For classes get the caf decl from
the saved descriptor.
(gfc_get_caf_token_offset):Assert that coarray=lib is set and
cover more cases where the tree having the coarray token can be.
* trans-intrinsic.cc (gfc_conv_intrinsic_caf_get): Use unified
test for pointers.
Tamar Christina [Mon, 14 Oct 2024 13:01:24 +0000 (14:01 +0100)]
middle-end: copy STMT_VINFO_STRIDED_P when DR is replaced [PR116956]
When move_dr copies a DR from one statement to another, it seems we've
forgotten to copy the STMT_VINFO_STRIDED_P flag. This leaves the new DR in a
broken state where it has a non constant stride but isn't marked as strided.
This causes the ICE in the PR because dataref analysis fails during epilogue
vectorization because there is an assumption in place that while costing may
fail for epiloque vectorization, that DR analysis cannot if it succeeded for
the main loop.
Tamar Christina [Mon, 14 Oct 2024 13:00:25 +0000 (14:00 +0100)]
simplify-rtx: Fix incorrect folding of shift and AND [PR117012]
The optimization added in r15-1047-g7876cde25cbd2f is using the wrong
operaiton to check for uniform constant vectors.
The Author intended to check that all the lanes in the vector are the same and
so used CONST_VECTOR_DUPLICATE_P. However this only checks that the vector
is created from a pattern duplication, but doesn't say how many pattern
alternatives make up the duplication. Normally would would need to check this
separately or use const_vec_duplicate_p.
Without this the optimization incorrectly triggers.
gcc/ChangeLog:
PR rtl-optimization/117012
* simplify-rtx.cc (simplify_context::simplify_binary_operation_1): Use
const_vec_duplicate_p instead of CONST_VECTOR_DUPLICATE_P.
gcc/testsuite/ChangeLog:
PR rtl-optimization/117012
* gcc.target/aarch64/pr117012.c: New test.
Tamar Christina [Mon, 14 Oct 2024 10:58:59 +0000 (11:58 +0100)]
middle-end: support SLP early break
This patch introduces feature parity for early break int the SLP only
vectorizer.
The approach taken here is to treat the early exits as root statements for an
SLP tree. This means that we don't need any changes to build_slp to support
gconds.
Codegen for the gcond itself now has to be done out of line but the body of the
SLP blocks itself is simply driven by SLP scheduling. There is a slight
awkwardness in having re-used vectorizable_early_exit for both SLP and non-SLP
but I've documented the differences and when I did try to refactor it it wasn't
really worth it given that this is a temporary state anyway.
This version is restricted to lane = 1, as such we can re-use the existing
move_early_break function instead of having to do safety update through
scheduling. I have a branch where I'm working on that but lane > 1 is out of
scope for GCC 15 anyway. The only reason I will try to get moving through
scheduling done as a stretch goal is so we get epilogue vectorization back for
early break.
The example:
unsigned test4(unsigned x)
{
unsigned ret = 0;
for (int i = 0; i < N; i++)
{
vect_b[i] = x + i;
if (vect_a[i]*2 != x)
break;
vect_a[i] = x;
}
return ret;
}
builds the following SLP instance for early break:
note: Analyzing vectorizable control flow: if (patt_6 != 0)
note: Starting SLP discovery for
note: patt_6 = _4 != x_9(D);
note: starting SLP discovery for node 0x63abc80
note: Build SLP for patt_6 = _4 != x_9(D);
note: precomputed vectype: vector(4) <signed-boolean:32>
note: nunits = 4
note: vect_is_simple_use: operand x_9(D), type of def: external
note: vect_is_simple_use: operand # RANGE [irange] unsigned int [0, 0][2, +INF] MASK 0xffff
_3 * 2, type of def: internal
note: starting SLP discovery for node 0x63abdc0
note: Build SLP for _4 = _3 * 2;
note: precomputed vectype: vector(4) unsigned int
note: nunits = 4
note: vect_is_simple_use: operand #
vect_aD.4416[i_15], type of def: internal
note: vect_is_simple_use: operand 2, type of def: constant
note: starting SLP discovery for node 0x63abe60
note: Build SLP for _3 = vect_a[i_15];
note: precomputed vectype: vector(4) unsigned int
note: nunits = 4
note: SLP discovery for node 0x63abe60 succeeded
note: SLP discovery for node 0x63abdc0 succeeded
note: SLP discovery for node 0x63abc80 succeeded
note: SLP size 3 vs. limit 10.
note: Final SLP tree for instance 0x6474190:
note: node 0x63abc80 (max_nunits=4, refcnt=2) vector(4) <signed-boolean:32>
note: op template: patt_6 = _4 != x_9(D);
note: stmt 0 patt_6 = _4 != x_9(D);
note: children 0x63abd20 0x63abdc0
note: node (external) 0x63abd20 (max_nunits=1, refcnt=1)
note: { x_9(D) }
note: node 0x63abdc0 (max_nunits=4, refcnt=2) vector(4) unsigned int
note: op template: _4 = _3 * 2;
note: stmt 0 _4 = _3 * 2;
note: children 0x63abe60 0x63abf00
note: node 0x63abe60 (max_nunits=4, refcnt=2) vector(4) unsigned int
note: op template: _3 = vect_a[i_15];
note: stmt 0 _3 = vect_a[i_15];
note: load permutation { 0 }
note: node (constant) 0x63abf00 (max_nunits=1, refcnt=1)
note: { 2 }
and during codegen:
note: ------>vectorizing SLP node starting from: patt_6 = _4 != x_9(D);
note: vect_is_simple_use: operand # RANGE [irange] unsigned int [0, 0][2, +INF] MASK 0xffff
_3 * 2, type of def: internal
note: add new stmt: mask_patt_6.18_58 = _53 != vect__4.17_57;
note: === vectorizable_early_exit ===
note: transform early-exit.
note: vectorizing stmts using SLP.
note: Vectorizing SLP tree:
note: node 0x63abfa0 (max_nunits=4, refcnt=1) vector(4) int
note: op template: i_12 = i_15 + 1;
note: stmt 0 i_12 = i_15 + 1;
note: children 0x63aba00 0x63ac040
note: node 0x63aba00 (max_nunits=4, refcnt=2) vector(4) int
note: op template: i_15 = PHI <i_12(6), 0(14)>
note: [l] stmt 0 i_15 = PHI <i_12(6), 0(14)>
note: children (nil) (nil)
note: node (constant) 0x63ac040 (max_nunits=1, refcnt=1) vector(4) int
note: { 1 }
gcc/ChangeLog:
* tree-vect-loop.cc (vect_analyze_loop_2): Handle SLP trees with no
children.
* tree-vectorizer.h (enum slp_instance_kind): Add slp_inst_kind_gcond.
(LOOP_VINFO_EARLY_BREAKS_LIVE_IVS): New.
(vectorizable_early_exit): Expose.
(class _loop_vec_info): Add early_break_live_stmts.
* tree-vect-slp.cc (vect_build_slp_instance, vect_analyze_slp_instance):
Support gcond instances.
(vect_analyze_slp): Analyze gcond roots and early break live statements.
(maybe_push_to_hybrid_worklist): Don't sink gconds.
(vect_slp_analyze_operations): Support gconds.
(vect_slp_check_for_roots): Update comments.
(vectorize_slp_instance_root_stmt): Support gconds.
(vect_schedule_slp): Pass vinfo to vectorize_slp_instance_root_stmt.
* tree-vect-stmts.cc (vect_stmt_relevant_p): Record early break live
statements.
(vectorizable_early_exit): Support SLP.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-early-break_126.c: New test.
* gcc.dg/vect/vect-early-break_127.c: New test.
* gcc.dg/vect/vect-early-break_128.c: New test.
Jonathan Wakely [Sun, 13 Oct 2024 21:48:43 +0000 (22:48 +0100)]
libstdc++: Use std::move for iterator in ranges::fill [PR117094]
Input iterators aren't required to be copyable.
libstdc++-v3/ChangeLog:
PR libstdc++/117094
* include/bits/ranges_algobase.h (__fill_fn): Use std::move for
iterator that might not be copyable.
* testsuite/25_algorithms/fill/constrained.cc: Check
non-copyable iterator with sized sentinel.
Jonathan Wakely [Thu, 10 Oct 2024 12:36:33 +0000 (13:36 +0100)]
libstdc++: Enable memset optimizations for distinct character types [PR93059]
Currently we only optimize std::fill to memset when the source and
destination types are the same byte-sized type. This means that we fail
to optimize cases like std::fill(buf. buf+n, 0) because the literal 0 is
not the same type as the character buffer.
Such cases can safely be optimized to use memset, because assigning an
int (or other integer) to a narrow character type has the same effects
as converting the integer to unsigned char then copying it with memset.
This patch enables the optimized code path when the fill character is a
memcpy-able integer (using the new __memcpyable_integer trait). We still
need to check is_same<U, T> to enable the memset optimization for
filling a range of std::byte with a std::byte value, because that isn't
a memcpyable integer.
libstdc++-v3/ChangeLog:
PR libstdc++/93059
* include/bits/stl_algobase.h (__fill_a1(T*, T*, const T&)):
Change template parameters and enable_if condition to allow the
fill value to be an integer.
Jonathan Wakely [Thu, 10 Oct 2024 12:36:33 +0000 (13:36 +0100)]
libstdc++: Enable memcpy optimizations for distinct integral types [PR93059]
Currently we only optimize std::copy, std::copy_n etc. to memmove when
the source and destination types are the same. This means that we fail
to optimize copying between distinct 1-byte types, e.g. copying from a
buffer of unsigned char to a buffer of char8_t or vice versa.
This patch adds more partial specializations of the __memcpyable trait
so that we allow memcpy between integers of equal widths. This will
enable memmove for copies between narrow character types and also
between same-width types like int and unsigned.
Enabling the optimization needs to be based on the width of the integer
type, not just the size in bytes. This is because some targets define
non-standard integral types such as __int20 in msp430, which has padding
bits. It would not be safe to memcpy between e.g. __int20 and int32_t,
even though sizeof(__int20) == sizeof(int32_t). A new trait is
introduced to define the width, __memcpyable_integer, and then the
__memcpyable trait compares the widths.
It's safe to copy between signed and unsigned integers of the same
width, because GCC only supports two's complement integers.
I initially though it would be useful to define the specialization
__memcpyable_integer<byte> to enable copying between narrow character
types and std::byte. But that isn't possible with std::copy, because
is_assignable<char&, std::byte> is false. Optimized copies using memmove
will already happen for copying std::byte to std::byte, because
__memcpyable<T*, T*> is true.
libstdc++-v3/ChangeLog:
PR libstdc++/93059
* include/bits/cpp_type_traits.h (__memcpyable): Add partial
specialization for pointers to distinct types.
(__memcpyable_integer): New trait to control which types can use
cross-type memcpy optimizations.
Kito Cheng [Mon, 14 Oct 2024 08:07:16 +0000 (16:07 +0800)]
RISC-V: Implement __init_riscv_feature_bits, __riscv_feature_bits, and __riscv_vendor_feature_bits
This provides a common abstraction layer to probe the available extensions at
run-time. These functions can be used to implement function multi-versioning or
to detect available extensions.
The advantages of providing this abstraction layer are:
- Easy to port to other new platforms.
- Easier to maintain in GCC for function multi-versioning.
- For example, maintaining platform-dependent code in C code/libgcc is much
easier than maintaining it in GCC by creating GIMPLEs...
This API is intended to provide the capability to query minimal common available extensions on the system.
The API is defined in the riscv-c-api-doc:
https://github.com/riscv-non-isa/riscv-c-api-doc/blob/main/src/c-api.adoc
Proposal to use unsigned long long for marchid and mimpid:
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/91
Full function multi-versioning implementation will come later. We are posting
this first because we intend to backport it to the GCC 14 branch to unblock
LLVM 19 to use this with GCC 14.2, rather than waiting for GCC 15.
Changes since v7:
- Remove vendorID field in __riscv_vendor_feature_bits.
- Fix C implies Zcf only for RV32.
- Add more comments to kernel versions.
Changes since v6:
- Implement __riscv_cpu_model.
- Set new sub extension bits which implied from previous extensions.
Changes since v5:
- Minor fixes on indentation.
Changes since v4:
- Bump to newest riscv-c-api-doc with some new extensions like Zve*, Zc*
Zimop, Zcmop, Zawrs.
- Rename the return variable name of hwprobe syscall.
- Minor fixes on indentation.
Changes since v3:
- Fix non-linux build.
- Let __init_riscv_feature_bits become constructor
Changes since v2:
- Prevent it initialize more than once.
Changes since v1:
- Fix the format.
- Prevented race conditions by introducing a local variable to avoid load/store
operations during the computation of the feature bit.
middle-end: [PR middle-end/116926] Allow widening optabs for vec-mode -> scalar-mode
The recent refactoring of the dot_prod optab to convert-type exposed a
limitation in how `find_widening_optab_handler_and_mode' is currently
implemented, owing to the fact that, while the function expects the
aarch64: Fix folding of degenerate svwhilele case [PR117045]
The svwhilele folder mishandled the degenerate case in which
the second argument is the maximum integer. In that case,
the result is all-true regardless of the first parameter:
If the second scalar operand is equal to the maximum signed integer
value then a condition which includes an equality test can never fail
and the result will be an all-true predicate.
This is because the conceptual "increment the first operand
by 1 after each element" is done modulo the range of the operand.
The GCC code was instead treating it as infinite precision.
whilele_5.c even had a test for the incorrect behaviour.
The easiest fix seemed to be to handle that case specially before
doing constant folding. This also copes with variable first operands.
gcc/
PR target/116999
PR target/117045
* config/aarch64/aarch64-sve-builtins-base.cc
(svwhilelx_impl::fold): Check for WHILELTs of the minimum value
and WHILELEs of the maximum value. Fold them to all-false and
all-true respectively.
Thomas Schwinge [Mon, 14 Oct 2024 08:34:34 +0000 (10:34 +0200)]
Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' __builtin_is_initial_device: Harmonize 'libgomp.oacc-fortran/acc_on_device-1-*'
The test case 'libgomp.oacc-fortran/acc_on_device-1-1.f90' added in
commit 3269a722b7a03613e9c4e2862bc5088c4a17cc11
"Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' __builtin_is_initial_device"
was missing '-fno-builtin-acc_on_device', and all
'libgomp.oacc-fortran/acc_on_device-1-*' need comments, why that option is
specified.
Thomas Schwinge [Mon, 14 Oct 2024 08:26:13 +0000 (10:26 +0200)]
Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' __builtin_is_initial_device: Fix effective-target keyword in 'libgomp.oacc-fortran/acc_on_device-2.f90'
The test case 'libgomp.oacc-fortran/acc_on_device-2.f90' added in
commit 3269a722b7a03613e9c4e2862bc5088c4a17cc11
"Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' __builtin_is_initial_device"
had a mismatch between dump file production and its scanning; the former needs
to use 'offload_target_nvptx' (like 'offload_target_amdgcn'), not
'offload_device_nvptx'.
Pan Li [Sat, 12 Oct 2024 03:08:21 +0000 (11:08 +0800)]
RISC-V: Add testcases for form 4 of vector signed SAT_SUB
Form 4:
#define DEF_VEC_SAT_S_SUB_FMT_4(T, UT, MIN, MAX) \
void __attribute__((noinline)) \
vec_sat_s_sub_##T##_fmt_4 (T *out, T *op_1, T *op_2, unsigned limit) \
{ \
unsigned i; \
for (i = 0; i < limit; i++) \
{ \
T x = op_1[i]; \
T y = op_2[i]; \
T minus; \
bool overflow = __builtin_sub_overflow (x, y, &minus); \
out[i] = !overflow ? minus : x < 0 ? MIN : MAX; \
} \
}
The below test are passed for this patch.
* The rv64gcv fully regression test.
It is test only patch and obvious up to a point, will commit it
directly if no comments in next 48H.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-4-i16.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-4-i32.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-4-i64.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-4-i8.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-run-4-i16.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-run-4-i32.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-run-4-i64.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-run-4-i8.c: New test.
Pan Li [Sat, 12 Oct 2024 02:40:30 +0000 (10:40 +0800)]
RISC-V: Add testcases for form 3 of vector signed SAT_SUB
Form 3:
#define DEF_VEC_SAT_S_SUB_FMT_3(T, UT, MIN, MAX) \
void __attribute__((noinline)) \
vec_sat_s_sub_##T##_fmt_3 (T *out, T *op_1, T *op_2, unsigned limit) \
{ \
unsigned i; \
for (i = 0; i < limit; i++) \
{ \
T x = op_1[i]; \
T y = op_2[i]; \
T minus; \
bool overflow = __builtin_sub_overflow (x, y, &minus); \
out[i] = overflow ? x < 0 ? MIN : MAX : minus; \
} \
}
The below test are passed for this patch.
* The rv64gcv fully regression test.
It is test only patch and obvious up to a point, will commit it
directly if no comments in next 48H.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-3-i16.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-3-i32.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-3-i64.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-3-i8.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-run-3-i16.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-run-3-i32.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-run-3-i64.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-run-3-i8.c: New test.
Pan Li [Sat, 12 Oct 2024 02:34:55 +0000 (10:34 +0800)]
Match: Support form 3 for vector signed integer SAT_SUB
This patch would like to support the form 3 of the vector signed
integer SAT_SUB. Aka below example:
Form 3:
#define DEF_VEC_SAT_S_SUB_FMT_3(T, UT, MIN, MAX) \
void __attribute__((noinline)) \
vec_sat_s_sub_##T##_fmt_3 (T *out, T *op_1, T *op_2, unsigned limit) \
{ \
unsigned i; \
for (i = 0; i < limit; i++) \
{ \
T x = op_1[i]; \
T y = op_2[i]; \
T minus; \
bool overflow = __builtin_sub_overflow (x, y, &minus); \
out[i] = overflow ? x < 0 ? MIN : MAX : minus; \
} \
}
The below test are passed for this patch.
* The rv64gcv fully regression test.
It is test only patch and obvious up to a point, will commit it
directly if no comments in next 48H.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-2-i16.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-2-i32.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-2-i64.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-2-i8.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-run-2-i16.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-run-2-i32.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-run-2-i64.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_s_sub-run-2-i8.c: New test.