Mark Harmstone [Mon, 28 Oct 2024 22:32:29 +0000 (22:32 +0000)]
Add block parameter to begin_block debug hook
Add a parameter to the begin_block debug hook that is a pointer to the
tree_node of the block in question. CodeView needs this as it records
line numbers of inlined functions in a different manner, so we need to
be able to tell if the block is actually the start of an inlined
function.
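For reference, a sketch of the resulting hook shape, inferred from the new dummy function's name (debug_nothing_int_int_tree); the stand-in `tree` typedef and the struct name are placeholders so the snippet is self-contained:
typedef void *tree;   /* stand-in for GCC's tree so the sketch compiles alone */
struct gcc_debug_hooks_sketch
{
  /* begin_block now also receives the block's tree node, so a debug format
     (here CodeView) can tell whether the block starts an inlined function.  */
  void (*begin_block) (unsigned int line, unsigned int blocknum, tree block);
};
/* No-op hook mirroring the new debug_nothing_int_int_tree.  */
static void
debug_nothing_int_int_tree (unsigned int, unsigned int, tree)
{
}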
gcc/
* debug.cc (do_nothing_debug_hooks): Change begin_block
function pointer.
(debug_nothing_int_int_tree): New function.
* debug.h (struct gcc_debug_hooks): Add tree parameter to begin_block.
(debug_nothing_int_int_tree): Add declaration.
* dwarf2out.cc (dwarf2out_begin_block): Add tree parameter.
(dwarf2_lineno_debug_hooks): Use new dummy function for begin_block.
* final.cc (final_scan_insn_1): Pass insn block through to
debug_hooks->begin_block.
* vmsdbgout.cc (vmsdbgout_begin_block): Add tree parameter.
Georg-Johann Lay [Sat, 30 Nov 2024 13:58:05 +0000 (14:58 +0100)]
AVR: ad target/84211 - Split MOVW into MOVs in try_split_any.
When splitting multi-byte REG-REG moves in try_split_any(),
it's not clear whether propagating constants will turn
out to be profitable. When MOVW is available, split into
REG-REG moves instead of a possible REG-CONST.
gcc/
PR target/84211
* config/avr/avr-passes.cc (try_split_any) [SET, MOVW]: Prefer
reg=reg move over reg=const when splitting a reg=reg insn.
Jakub Jelinek [Sat, 30 Nov 2024 10:30:08 +0000 (11:30 +0100)]
strlen: Handle vector CONSTRUCTORs [PR117057]
The following patch handles VECTOR_TYPE_P CONSTRUCTORs in
count_nonzero_bytes, including handling them if they have some elements
non-constant.
If there are still some constant elements before it (in the range queried),
we derive info at least from those bytes and consider the rest as unknown.
The first 3 hunks just punt in IMHO problematic cases: the spaghetti code
treats byte_size 0 as "unknown size, determine it yourself", so if offset
is equal to the exp size, there are 0 bytes to consider (so nothing useful
to determine), but using byte_size 0 would mean "use any size".
Similarly, native_encode_expr uses int type for offset (and size), so
passing it an offset larger than INT_MAX could be a silent miscompilation.
I've guarded the test to just a couple of targets known to handle it,
because e.g. on ia32 without -msse forwprop1 seems to lower the CONSTRUCTOR
into 4 BIT_FIELD_REF stores and I haven't figured out what exactly
that depends on (e.g. powerpc* is fine on any CPUs, even with -mno-altivec
-mno-vsx, even -m32).
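As an illustration only (the names and shapes here are mine, not from the patch; the real coverage is in the new gcc.dg/strlenopt-96.c test), the kind of store the pass can now learn from looks roughly like:
typedef unsigned char V __attribute__ ((vector_size (8)));
char buf[9] __attribute__ ((aligned (8)));
void
store (unsigned char x)
{
  /* Constant elements followed by a non-constant one: the pass can still
     derive that the first three bytes of buf are non-zero.  */
  V v = { 'a', 'b', 'c', x, 0, 0, 0, 0 };
  *(V *) buf = v;
}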
2024-11-30 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/117057
* tree-ssa-strlen.cc (strlen_pass::count_nonzero_bytes): Punt also
when byte_size is equal to offset or nchars. Punt if offset is bigger
than INT_MAX. Handle vector CONSTRUCTOR with some elements constant,
possibly followed by non-constant.
* gcc.dg/strlenopt-32.c: Remove xfail and vect_slp_v2qi_store_unalign
specific scan-tree-dump-times directive.
* gcc.dg/strlenopt-96.c: New test.
Jakub Jelinek [Sat, 30 Nov 2024 10:19:12 +0000 (11:19 +0100)]
openmp: Add crtoffloadtableS.o and use it [PR117851]
Unlike crtoffload{begin,end}.o which just define some symbols at the start/end
of the various .gnu.offload* sections, crtoffloadtable.o contains
const void *const __OFFLOAD_TABLE__[]
__attribute__ ((__visibility__ ("hidden"))) =
{
&__offload_func_table, &__offload_funcs_end,
&__offload_var_table, &__offload_vars_end,
&__offload_ind_func_table, &__offload_ind_funcs_end,
};
The problem is that linking this into PIEs or shared libraries doesn't
work when it is compiled without -fpic/-fpie - __OFFLOAD_TABLE__ for non-PIC
code is put into .rodata section, but it really needs relocations, so for
PIC it should go into .data.rel.ro/.data.rel.ro.local.
As I think we don't want .data.rel.ro section in non-PIE binaries, this patch
follows the path of e.g. crtbegin.o vs. crtbeginS.o and adds crtoffloadtableS.o
next to crtoffloadtable.o, where crtoffloadtableS.o is compiled with -fpic.
2024-11-30 Jakub Jelinek <jakub@redhat.com>
PR libgomp/117851
gcc/
* lto-wrapper.cc (find_crtoffloadtable): Add PIE_OR_SHARED argument,
search for crtoffloadtableS.o rather than crtoffloadtable.o if
true.
(run_gcc): Add pie_or_shared variable. If OPT_pie or OPT_shared or
OPT_static_pie is seen, set pie_or_shared to true, if OPT_no_pie is
seen, set pie_or_shared to false. Pass it to find_crtoffloadtable.
libgcc/
* configure.ac (extra_parts): Add crtoffloadtableS.o.
* Makefile.in (crtoffloadtableS$(objext)): New goal.
* configure: Regenerated.
Jinyang He [Thu, 28 Nov 2024 01:26:25 +0000 (09:26 +0800)]
LoongArch: Mask shift offset when emit {xv, v}{srl, sll, sra} with sameimm vector
For {xv,v}{srl,sll,sra}, the constraint `vector_same_uimm6` causes overflow
when emitting the {w,h,b} variants. Since the number of bits shifted is the
register value taken modulo the element width, it is actually unnecessary to
constrain the range. Simply mask the shift amount with the unit bit width,
without any constraint on the shift range.
gcc/ChangeLog:
* config/loongarch/constraints.md (Uuv6, Uuvx): Remove Uuv6,
add Uuvx as replicated vector const with unsigned range [0,umax].
* config/loongarch/lasx.md (xvsrl, xvsra, xvsll): Mask shift
offset by its unit bits.
* config/loongarch/lsx.md (vsrl, vsra, vsll): Likewise.
* config/loongarch/loongarch-protos.h
(loongarch_const_vector_same_int_p): Set default for low and high.
* config/loongarch/predicates.md: Replace reg_or_vector_same_uimm6_operand
with reg_or_vector_same_uimm_operand.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/vector/lasx/lasx-shift-sameimm-vec.c: New test.
* gcc.target/loongarch/vector/lsx/lsx-shift-sameimm-vec.c: New test.
In r15-5327, the default language version for C compilation changed from
-std=gnu17 to -std=gnu23.
ISO C99 and C11 allow ceil, floor, round and trunc, and their float and
long double variants, to raise the “inexact” exception,
but ISO/IEC TS 18661-1:2014, the C bindings to IEEE 754-2008, as
integrated into ISO C23, does not allow these functions to do so.
So add '-ffp-int-builtin-inexact' to this test case.
Andrew Pinski [Fri, 29 Nov 2024 23:29:41 +0000 (15:29 -0800)]
gimplefe: Error recovery for invalid declarations [PR117749]
c_parser_declarator can return null if there was an error,
but c_parser_gimple_declaration was not ready for that.
This fixes that oversight so we don't get an ICE after the error.
Bootstrapped and tested on x86_64-linux-gnu.
PR c/117749
gcc/c/ChangeLog:
* gimple-parser.cc (c_parser_gimple_declaration): Check
declarator to be non-null.
gcc/testsuite/ChangeLog:
* gcc.dg/gimplefe-55.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Jakub Jelinek [Sat, 30 Nov 2024 00:51:24 +0000 (01:51 +0100)]
ext-dce: Fix SIGN_EXTEND handling and cleanups [PR117360]
This is mostly a blind attempt to fix the PR + various cleanups.
The PR is about a shift of a HOST_WIDE_INT by 127 invoking UB.
Most of carry_backpropagate works on GET_MODE_INNER of the operand,
mode is assigned
enum machine_mode mode = GET_MODE_INNER (GET_MODE (x));
at the beginning and everything is done using that mode, so for
vector modes (or complex even?) we work with the element modes
rather than vector/complex modes.
But the SIGN_EXTEND handling does that inconsistently: it looks
at the mode of the operand and uses GET_MODE_INNER in GET_MODE_MASK,
but doesn't use it in the shift.
The following patch, apart from the cleanups, fixes it by doing
essentially:
mode = GET_MODE (XEXP (x, 0));
if (mask & ~GET_MODE_MASK (GET_MODE_INNER (mode)))
- mask |= 1ULL << (GET_MODE_BITSIZE (mode).to_constant () - 1);
+ mask |= 1ULL << (GET_MODE_BITSIZE (GET_MODE_INNER (mode)).to_constant () - 1);
i.e. also shifting by GET_MODE_BITSIZE of the GET_MODE_INNER of the
operand's mode. We don't need to check if it is at most 64 bits,
at the start of the function we've already verified the result mode
is at most 64 bits and SIGN_EXTEND by definition extends from a narrower
mode.
The rest of the patch are cleanups. For HOST_WIDE_INT we have the
HOST_WIDE_INT_{UC,1U} macros, a HWI isn't necessarily unsigned long long,
so using ULL suffixes for it is weird.
More importantly, the function does
scalar_int_mode smode;
if (!is_a <scalar_int_mode> (mode, &smode)
|| GET_MODE_BITSIZE (smode) > HOST_BITS_PER_WIDE_INT)
return mmask;
early, so we don't need to use GET_MODE_BITSIZE (mode) which is
a poly_int but can use GET_MODE_BITSIZE (smode) with the same value
but in unsigned short, so we don't need to use known_lt or .to_constant ()
everywhere.
Plus some formatting issues.
What I've left around is
if (!GET_MODE_BITSIZE (GET_MODE (x)).is_constant ()
|| !GET_MODE_BITSIZE (GET_MODE (XEXP (x, 0))).is_constant ())
return -1;
at the start of SIGN_EXTEND or ZERO_EXTEND; I'm afraid I don't know enough
about aarch64/riscv VL vectors to know why this is done (though even that
return -1; is weird, the rest of the code does return mmask; when it wants to
punt).
2024-11-30 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/117360
* ext-dce.cc (ext_dce_process_sets): Use HOST_WIDE_INT_UC
macro instead of ULL suffixed constants.
(carry_backpropagate): Likewise. Use HOST_WIDE_INT_1U instead of
1ULL. Use GET_MODE_BITSIZE (smode) instead of
GET_MODE_BITSIZE (mode) and with that avoid having to use
known_lt instead of < or use .to_constant (). Formatting fixes.
(case SIGN_EXTEND): Set mode to GET_MODE_INNER (GET_MODE (XEXP (x, 0)))
rather than GET_MODE (XEXP (x, 0)) and don't use GET_MODE_INNER (mode).
(ext_dce_process_uses): Use HOST_WIDE_INT_UC macro instead of ULL
suffixed constants.
Jakub Jelinek [Sat, 30 Nov 2024 00:49:21 +0000 (01:49 +0100)]
c++: Implement C++26 P3176R1 - The Oxford variadic comma
While we are already in stage3, I wonder if implementing this small paper
wouldn't be useful even for GCC 15, so that we have in the GCC world one
extra year of deprecation of variadic ellipsis without preceding comma.
The paper just deprecates something; I'd hope that most of the C++ code in
the wild, when it uses variadic functions at all, uses the comma before the
ellipsis.
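For illustration, the deprecated and the preferred spellings (the warning name is taken from the ChangeLog below):
void ok  (int, ...);   // fine: comma before the ellipsis
void old (int...);     // deprecated by C++26 P3176R1; with this patch GCC
                       // warns via -Wdeprecated-variadic-comma-omission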
2024-11-30 Jakub Jelinek <jakub@redhat.com>
gcc/c-family/
* c.opt (Wdeprecated-variadic-comma-omission): New option.
* c.opt.urls: Regenerate.
* c-opts.cc (c_common_post_options): Default to
-Wdeprecated-variadic-comma-omission for C++26 or -Wpedantic.
gcc/cp/
* parser.cc: Implement C++26 P3176R1 - The Oxford variadic comma.
(cp_parser_parameter_declaration_clause): Emit
-Wdeprecated-variadic-comma-omission warnings.
gcc/
* doc/invoke.texi (-Wdeprecated-variadic-comma-omission): Document.
gcc/testsuite/
* g++.dg/cpp26/variadic-comma1.C: New test.
* g++.dg/cpp26/variadic-comma2.C: New test.
* g++.dg/cpp26/variadic-comma3.C: New test.
* g++.dg/cpp26/variadic-comma4.C: New test.
* g++.dg/cpp26/variadic-comma5.C: New test.
* g++.dg/cpp1z/fold10.C: Expect a warning for C++26.
* g++.dg/ext/attrib33.C: Likewise.
* g++.dg/cpp1y/lambda-generic-variadic19.C: Likewise.
* g++.dg/cpp2a/lambda-generic10.C: Likewise.
* g++.dg/cpp0x/lambda/lambda-const3.C: Likewise.
* g++.dg/cpp0x/variadic164.C: Likewise.
* g++.dg/cpp0x/variadic17.C: Likewise.
* g++.dg/cpp0x/udlit-args-neg.C: Likewise.
* g++.dg/cpp0x/variadic28.C: Likewise.
* g++.dg/cpp0x/gen-attrs-33.C: Likewise.
* g++.dg/cpp23/explicit-obj-diagnostics3.C: Likewise.
* g++.old-deja/g++.law/operators15.C: Likewise.
* g++.old-deja/g++.mike/p811.C: Likewise.
* g++.old-deja/g++.mike/p12306.C (printf): Add , before ... .
* g++.dg/analyzer/fd-bind-pr107783.C (bind): Likewise.
* g++.dg/cpp0x/vt-65790.C (printf): Likewise.
libstdc++-v3/
* include/std/functional (_Bind_check_arity): Add , before ... .
* include/bits/refwrap.h (_Mem_fn_traits, _Weak_result_type_impl):
Likewise.
* include/tr1/type_traits (is_function): Likewise.
Ian Lance Taylor [Thu, 28 Nov 2024 21:14:34 +0000 (13:14 -0800)]
compiler: increase buffer size to avoid warning
GCC has a new -Wformat-truncation warning that triggers on this code:
../../gcc/go/gofrontend/go-encode-id.cc: In function 'std::string go_encode_id(const std::string&)':
../../gcc/go/gofrontend/go-encode-id.cc:176:48: error: '%02x' directive output may be truncated writing between 2 and 8 bytes into a region of size 6 [-Werror=format-truncation=]
176 | snprintf(buf, sizeof buf, "_x%02x", c);
| ^~~~
../../gcc/go/gofrontend/go-encode-id.cc:176:45: note: directive argument in the range [128, 4294967295]
176 | snprintf(buf, sizeof buf, "_x%02x", c);
| ^~~~~~~~
../../gcc/go/gofrontend/go-encode-id.cc:176:27: note: 'snprintf' output between 5 and 11 bytes into a destination of size 8
176 | snprintf(buf, sizeof buf, "_x%02x", c);
| ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The code is safe, because the value of c is known to be >= 0 && <= 0xff.
But it's difficult for the compiler to know that.
Bump the buffer size to avoid the warning.
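A minimal sketch of the pattern (the exact size chosen in go-encode-id.cc may differ; 16 is simply larger than the worst-case 11 bytes the warning mentions, and the function name here is illustrative):
#include <cstdio>
#include <string>
std::string
encode_char (unsigned int c)
{
  // "_x" + up to 8 hex digits + NUL is 11 bytes; 16 leaves headroom and
  // silences -Wformat-truncation.
  char buf[16];
  std::snprintf (buf, sizeof buf, "_x%02x", c);
  return std::string (buf);
}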
Andrew Pinski [Thu, 21 Nov 2024 18:59:59 +0000 (10:59 -0800)]
aarch64: add attributes to the prefetch_builtins
This adds the attributes associated with prefetch to the builtins.
Just call aarch64_get_attributes with FLAG_PREFETCH_MEMORY to get the attributes.
Built and tested for aarch64-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc (aarch64_init_prefetch_builtin):
Update call to aarch64_general_add_builtin in AARCH64_INIT_PREFETCH_BUILTIN.
Add new variable prefetch_attrs.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Thu, 21 Nov 2024 18:51:38 +0000 (10:51 -0800)]
aarch64: Fix up flags for vget_low_*, vget_high_* and vreinterpret intrinsics
These 3 intrinsics will not raise an FP exception or read FPCR. These
intrinsics will be folded into a VIEW_CONVERT_EXPR or a BIT_FIELD_REF, which
are already set to be const expressions too.
Built and tested for aarch64-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc (VREINTERPRET_BUILTIN): Use
FLAG_NONE instead of FLAG_AUTO_FP.
(VGET_LOW_BUILTIN): Likewise.
(VGET_HIGH_BUILTIN): Likewise.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Tue, 19 Nov 2024 08:19:57 +0000 (00:19 -0800)]
aarch64: Mark __builtin_aarch64_im_lane_boundsi as leaf and nothrow [PR117665]
__builtin_aarch64_im_lane_boundsi is known not to throw or call back into another
function since it will either be folded into a NOP or will produce a compiler error.
This fixes the ICE by fixing the missed optimization. It does not fix the underlying
issue with fold_marked_statements, which I filed as PR 117668.
Built and tested for aarch64-linux-gnu.
PR target/117665
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc (aarch64_init_simd_builtin_functions):
Pass nothrow and leaf as attributes to aarch64_general_add_builtin for
__builtin_aarch64_im_lane_boundsi.
gcc/testsuite/ChangeLog:
* g++.target/aarch64/lane-bound-1.C: New test.
* gcc.target/aarch64/lane-bound-3.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
[PR117770][LRA]: Check hard regs corresponding insn operands for hard reg clobbers
When LRA processes early-clobbered hard regs explicitly present in the
insn description, it checks that the hard reg is also used as an input.
If the hard reg is not also an input, it is marked as dying. For this
check LRA processed only input hard regs that are also explicitly present
in the insn description. In the given PR, the hard reg is used as an input
through an insn operand and is not explicitly present in the insn
description, and therefore LRA marked the hard reg as dying. This results
in wrong allocation and wrong code. The patch solves the problem by also
processing hard regs used as insn operands.
gcc/ChangeLog:
PR rtl-optimization/117770
* lra-lives.cc: Include ira-int.h.
(process_bb_lives): Check hard regs corresponding insn operands
for dying hard wired reg clobbers.
Yury Khrustalev [Fri, 29 Nov 2024 11:09:23 +0000 (11:09 +0000)]
aarch64: Fix build failure due to missing header
Including the "arm_acle.h" header in aarch64-unwind.h requires
stdint.h to be present, and it may not be available during the
first stage of cross-compilation of GCC.
When cross-building GCC for the aarch64-none-linux-gnu target
(on any supported host) using the 3-stage bootstrap build
process to build a native compiler from source, libgcc fails
to compile due to a missing header that has not been installed yet.
This could be worked around, but it's better to fix the issue.
Andre Vieira [Fri, 29 Nov 2024 10:18:57 +0000 (10:18 +0000)]
arm, mve: Detect uses of vctp_vpr_generated inside subregs
Address a problem where we were failing to detect uses of
vctp_vpr_generated in the analysis for 'arm_attempt_dlstp_transform' because
the use was inside a SUBREG and rtx_equal_p does not catch that. Using
reg_overlap_mentioned_p is much more robust.
gcc/ChangeLog:
PR target/117814
* config/arm/arm.cc (arm_attempt_dlstp_transform): Use
reg_overlap_mentioned_p instead of rtx_equal_p to detect uses of
vctp_vpr_generated inside subregs.
gcc/testsuite/ChangeLog:
PR target/117814
* gcc.target/arm/mve/dlstp-invalid-asm.c (test10): Renamed to...
(test10a): ... this.
(test10b): Variation of test10a with a small change to trigger wrong
codegen.
Andre Vieira [Fri, 29 Nov 2024 10:14:14 +0000 (10:14 +0000)]
arm, mve: Pass -std=c99 to dlstp-loop-form.c to avoid new warning
This fixes a testism introduced by the warning produced with the -std=c23
default. The testcase is a reduced piece of code meant to trigger an ICE, so
there's little value in trying to change the code itself.
gcc/testsuite/ChangeLog:
* gcc.target/arm/mve/dlstp-loop-form.c: Add -std=c99 to avoid warning
message.
Andre Vieira [Fri, 29 Nov 2024 09:59:25 +0000 (09:59 +0000)]
arm, mve: Fix scan-assembler for test7 in dlstp-compile-asm-2.c
After recent changes, the vctp intrinsic codegen changed slightly and we now
unfortunately seem to be generating unneeded moves and extends of the mask.
These are however not incorrect, and we don't have a fix for the unneeded
codegen right now, so change the testcase to accept them so we can still
catch other changes if they occur.
gcc/testsuite/ChangeLog:
PR target/117814
* gcc.target/arm/mve/dlstp-compile-asm-2.c (test7): Add an optional
vmsr to the check-function-bodies.
If the target has ZBC or ZBKC, the clmul instruction is used for the CRC
calculation. Otherwise, if the target has ZBKB, a table-based CRC is
generated, but the bswap and brev8 instructions are used for reflecting
the inputs and the output. Add new tests to check CRC generation for ZBC,
ZBKC and ZBKB targets.
gcc/
* expr.cc (gf2n_poly_long_div_quotient): New function.
* expr.h (gf2n_poly_long_div_quotient): New function declaration.
* hwint.cc (reflect_hwi): New function.
* hwint.h (reflect_hwi): New function declaration.
* config/riscv/bitmanip.md (crc_rev<ANYI1:mode><ANYI:mode>4): New
expander for reversed CRC.
(crc<SUBX1:mode><SUBX:mode>4): New expander for bit-forward CRC.
* config/riscv/iterators.md (SUBX1, ANYI1): New iterators.
* config/riscv/riscv-protos.h (generate_reflecting_code_using_brev):
New function declaration.
(expand_crc_using_clmul): Likewise.
(expand_reversed_crc_using_clmul): Likewise.
* config/riscv/riscv.cc (generate_reflecting_code_using_brev): New
function.
(expand_crc_using_clmul): Likewise.
(expand_reversed_crc_using_clmul): Likewise.
* config/riscv/riscv.md (UNSPEC_CRC, UNSPEC_CRC_REV): New unspecs.
* doc/sourcebuild.texi: Document new target selectors.
* gcc.target/riscv/rvv/rvv.exp:
* gcc.target/riscv/rvv/xsfvector/sf_vqmacc_2x8x2.c: New test.
* gcc.target/riscv/rvv/xsfvector/sf_vqmacc_4x8x4.c: New test.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccsu_2x8x2.c: New test.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccsu_4x8x4.c: New test.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccu_2x8x2.c: New test.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccu_4x8x4.c: New test.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccus_2x8x2.c: New test.
* gcc.target/riscv/rvv/xsfvector/sf_vqmaccus_4x8x4.c: New test.
Tamar Christina [Fri, 29 Nov 2024 13:01:11 +0000 (13:01 +0000)]
AArch64: Suppress default options when the -march or -mcpu used is not affected by them.
This patch makes it so that when you use any of the Cortex-A53 errata
workarounds but have specified an -march or -mcpu that we know is not
affected by them, we suppress the errata workaround.
This is a driver only patch as the linker invocation needs to be changed as
well. The linker and cc SPECs are different because for the linker we didn't
seem to add an inversion flag for the option. That said, it's also not possible
to configure the linker with it on by default. So not passing the flag is
sufficient to turn it off.
For the compilers however we have an inversion flag using -mno-, which is needed
to disable the workarounds when the compiler has been configured with it by
default.
In case it's unclear how the patch does what it does (it took me a while to
figure out the syntax):
* Early matching will replace any -march=native or -mcpu=native with their
expanded forms and erases the native arguments from the buffer.
* Due to the above if we ensure we handle the new code after this erasure then
we only have to handle the expanded form.
* The expanded form needs to handle -march=<arch>+extensions and
-mcpu=<cpu>+extensions and so we can't use normal string matching but
instead use strstr with a custom driver function that's common between
native and non-native builds.
* For the compilers we output -mno-<workaround> and for the linker we just
erase the --fix-<workaround> option.
* The extra internal matching, e.g. the duplicate match of mcpu inside:
mcpu=*:%{%:is_local_not_armv8_base(%{mcpu=*:%*}) is so we can extract the glob
using %* because the outer match would otherwise reset at the %{. The reason
for the outer glob at all is to skip the block early if no matches are found.
The workaround has the effect of suppressing certain inlining and multiply-add
formation which leads to about ~1% SPECCPU 2017 Intrate regression on modern
cores. This patch is needed because most distros configure GCC with the
workaround enabled by default.
* gcc.target/aarch64/cpunative/info_30: New test.
* gcc.target/aarch64/cpunative/info_31: New test.
* gcc.target/aarch64/cpunative/info_32: New test.
* gcc.target/aarch64/cpunative/info_33: New test.
* gcc.target/aarch64/cpunative/native_cpu_30.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_31.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_32.c: New test.
* gcc.target/aarch64/cpunative/native_cpu_33.c: New test.
* gcc.target/aarch64/erratas_opt_0.c: New test.
* gcc.target/aarch64/erratas_opt_1.c: New test.
* gcc.target/aarch64/erratas_opt_10.c: New test.
* gcc.target/aarch64/erratas_opt_11.c: New test.
* gcc.target/aarch64/erratas_opt_12.c: New test.
* gcc.target/aarch64/erratas_opt_13.c: New test.
* gcc.target/aarch64/erratas_opt_14.c: New test.
* gcc.target/aarch64/erratas_opt_15.c: New test.
* gcc.target/aarch64/erratas_opt_2.c: New test.
* gcc.target/aarch64/erratas_opt_3.c: New test.
* gcc.target/aarch64/erratas_opt_4.c: New test.
* gcc.target/aarch64/erratas_opt_5.c: New test.
* gcc.target/aarch64/erratas_opt_6.c: New test.
* gcc.target/aarch64/erratas_opt_7.c: New test.
* gcc.target/aarch64/erratas_opt_8.c: New test.
* gcc.target/aarch64/erratas_opt_9.c: New test.
aarch64: Add ISA requirements to some SVE/SME md comments
The SVE and SME md files are divided into sections, with each
section often starting with a comment that lists the associated
mnemonics. These lists usually include the base architecture
requirement in parentheses, if the base requirement is greater
than the baseline for the file. This patch tries to be more
consistent about when we do that for the recently added SVE2p1
and SME2p1 extensions.
gcc/
* config/aarch64/aarch64-sme.md: In the section comments, add the
architecture requirements alongside some mnemonics.
* config/aarch64/aarch64-sve2.md: Likewise.
This patch adds support for the following intrinsics:
- svdot[_f32_mf8]_fpm
- svdot_lane[_f32_mf8]_fpm
- svdot[_f16_mf8]_fpm
- svdot_lane[_f16_mf8]_fpm
The first two are available under a combination of the FP8DOT4 and SVE2
features, or alternatively under the SSVE_FP8DOT4 feature in streaming mode.
The final two are available under a combination of the FP8DOT2 and SVE2
features, or alternatively under the SSVE_FP8DOT2 feature in streaming mode.
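A hypothetical usage sketch, assuming prototypes of the shape implied by the intrinsic names above (the function name and the required -march extensions are assumptions, e.g. +sve2+fp8dot4):
#include <arm_sve.h>
// Assumed: 4-way fp8 dot product into an f32 accumulator, with the fp8
// format/scaling controlled by the fpm argument.
svfloat32_t
dot4 (svfloat32_t acc, svmfloat8_t a, svmfloat8_t b, fpm_t fpm)
{
  return svdot_f32_mf8_fpm (acc, a, b, fpm);
}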
gcc/
* config/aarch64/aarch64-option-extensions.def
(fp8dot4, ssve-fp8dot4): Add new extensions.
(fp8dot2, ssve-fp8dot2): Likewise.
* config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl): Support fp8.
(svdotprod_lane_impl): Likewise.
(svdot_lane): Provide an unspec for fp8 types.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(ternary_mfloat8_def): Add new class.
(ternary_mfloat8): Add new shape.
(ternary_mfloat8_lane_group_selection_def): Add new class.
(ternary_mfloat8_lane_group_selection): Add new shape.
* config/aarch64/aarch64-sve-builtins-shapes.h
(ternary_mfloat8, ternary_mfloat8_lane_group_selection): Declare.
* config/aarch64/aarch64-sve-builtins-sve2.def
(svdot, svdot_lane): Add new DEF_SVE_FUNCTION_GS_FPM, twice to deal
with the combination of features providing support for 32 and 16 bit
floating point.
* config/aarch64/aarch64-sve2.md (@aarch64_sve_dot<mode>): Add new.
(@aarch64_sve_dot_lane<mode>): Likewise.
* config/aarch64/aarch64.h:
(TARGET_FP8DOT4, TARGET_SSVE_FP8DOT4): Add new defines.
(TARGET_FP8DOT2, TARGET_SSVE_FP8DOT2): Likewise.
* config/aarch64/iterators.md
(UNSPEC_DOT_FP8, UNSPEC_DOT_LANE_FP8): Add new unspecs.
* doc/invoke.texi: Document fp8dot4, fp8dot2, ssve-fp8dot4, ssve-fp8dot2
extensions.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c: Add new.
* gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_group_selection_1.c:
Likewise.
* gcc.target/aarch64/sve2/acle/asm/dot_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/dot_mf8.c: Likewise.
* lib/target-supports.exp: Add dg-require-effective-target support for
aarch64_asm_fp8dot2_ok, aarch64_asm_fp8dot4_ok,
aarch64_asm_ssve-fp8dot2_ok and aarch64_asm_ssve-fp8dot4_ok.
This patch adds the following intrinsics:
- svcvt1_bf16[_mf8]_fpm
- svcvt1_f16[_mf8]_fpm
- svcvt2_bf16[_mf8]_fpm
- svcvt2_f16[_mf8]_fpm
- svcvtlt1_bf16[_mf8]_fpm
- svcvtlt1_f16[_mf8]_fpm
- svcvtlt2_bf16[_mf8]_fpm
- svcvtlt2_f16[_mf8]_fpm
- svcvtn_mf8[_f16_x2]_fpm (unpredicated)
- svcvtnb_mf8[_f32_x2]_fpm
- svcvtnt_mf8[_f32_x2]_fpm
The underlying instructions are only available when SVE2 is enabled and the PE
is not in streaming SVE mode. They are also available when SME2 is enabled and
the PE is in streaming SVE mode.
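A hypothetical usage sketch for one of the conversions, assuming prototypes of the shape implied by the names above (the function name is mine):
#include <arm_sve.h>
// Assumed: convert fp8 elements to f16 under FPMR control (svcvt1 variant).
svfloat16_t
cvt1 (svmfloat8_t x, fpm_t fpm)
{
  return svcvt1_f16_mf8_fpm (x, fpm);
}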
gcc/
* config/aarch64/aarch64-sve-builtins-shapes.cc
(parse_signature): Add an fpm_t (uint64_t) argument to functions that
set the fpm register.
(unary_convertxn_narrowt_def): New class.
(unary_convertxn_narrowt): New shape.
(unary_convertxn_narrow_def): New class.
(unary_convertxn_narrow): New shape.
* config/aarch64/aarch64-sve-builtins-shapes.h
(unary_convertxn_narrowt): Declare.
(unary_convertxn_narrow): Likewise.
* config/aarch64/aarch64-sve-builtins-sve2.cc
(svcvt_fp8_impl): New class.
(svcvtn_impl): Handle fp8 cases.
(svcvt1, svcvt2, svcvtlt1, svcvtlt2): Add new FUNCTION.
(svcvtnb): Likewise.
* config/aarch64/aarch64-sve-builtins-sve2.def
(svcvt1, svcvt2, svcvtlt1, svcvtlt2): Add new DEF_SVE_FUNCTION_GS_FPM.
(svcvtn): Likewise.
(svcvtnb, svcvtnt): Likewise.
* config/aarch64/aarch64-sve-builtins-sve2.h
(svcvt1, svcvt2, svcvtlt1, svcvtlt2, svcvtnb, svcvtnt): Declare.
* config/aarch64/aarch64-sve-builtins.cc
(TYPES_cvt_mf8, TYPES_cvtn_mf8, TYPES_cvtnx_mf8): Add new types arrays.
(function_builder::get_name): Append _fpm to functions that set fpmr.
(function_resolver::check_gp_argument): Deal with the fpm_t argument.
(function_expander::expand): Set the fpm register before
calling the insn if the function warrants it.
* config/aarch64/aarch64-sve2.md (@aarch64_sve2_fp8_cvt): Add new.
(@aarch64_sve2_fp8_cvtn): Likewise.
(@aarch64_sve2_fp8_cvtnb): Likewise.
(@aarch64_sve_cvtnt): Likewise.
* config/aarch64/aarch64.h (TARGET_SSVE_FP8): Add new.
* config/aarch64/iterators.md
(VNx8SF_ONLY, SVE_FULL_HFx2): New mode iterators.
(UNSPEC_F1CVT, UNSPEC_F1CVTLT, UNSPEC_F2CVT, UNSPEC_F2CVTLT): Add new.
(UNSPEC_FCVTNB, UNSPEC_FCVTNT): Likewise.
(UNSPEC_FP8FCVTN): Likewise.
(FP8CVT_UNS, fp8_cvt_uns_op): Likewise.
aarch64: specify fpm mode in function instances and groups
Some intrinsics require setting the fpm register before calling the
specific asm opcode required.
In order to simplify review, this patch:
- adds the fpm_mode_index attribute to function_group_info and
function_instance objects
- updates existing initialisations and call sites.
- updates equality and hash operations
gcc/
* config/aarch64/aarch64-sve-builtins-base.cc
(svdiv_impl): Specify FPM_unused when folding.
(svmul_impl): Likewise.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(build_one): Use the group fpm_mode when creating function instances.
* config/aarch64/aarch64-sve-builtins-sve2.cc
(svaba_impl, svqrshl_impl, svqshl_impl,svrshl_impl, svsra_impl):
Specify FPM_unused when folding.
* config/aarch64/aarch64-sve-builtins.cc (function_groups): Set
fpm_mode on all elements.
(neon_sve_function_groups, sme_function_groups): Likewise.
(function_instance::hash): Include fpm_mode in hash.
(function_builder::add_overloaded_functions): Use the group fpm mode.
(function_resolver::lookup_form): Use the function instance fpm_mode
when looking up a function.
* config/aarch64/aarch64-sve-builtins.def
(DEF_SVE_FUNCTION_GS_FPM): Add define.
(DEF_SVE_FUNCTION_GS): Redefine against DEF_SVE_FUNCTION_GS_FPM.
* config/aarch64/aarch64-sve-builtins.h (fpm_mode_index): New.
(function_group_info): Add fpm_mode.
(function_instance): Likewise.
(function_instance::operator==): Handle fpm_mode.
aarch64: Add basic svmfloat8_t support to arm_sve.h
This patch adds support for the fp8 related vectors to arm_sve.h. It also
adds support for functions that just treat mfloat8_t as a bag of 8 bits
(reinterpret casts, vector manipulation, loads, stores, element selection,
vector tuples, table lookups, sve<->simd bridge); these functions are
available for fp8 whenever they're available for other 8-bit types.
Arithmetic operations, bit manipulation, conversions are notably absent.
The generated asm is mostly consistent with the _u8 equivalents and this can be
used to validate tests, except where immediates are used. These cannot be
expressed for mf8 and thus we resort to the use of function arguments found in
registers w(0-9).
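A hedged sketch of the kind of "bag of 8 bits" usage this enables, assuming the mf8 load/store forms mirror the existing u8 ones (the intrinsic spellings here are assumptions):
#include <arm_sve.h>
// Assumed: svld1_mf8/svst1_mf8 exist wherever the u8 equivalents do.
svmfloat8_t
copy_mf8 (svbool_t pg, const mfloat8_t *src, mfloat8_t *dst)
{
  svmfloat8_t v = svld1_mf8 (pg, src);
  svst1_mf8 (pg, dst, v);
  return v;
}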
gcc/
* config/aarch64/aarch64-sve-builtins.cc (TYPES_b_data): Add mf8.
(TYPES_reinterpret1, TYPES_reinterpret): Likewise.
* config/aarch64/aarch64-sve-builtins.def (svmfloat8_t): New type.
(mf8): New type suffix.
* config/aarch64/aarch64-sve-builtins.h (TYPE_mfloat): New
type_class_index.
Richard Biener [Fri, 29 Nov 2024 11:14:24 +0000 (12:14 +0100)]
tree-optimization/115438 - SLP reduction vect vs. bwaves
503.bwaves_r shows a case where the non-SLP optimization of performing
the reduction adjustment with the initial value as part of the epilogue,
rather than including it as part of the initial vector value, matters: it
allows breaking a critical dependence path. The following restores this
ability for single-lane SLP.
On Zen2 this turns a 2.5% regression from GCC 14 into a 2.5%
improvement.
PR tree-optimization/115438
* tree-vect-loop.cc (vect_transform_cycle_phi): For SLP also
try to do the reduction adjustment by the initial value
in the epilogue.
c: Fix constructor bounds checking for VLA and construct VLA vector constants
This patch adds support for checking bounds of SVE ACLE vector initialization
constructors. It also adds support for constructing vector constants from
init constructors.
gcc/c/ChangeLog:
* c-typeck.cc (process_init_element): Add check to restrict
constructor length to the minimum vector length allowed.
gcc/ChangeLog:
* tree.cc (build_vector_from_ctor): Add support to construct VLA vector
constants from init constructors.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/acle/general-c/sizeless-1.c: Update test to
check the initialization error.
* gcc.target/aarch64/sve/acle/general-c/sizeless-2.c: Likewise.
gimple: Handle variable-sized vectors in BIT_FIELD_REF
Handle variable-sized vectors for BIT_FIELD_REF canonicalization.
gcc/ChangeLog:
* gimple-fold.cc (maybe_canonicalize_mem_ref_addr): Handle variable
sized vector types in BIT_FIELD_REF canonicalization.
* tree-cfg.cc (verify_types_in_gimple_reference): Change the object-size
checking for BIT_FIELD_REF to only report an error for offsets that are
known_gt to be outside the object size. Out-of-range offsets can happen in
the case of indices that reference VLA SVE vector elements that may be
outside the minimum vector size range, and therefore maybe_gt is not
appropriate here.
aarch64: Make C/C++ operations possible on SVE ACLE types.
This patch changes the TYPE_INDIVISIBLE flag to 0 to enable SVE ACLE types to
be treated as GNU vectors, with the same semantics as the operations that are
defined on GNU vectors.
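A sketch of what this is meant to enable (not taken from the patch's testsuite): GNU-vector-style operators applied directly to ACLE SVE types.
#include <arm_sve.h>
// Previously rejected while the types were marked indivisible.
svint32_t
add (svint32_t a, svint32_t b)
{
  return a + b;
}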
gcc/ChangeLog:
* config/aarch64/aarch64-sve-builtins.cc (register_builtin_types): Flip
the TYPE_INDIVISIBLE flag for SVE ACLE vector types.
Jakub Jelinek [Fri, 29 Nov 2024 09:19:08 +0000 (10:19 +0100)]
gimple-fold: Fix up type_has_padding_at_level_p [PR117065]
The following testcase used to ICE on the trunk since the 'clear small
object if it has padding' optimization was added, until my r15-5746 change;
now it doesn't, just because type_has_padding_at_level_p isn't called
on the testcase.
Though, as the testcase shows, structures/unions which contain erroneous
types of one or more of its members can have TREE_TYPE of the FIELD_DECL
error_mark_node, on which we can crash.
E.g. the __builtin_clear_padding lowering just ignores those:
if (TREE_TYPE (field) == error_mark_node)
continue;
and
if (ftype == error_mark_node)
continue;
It doesn't matter much what exactly we do for those cases, as we are going
to fail the compilation anyway, but we shouldn't crash.
So, the following patch ignores those in type_has_padding_at_level_p.
For RECORD_TYPE, we already return if !DECL_SIZE (f) which I think should
cover already the erroneous fields (and we don't use TYPE_SIZE on those).
2024-11-29 Jakub Jelinek <jakub@redhat.com>
PR middle-end/117065
* gimple-fold.cc (type_has_padding_at_level_p) <case UNION_TYPE>:
Also continue if f has error_mark_node type.
Jakub Jelinek [Fri, 29 Nov 2024 09:17:07 +0000 (10:17 +0100)]
__builtin_prefetch fixes [PR117608]
The r15-4833-ge9ab41b79933 patch had among tons of config/i386
specific changes also important change to the generic code, allowing
also 2 as valid value of the second argument of __builtin_prefetch:
- /* Argument 1 must be either zero or one. */
- if (INTVAL (op1) != 0 && INTVAL (op1) != 1)
+ /* Argument 1 must be 0, 1 or 2. */
+ if (INTVAL (op1) < 0 || INTVAL (op1) > 2)
But the patch failed to document that change in __builtin_prefetch
documentation, and more importantly didn't adjust any of the other
backends to deal with it (my understanding is the expected behavior
is that 2 will be silently handled as 0 unless backends have some
more specific way). Some of the backends would ICE on it, in some
cases gcc_assert failures/gcc_unreachable, in other cases crash later
(e.g. accessing arrays with that value as index and due to accessing
garbage after the array crashing at final.cc time), others treated 2
silently as 0, others treated 2 silently as 1.
And even in the i386 backend there were bugs which caused ICEs.
The patch added some if (write == 0) and write == 2 handling into
a (badly indented, maybe that is the reason) if (write == 1) body,
rather than into the else side, so it would always be false.
The new *prefetch_rst2 define_insn only accepts parameters 2 1
(i.e. read-shared with moderate degree of locality), so in order
not to ICE the patch uses it only for __builtin_prefetch (ptr, 2, 1);
or __builtin_ia32_prefetch (ptr, 2, 1, 0); and not for other values
of the parameter. If that isn't what we want and we want it to be used
also for all or some of __builtin_prefetch (ptr, 2, {0,2,3}); and
corresponding __builtin_ia32_prefetch, maybe the define_insn could match
other values.
And there was another problem that -mno-mmx -mno-sse -mmovrs compilation
would ICE on most of the prefetches, so I had to add the FAIL; cases.
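To illustrate the generic-code change (the function name is mine; the argument meanings follow the text above):
void
warm (const char *p)
{
  __builtin_prefetch (p, 0, 3);  /* read, high locality (previously valid) */
  __builtin_prefetch (p, 1, 3);  /* write, high locality (previously valid) */
  __builtin_prefetch (p, 2, 1);  /* read-shared, moderate locality: the newly
                                    accepted second-argument value */
}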
2024-11-29 Jakub Jelinek <jakub@redhat.com>
PR target/117608
* doc/extend.texi (__builtin_prefetch): Document that second
argument may be also 2 and its meaning.
* config/i386/i386.md (prefetch): Remove unreachable code.
Clear write (set operands[1] to const0_rtx) if !TARGET_MOVRS or
if locality is not 1. Formatting fixes.
* config/i386/i386-expand.cc (ix86_expand_builtin): Use IN_RANGE.
Call gen_prefetch even for TARGET_MOVRS.
* config/alpha/alpha.md (prefetch): Treat read_or_write 2 like 0.
* config/mips/mips.md (prefetch): Likewise.
* config/arc/arc.md (prefetch_1, prefetch_2, prefetch_3): Likewise.
* config/riscv/riscv.md (prefetch): Likewise.
* config/loongarch/loongarch.md (prefetch): Likewise.
* config/sparc/sparc.md (prefetch): Likewise. Use IN_RANGE.
* config/ia64/ia64.md (prefetch): Likewise.
* config/pa/pa.md (prefetch): Likewise.
* config/aarch64/aarch64.md (prefetch): Likewise.
* config/rs6000/rs6000.md (prefetch): Likewise.
* gcc.dg/builtin-prefetch-1.c (good): Add tests with second argument
2.
* gcc.target/i386/pr117608-1.c: New test.
* gcc.target/i386/pr117608-2.c: New test.
Andrew Pinski [Fri, 29 Nov 2024 09:00:11 +0000 (01:00 -0800)]
fortran: Add default to switch in gfc_trans_transfer [PR117843]
This fixes a bootstrap failure due to a warning about enum values not being
handled. In this case, it is just checking two values and the rest are not
meant to be handled, so adding a default case fixes the issue.
When ifcombining contiguous blocks, we can follow forwarder blocks and
reverse conditions to enable combinations, but when there are
intervening blocks, we have to constrain ourselves to paths to the
exit that share the PHI args with all intervening blocks.
Avoiding forwarders altogether when intervening blocks are present
would match the preexisting test, but we can do better: record whether a
forwarded path corresponds to the outer block's exit path, and insist on
not combining through any path other than the one that was verified as
corresponding. The latter is what this patch implements.
While at that, I've fixed some typos and introduced early testing before
computing the exit path, to avoid computing it when that would be
wasteful, or when avoiding it can enable other sound combinations.
for gcc/ChangeLog
PR tree-optimization/117723
* tree-ssa-ifcombine.cc (tree_ssa_ifcombine_bb): Record
forwarder blocks in path to exit, and stick to them. Avoid
computing the exit if obviously not needed, and if that
enables additional optimizations.
(tree_ssa_ifcombine_bb_1): Fix typos.
Harald Anlauf [Wed, 27 Nov 2024 20:11:16 +0000 (21:11 +0100)]
Fortran: fix crash with bounds check writing array section [PR117791]
PR fortran/117791
gcc/fortran/ChangeLog:
* trans-io.cc (gfc_trans_transfer): When an array index depends on
a function evaluation or an expression, do not use optimized array
I/O of an array section and fall back to normal scalarization.
gcc/testsuite/ChangeLog:
* gfortran.dg/bounds_check_array_io.f90: New test.
Uros Bizjak [Thu, 28 Nov 2024 16:44:03 +0000 (17:44 +0100)]
i386: Macroize compound shift patterns some more
Merge ashl and <any_shiftrt:code> compound define_insn_and_split
patterns to form <any_shift:code> macroized pattern.
No functional changes.
gcc/ChangeLog:
* config/i386/i386.md (*<any_shift:insn><mode>3_mask): Macroize
pattern from *ashl<mode>3_mask and *<any_shiftrt:insn><mode>3_mask
using any_shift code iterator.
(*<any_shift:insn><mode>3_mask_1): Macroize pattern
from *ashl<mode>3_mask_1 and *<any_shiftrt:insn><mode>3_mask_1
using any_shift code iterator.
(*<any_shift:insn><mode>3_add): Macroize pattern
from *ashl<mode>3_add and *<any_shiftrt:insn><mode>3_add
using any_shift code iterator.
(*<any_shift:insn><mode>3_add_1): Macroize pattern
from *ashl<mode>3_add_1 and *<any_shiftrt:insn><mode>3_add_1
using any_shift code iterator.
(*<insn><mode>3_sub): Macroize pattern
from *ashl<mode>3_sub and *<any_shiftrt:insn><mode>3_sub
using any_shift code iterator.
(*<any_shift:insn><mode>3_sub_1): Macroize pattern
from *ashl<mode>3_sub_1 and *<any_shiftrt:insn><mode>3_sub_1
using any_shift code iterator.
Jonathan Wakely [Mon, 25 Nov 2024 23:01:28 +0000 (23:01 +0000)]
libstdc++: Reduce duplication in Doxygen comments for std::list
We have a number of comments which are duplicated for C++98 and C++11
overloads, where the signatures are slightly different. Instead of
duplicating the comments that are 90% identical, just use a single
comment that can apply to both. In some cases this means saying "an
iterator" instead of "A const iterator" but that's fine, a
std::list::const_iterator is still an iterator (and a non-const iterator
is a valid argument to those functions because they'll implicitly
convert to const_iterator).
In two cases the @return description just needs to say that it returns
void for C++98 and an iterator otherwise.
libstdc++-v3/ChangeLog:
* include/bits/stl_list.h: Reduce duplication in doxygen
comments.
Marek Polacek [Wed, 9 Oct 2024 15:54:37 +0000 (11:54 -0400)]
c++: Implement P2662R3, Pack Indexing [PR113798]
This patch implements C++26 Pack Indexing, as described in
<https://wg21.link/P2662R3>.
The issue discussing how to mangle pack indexes has not been resolved
yet <https://github.com/itanium-cxx-abi/cxx-abi/issues/175> and I've
made no attempt to address it so far.
Unlike v1, which used augmented TYPE/EXPR_PACK_EXPANSION codes, this
version introduces two new codes: PACK_INDEX_EXPR and PACK_INDEX_TYPE.
Both carry two operands: the pack expansion and the index. They are
handled in tsubst_pack_index: substitute the index and the pack and
then extract the element from the vector (if possible).
To handle pack indexing in a decltype or with decltype(auto), there is
also the new PACK_INDEX_PARENTHESIZED_P flag.
With this feature, it's valid to write something like
using U = tmpl<Ts...[Is]...>;
where we first expand the template argument into
Ts...[Is#0], Ts...[Is#1], ...
and then substitute each individual pack index.
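A minimal illustration of both forms (not from the patch's testsuite; compile with -std=c++26):
template <typename... Ts>
using first_t = Ts...[0];                          // pack-index-specifier

template <auto... Vs>
constexpr auto last_v = Vs...[sizeof...(Vs) - 1];  // pack-index-expression

static_assert (__is_same (first_t<int, double>, int));
static_assert (last_v<1, 2, 3> == 3);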
PR c++/113798
gcc/cp/ChangeLog:
* constexpr.cc (potential_constant_expression_1) <case PACK_INDEX_EXPR>:
New case.
* cp-objcp-common.cc (cp_common_init_ts): Mark PACK_INDEX_TYPE and
PACK_INDEX_EXPR.
* cp-tree.def (PACK_INDEX_TYPE): New.
(PACK_INDEX_EXPR): New.
* cp-tree.h (WILDCARD_TYPE_P): Also check PACK_INDEX_TYPE.
(PACK_INDEX_CHECK): Define.
(PACK_INDEX_P): Define.
(PACK_INDEX_PACK): Define.
(PACK_INDEX_INDEX): Define.
(PACK_INDEX_PARENTHESIZED_P): Define.
(make_pack_index): Declare.
(pack_index_element): Declare.
* cxx-pretty-print.cc (cxx_pretty_printer::expression) <case
PACK_INDEX_EXPR>: New case.
(cxx_pretty_printer::type_id) <case PACK_INDEX_TYPE>: New case.
* error.cc (dump_type) <case PACK_INDEX_TYPE>: New case.
(dump_type_prefix): Handle PACK_INDEX_TYPE.
(dump_type_suffix): Likewise.
(dump_expr) <case PACK_INDEX_EXPR>: New case.
* mangle.cc (write_type) <case PACK_INDEX_TYPE>: New case.
* module.cc (trees_out::type_node) <case PACK_INDEX_TYPE>: New case.
(trees_in::tree_node) <case PACK_INDEX_TYPE>: New case.
* parser.cc (cp_parser_next_tokens_are_pack_index_p): New.
(cp_parser_pack_index): New.
(cp_parser_primary_expression): Handle a C++26 pack-index-expression.
(cp_parser_unqualified_id): Handle a C++26 pack-index-specifier.
(cp_parser_nested_name_specifier_opt): See if a pack-index-specifier
follows.
(cp_parser_qualifying_entity): Handle a C++26 pack-index-specifier.
(cp_parser_decltype_expr): Set id_expression_or_member_access_p for
pack indexing.
(cp_parser_mem_initializer_id): Handle a C++26 pack-index-specifier.
(cp_parser_simple_type_specifier): Likewise.
(cp_parser_base_specifier): Likewise.
* pt.cc (iterative_hash_template_arg) <case PACK_INDEX_TYPE,
PACK_INDEX_EXPR>: New case.
(find_parameter_packs_r) <case PACK_INDEX_TYPE, PACK_INDEX_EXPR>: New
case.
(make_pack_index): New.
(tsubst_pack_index): New.
(tsubst): Avoid tsubst on PACK_INDEX_TYPE.
<case TYPENAME_TYPE>: Add a call to error.
<case PACK_INDEX_TYPE>: New case.
(tsubst_expr) <case PACK_INDEX_EXPR>: New case.
(dependent_type_p_r): Return true for PACK_INDEX_TYPE.
(type_dependent_expression_p): Recurse on PACK_INDEX_PACK for
PACK_INDEX_EXPR.
* ptree.cc (cxx_print_type) <case PACK_INDEX_TYPE>: New case.
* semantics.cc (finish_parenthesized_expr): Set
PACK_INDEX_PARENTHESIZED_P for PACK_INDEX_EXPR.
(finish_type_pack_element): Adjust error messages.
(pack_index_element): New.
* tree.cc (cp_tree_equal) <case PACK_INDEX_EXPR>: New case.
(cp_walk_subtrees) <case PACK_INDEX_TYPE, PACK_INDEX_EXPR>: New case.
* typeck.cc (structural_comptypes) <case PACK_INDEX_TYPE>: New case.
* g++.dg/cpp26/pack-indexing1.C: New test.
* g++.dg/cpp26/pack-indexing2.C: New test.
* g++.dg/cpp26/pack-indexing3.C: New test.
* g++.dg/cpp26/pack-indexing4.C: New test.
* g++.dg/cpp26/pack-indexing5.C: New test.
* g++.dg/cpp26/pack-indexing6.C: New test.
* g++.dg/cpp26/pack-indexing7.C: New test.
* g++.dg/cpp26/pack-indexing8.C: New test.
* g++.dg/cpp26/pack-indexing9.C: New test.
* g++.dg/cpp26/pack-indexing10.C: New test.
* g++.dg/cpp26/pack-indexing11.C: New test.
* g++.dg/modules/pack-index-1_a.C: New test.
* g++.dg/modules/pack-index-1_b.C: New test.
Mariam Arutunian [Mon, 11 Nov 2024 19:51:18 +0000 (12:51 -0700)]
[PATCH v6 02/12] Add built-ins and tests for bit-forward and bit-reversed CRCs.
This patch introduces new built-in functions to GCC for computing
bit-forward and bit-reversed CRCs.
These builtins aim to provide efficient CRC calculation capabilities.
When the target architecture supports CRC operations (as indicated by the
presence of a CRC optab),
the builtins will utilize the expander to generate CRC code.
In the absence of hardware support, the builtins default to generating code
for a table-based CRC calculation.
The built-ins are defined as follows:
__builtin_rev_crc16_data8,
__builtin_rev_crc32_data8, __builtin_rev_crc32_data16,
__builtin_rev_crc32_data32
__builtin_rev_crc64_data8, __builtin_rev_crc64_data16,
__builtin_rev_crc64_data32, __builtin_rev_crc64_data64,
__builtin_crc8_data8,
__builtin_crc16_data16, __builtin_crc16_data8,
__builtin_crc32_data8, __builtin_crc32_data16, __builtin_crc32_data32,
__builtin_crc64_data8, __builtin_crc64_data16, __builtin_crc64_data32,
__builtin_crc64_data64
Each built-in takes three parameters:
crc: The initial CRC value.
data: The data to be processed.
polynomial: The CRC polynomial without the leading 1.
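A hedged usage sketch, assuming prototypes of the shape implied by the names and the parameter description above (the wrapper name and the choice of the standard bit-forward CRC-32 polynomial are mine):
#include <stdint.h>
uint32_t
crc32_step (uint32_t crc, uint8_t byte)
{
  /* Arguments: current CRC, data, polynomial without the leading 1.  */
  return __builtin_crc32_data8 (crc, byte, 0x04C11DB7);
}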
To validate the correctness of these built-ins, this patch also includes
additions to the GCC testsuite.
This enhancement allows GCC to offer developers high-performance CRC
computation options
that automatically adapt to the capabilities of the target hardware.
* gcc.dg/crc-builtin-rev-target32.c: New test.
* gcc.dg/crc-builtin-rev-target64.c: New test.
* gcc.dg/crc-builtin-target32.c: New test.
* gcc.dg/crc-builtin-target64.c: New test.
Signed-off-by: Mariam Arutunian <mariamarutunian@gmail.com>
Co-authored-by: Joern Rennecke <joern.rennecke@embecosm.com>
Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
Mariam Arutunian [Mon, 11 Nov 2024 19:48:34 +0000 (12:48 -0700)]
[PATCH v6 01/12] Implement internal functions for efficient CRC computation.
Add two new internal functions (IFN_CRC, IFN_CRC_REV), to provide faster
CRC generation.
One performs bit-forward and the other bit-reversed CRC computation.
If CRC optabs are supported, they are used for the CRC computation.
Otherwise, table-based CRC is generated.
The supported data and CRC sizes are 8, 16, 32, and 64 bits.
The polynomial is without the leading 1.
A table with 256 elements is used to store precomputed CRCs.
For the reflection of inputs and the output, a simple algorithm involving
SHIFT, AND, and OR operations is used.
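An illustrative bit-reflection of a byte using only shifts, ANDs and ORs (this is a generic textbook formulation, not GCC's actual expansion):
#include <stdint.h>
static uint8_t
reflect8 (uint8_t v)
{
  v = (uint8_t) (((v & 0xF0) >> 4) | ((v & 0x0F) << 4));  /* swap nibbles */
  v = (uint8_t) (((v & 0xCC) >> 2) | ((v & 0x33) << 2));  /* swap bit pairs */
  v = (uint8_t) (((v & 0xAA) >> 1) | ((v & 0x55) << 1));  /* swap adjacent bits */
  return v;
}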
gcc/
* doc/md.texi (crc@var{m}@var{n}4, crc_rev@var{m}@var{n}4): Document.
* expr.cc (calculate_crc): New function.
(assemble_crc_table): Likewise.
(generate_crc_table): Likewise.
(calculate_table_based_CRC): Likewise.
(expand_crc_table_based): Likewise.
(gen_common_operation_to_reflect): Likewise.
(reflect_64_bit_value): Likewise.
(reflect_32_bit_value): Likewise.
(reflect_16_bit_value): Likewise.
(reflect_8_bit_value): Likewise.
(generate_reflecting_code_standard): Likewise.
(expand_reversed_crc_table_based): Likewise.
* expr.h (generate_reflecting_code_standard): New function declaration.
(expand_crc_table_based): Likewise.
(expand_reversed_crc_table_based): Likewise.
* internal-fn.cc (crc_direct): Define.
(direct_crc_optab_supported_p): Likewise.
(expand_crc_optab_fn): New function.
* internal-fn.def (CRC, CRC_REV): New internal functions.
* optabs.def (crc_optab, crc_rev_optab): New optabs.
Signed-off-by: Mariam Arutunian <mariamarutunian@gmail.com>
Co-authored-by: Joern Rennecke <joern.rennecke@embecosm.com>
Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
[-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_get_property-gcn.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O0 (test for excess errors)
[-PASS:-]{+UNRESOLVED:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_get_property-gcn.c -DACC_DEVICE_TYPE_radeon=1 -DACC_MEM_SHARED=0 -foffload=amdgcn-amdhsa -O0 [-execution test-]{+compilation failed to produce executable+}
[Etc.]
..., due to:
[...]/libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_get_property-gcn.c:16:13: error: two or more data types in declaration specifiers
[...]/libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_get_property-gcn.c:16:1: warning: useless type name in empty declaration
libgomp/
* testsuite/libgomp.oacc-c-c++-common/acc_get_property-gcn.c
[!__cplusplus]: Don't 'typedef int bool;'.
Jakub Jelinek [Thu, 28 Nov 2024 13:54:42 +0000 (14:54 +0100)]
testsuite: Fix up pr116675.c test [PR116675]
The test uses dg-do run and scan-assembler* at the same time,
that obviously doesn't work when pr116675.s isn't created at all,
so one gets
PASS: gcc.target/i386/pr116675.c execution test
gcc.target/i386/pr116675.c: output file does not exist
UNRESOLVED: gcc.target/i386/pr116675.c scan-assembler-times pand 4
gcc.target/i386/pr116675.c: output file does not exist
UNRESOLVED: gcc.target/i386/pr116675.c scan-assembler-times pandn 4
gcc.target/i386/pr116675.c: output file does not exist
UNRESOLVED: gcc.target/i386/pr116675.c scan-assembler-times por 4
The usual way to handle that is adding -save-temps option.
The test FAILs after that change though, for simple reason, the pand
regex doesn't match just pand instructions, but also the pandn ones.
I've added \t there to make sure it matches only pand.
Though, wonder if it wouldn't be safer to split the test into two,
one with just the 4 functions (why noinline, noclone rather than
noipa, btw?), that one would be dg-do compile and have the scan-assembler*
directives, and then another one which includes the first one and is
dg-do run and contains the runtime checking of those.
In any case, I've committed this as obvious.
2024-11-28 Jakub Jelinek <jakub@redhat.com>
PR target/116675
* gcc.target/i386/pr116675.c: Add -save-temps to dg-options.
Scan for pand\t rather than pand.
Jakub Jelinek [Thu, 28 Nov 2024 13:31:44 +0000 (14:31 +0100)]
docs: Fix up __sync_* documentation [PR117642]
The PR14311 commit which added support for __sync_* builtins documented that
there is a warning if a particular operation cannot be implemented.
But that commit nor anything later on implemented such warning, it was
always silent generation of the mentioned calls (which can in most cases
result in linker errors of course because those functions aren't implemented
anywhere, in libatomic or elsewhere in code shipped in gcc).
So, the following patch just adjust the documentation to match the
implementation.
2024-11-28 Jakub Jelinek <jakub@redhat.com>
PR target/117642
* doc/extend.texi: Remove documentation of warning for unimplemented
__sync_* operations, such warning has never been implemented.
On top of the
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/668554.html
patch which introduces the nonnull_if_nonzero attribute (because
C2Y is allowing NULL arguments on various calls like memcpy, memset,
strncpy etc. as long as the count is 0), the following patch adds just
limited handling of the attribute in the ranger; in particular, it infers
nonnull for the pointer argument referenced by the first argument of the
attribute if the second argument is a non-zero INTEGER_CST
(integer_nonzerop).
Ideally (as the FIXME says) I'd like to query arg2 range and check if
it doesn't contain zero, but am not sure such queries are possible from
gimple_infer_range (and if it is possible whether one can just query
the currently recorded range for it or if one can call something that
will try to compute the range by walking the def stmts etc.).
Could you handle as a follow-up the range querying if it is possible?
As for a useful testcase, with the patch I'm going to post next, see
e.g. gcc.dg/tree-ssa/pr78154.c if the calls use d as destination (not dn)
and a count that has a range which doesn't include 0 and isn't constant.
2024-11-28 Jakub Jelinek <jakub@redhat.com>
PR c/117023
* gimple-range-infer.cc (gimple_infer_range::gimple_infer_range):
Handle also nonnull_if_nonzero attributes.
Jakub Jelinek [Thu, 28 Nov 2024 10:48:33 +0000 (11:48 +0100)]
Add support for nonnull_if_nonzero attribute [PR117023]
As mentioned in an earlier thread, C2Y voted in a change which made
various library APIs callable with NULL arguments in certain cases,
e.g.
memcpy (NULL, NULL, 0);
is now valid, although
memcpy (NULL, NULL, 1);
remains invalid. This affects various APIs, including several of
GCC builtins; plus on the C library side those APIs are often declared
with nonnull attribute(s) as well.
Florian suggested using the access attribute for this, but our docs
explicitly say that the access attribute doesn't imply nonnull, and it
doesn't cover e.g. the qsort case where the comparison function pointer may
also be NULL if nmemb is 0, but must be non-NULL otherwise.
As this case affects 21 APIs in C standard and I think is going to affect
various wrappers around those in various packages as well, I think it
is a common thing that should have its own attribute, because we should
still warn when people use
qsort (NULL, 1, 1, NULL);
etc., and similarly want to have -fsanitize=null instrumentation for those.
So, the following patch introduces the nonnull_if_nonzero attribute (or would
you prefer cond_nonnull or some other name?), which always has 2 arguments,
argument index of a pointer argument (like one argument nonnull) and
argument index of an associated integral argument. If that argument is
non-zero, it is UB to pass NULL to the pointer argument, if that argument
is zero, it is valid. And changes various spots which already handled the
nonnull attribute to handle this one as well, with sometimes using the
ranger (or for -fsanitize=nonnull explicitly checking the associated
argument value, so instead of if (!ptr) __ubsan_... (...); it will
now do if (!ptr && sz) __ubsan_... (...);).
I've so far omitted changing gimple_infer_range (am not 100% sure how I can
use the ranger inside of the ranger) and changing the analyzer to handle it.
And I haven't changed builtins.def etc. to make use of that attribute
instead of nonnull where appropriate.
I'd then follow with the builtins.def changes (and eventually glibc
etc. would need to be adjusted too).
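A sketch of how the attribute is meant to be used, following the argument-pair description above (the memcpy-like wrapper and its name are hypothetical):
#include <stddef.h>
extern void *my_memcpy (void *dst, const void *src, size_t n)
  __attribute__ ((nonnull_if_nonzero (1, 3), nonnull_if_nonzero (2, 3)));

void
ok (void)
{
  my_memcpy ((void *) 0, (const void *) 0, 0);   /* valid: count is 0 */
}

void
bad (void)
{
  my_memcpy ((void *) 0, (const void *) 0, 1);   /* diagnosed: count is non-zero */
}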
2024-11-28 Jakub Jelinek <jakub@redhat.com>
PR c/117023
gcc/
* gimple.h (infer_nonnull_range_by_attribute): Add a tree *
argument defaulted to NULL.
* gimple.cc (infer_nonnull_range_by_attribute): Add op2 argument.
Handle also nonnull_if_nonzero attributes.
* tree.cc (get_nonnull_args): Fix comment typo.
* builtins.cc (validate_arglist): Handle nonnull_if_nonzero attribute.
* tree-ssa-ccp.cc (pass_post_ipa_warn::execute): Handle
nonnull_if_nonzero attributes.
* ubsan.cc (instrument_nonnull_arg): Adjust
infer_nonnull_range_by_attribute caller. If it returned true and
filled in a non-NULL arg2, check that arg2 is non-zero as another
condition next to checking that arg is zero.
* doc/extend.texi (nonnull_if_nonzero): Document new attribute.
gcc/c-family/
* c-attribs.cc (handle_nonnull_if_nonzero_attribute): New
function.
(c_common_gnu_attributes): Add nonnull_if_nonzero attribute.
(handle_nonnull_attribute): Fix comment typo.
* c-common.cc (struct nonnull_arg_ctx): Add other member.
(check_function_nonnull): Also check nonnull_if_nonzero attributes.
(check_nonnull_arg): Use different warning wording if pctx->other
is non-zero.
(check_function_arguments): Initialize ctx.other.
gcc/testsuite/
* gcc.dg/nonnull-8.c: New test.
* gcc.dg/nonnull-9.c: New test.
* gcc.dg/nonnull-10.c: New test.
* c-c++-common/ubsan/nonnull-6.c: New test.
* c-c++-common/ubsan/nonnull-7.c: New test.
Jakub Jelinek [Thu, 28 Nov 2024 10:45:00 +0000 (11:45 +0100)]
rs6000: Add PowerPC inline asm redzone clobber support
The following patch on top of the
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/667949.html
patch adds the rs6000 part of the support (the only other target I'm aware
of which clearly has a red zone as well).
2024-11-28 Jakub Jelinek <jakub@redhat.com>
* config/rs6000/rs6000.h (struct machine_function): Add
asm_redzone_clobber_seen member.
* config/rs6000/rs6000-logue.cc (rs6000_stack_info): Force
info->push_p if cfun->machine->asm_redzone_clobber_seen.
* config/rs6000/rs6000.cc (TARGET_REDZONE_CLOBBER): Redefine.
(rs6000_redzone_clobber): New function.
Jakub Jelinek [Thu, 28 Nov 2024 10:42:11 +0000 (11:42 +0100)]
inline-asm, i386: Add "redzone" clobber support
The following patch adds a "redzone" clobber (recognized everywhere,
even on targets which don't do anything with it),
with which one can mark the rare case where inline asm pushes
something on the stack or uses a call instruction without taking the
red zone into account (i.e. without addq $-128, %rsp; and addq $128, %rsp
around it).
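A minimal x86-64 sketch of the intended use, assuming the syntax described
above (the asm body is only an illustration):
void
poke_below_rsp (void)
{
  /* The push temporarily writes below the incoming stack pointer, so the
     "redzone" clobber tells GCC not to keep any data in the red zone
     around this asm.  */
  __asm__ volatile ("pushq $0\n\tpopq %%rax" ::: "rax", "redzone");
}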
2024-11-28 Jakub Jelinek <jakub@redhat.com>
gcc/
* target.def (redzone_clobber): New target hook.
* varasm.cc (decode_reg_name_and_count): Return -5 for
"redzone".
* cfgexpand.cc (expand_asm_stmt): Handle redzone clobber.
* config/i386/i386.h (struct machine_function): Add
asm_redzone_clobber_seen member.
* config/i386/i386.cc (ix86_compute_frame_layout): Don't
use red zone if cfun->machine->asm_redzone_clobber_seen.
(ix86_redzone_clobber): New function.
(TARGET_REDZONE_CLOBBER): Redefine.
* doc/extend.texi (Clobbers and Scratch Registers): Document
the "redzone" clobber.
* doc/tm.texi.in: Add @hook TARGET_REDZONE_CLOBBER.
* doc/tm.texi: Regenerate.
gcc/testsuite/
* gcc.dg/asm-redzone-1.c: New test.
* gcc.target/i386/asm-redzone-1.c: New test.
Jakub Jelinek [Thu, 28 Nov 2024 10:30:32 +0000 (11:30 +0100)]
c++: Small initial fixes for zeroing of padding bits [PR117256]
https://eel.is/c++draft/dcl.init#general-6
says that even padding bits are supposed to be zeroed during
zero-initialization.
The following patch on top of the
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665565.html
patch attempts to implement that, though only for the easy
cases so far, in particular marks the CONSTRUCTOR created during
zero-initialization (or zero-initialization done during the
value-initialization) as having padding bits cleared and for
constexpr evaluation attempts to preserve that bit on a new CONSTRUCTOR
created for CONSTRUCTOR_ZERO_PADDING_BITS lhs.
I think we need far more than that, but I am not sure where exactly
to implement it.
In particular, I think __builtin_bit_cast should take it into account
during constant evaluation: if the padding bits in something are guaranteed
to be zero, then I'd think using std::bit_cast on it and testing those
bits in the result should be well defined.
But if we do that, the flag needs to be maintained precisely, not just
conservatively, so e.g. any place where some object is copied into another
one (except bit_cast?) as an element-wise copy, the bit should
be cleared (or preserved from the earlier state? I'd hope
element-wise copying invalidates even the padding bits, but then what
about stores into just some members, do those invalidate the padding bits
in the rest of the object?). But if it is an elided copy, it shouldn't be.
And I am not really sure what happens e.g. with non-automatic constexpr
variables. If one is constructed by something that doesn't guarantee
the zeroing of the padding bits (so a similarly constructed constexpr
automatic variable would have undefined padding bits), are those padding
bits well defined just because it isn't an automatic variable?
Anyway, I hope the following patch is at least a small step in the right
direction.
Jakub Jelinek [Thu, 28 Nov 2024 10:18:07 +0000 (11:18 +0100)]
expr, c: Don't clear whole unions [PR116416]
As discussed earlier, we currently clear padding bits even when we
don't have to and that causes pessimization of emitted code,
e.g. for
union U { int a; long b[64]; };
void bar (union U *);
void
foo (void)
{
union U u = { 0 };
bar (&u);
}
we need to clear just u.a, not the whole union; but on the other hand, in
cases where the standard requires padding bits to be zeroed, like for C23
{} initializers of aggregates with padding bits or for C++11
zero-initialization, we don't do that.
This patch
a) moves some of the stuff into complete_ctor_at_level_p (but not
all the *p_complete = 0; case, for that it would need to change
so that it passes around the ctor rather than just its type) and
changes the handling of unions
b) introduces a new option, so that users can either get the new
behavior (only what is guaranteed by the standards, the default),
the previous behavior (union padding zero-initialization, no such
guarantees for structures), or also a guarantee for structures
(see the sketch after this list)
c) introduces a new CONSTRUCTOR flag which says that the padding bits
(if any) should be zero initialized (and sets it for now in the C
FE for C23 {} initializers).
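Here is a hedged sketch of what b) and c) mean in practice; the option
value names below (standard, unions, all) are inferred from the description
and the ChangeLog enum names, so the final spellings may differ:
/* gcc -std=c23 -fzero-init-padding-bits=standard t.c
     default: only what the standards guarantee, e.g. C23 {} initializers
   gcc -std=c23 -fzero-init-padding-bits=unions t.c
     previous behavior: zero union padding, no guarantee for struct padding
   gcc -std=c23 -fzero-init-padding-bits=all t.c
     additionally guarantee zeroed padding in structures  */
struct S { char a; long long b; };            /* padding after 'a' */

void
padding_examples (void)
{
  struct S s = {};                            /* C23 {}: padding zeroed even at =standard */
  union U { int a; long b[64]; } u = { 0 };   /* at =standard only u.a is cleared */
  (void) s; (void) u;
}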
I am not sure the CONSTRUCTOR_ZERO_PADDING_BITS flag is really needed
for C23; if there is just an empty initializer, I think we already mark
it as incomplete if there are any missing initializers. Maybe with
some designated initializer games, say
void foo () {
struct S { char a; long long b; };
struct T { struct S c; } t = { .c = {}, .c.a = 1, .c.b = 2 };
...
}
Is this supposed to initialize padding bits in C23 and then the .c.a = 1
and .c.b = 2 stores preserve those padding bits, so is that supposed
to be different from struct T t2 = { .c = { 1, 2 } };
? What about just struct T t3 = { .c.a = 1, .c.b = 2 }; ?
And I haven't touched the C++ FE for the flag, because I'm afraid I'm lost
on where exactly zero-initialization is done (vs. other types of
initialization) and where e.g. zero-initialization of a temporary is then
(member-wise) copied.
Say
struct S { char a; long long b; };
struct T { constexpr T (int a, int b) : c () { c.a = a; c.b = b; } S c; };
void bar (T *);
void
foo ()
{
T t (1, 2);
bar (&t);
}
Is the c () value-initialization of t.c followed by c.a and c.b updates
which preserve the zero initialized padding bits? Or is there some
copy construction involved which does member-wise copying and makes the
padding bits undefined?
Looking at (older) clang++ with -O2, it initializes also the padding bits
when c () is used and doesn't with c {}.
For GCC, note that there is that optimization from Alex to zero padding bits
for optimization purposes for small aggregates, so either one needs to look
at -O0 -fdump-tree-gimple dumps, or use larger structures which aren't
optimized that way.
2024-11-28 Jakub Jelinek <jakub@redhat.com>
PR c++/116416
gcc/
* flag-types.h (enum zero_init_padding_bits_kind): New type.
* tree.h (CONSTRUCTOR_ZERO_PADDING_BITS): Define.
* common.opt (fzero-init-padding-bits=): New option.
* expr.cc (categorize_ctor_elements_1): Handle
CONSTRUCTOR_ZERO_PADDING_BITS or
flag_zero_init_padding_bits == ZERO_INIT_PADDING_BITS_ALL. Fix
up *p_complete = -1; setting for unions.
(complete_ctor_at_level_p): Handle unions differently for
flag_zero_init_padding_bits == ZERO_INIT_PADDING_BITS_STANDARD.
* gimple-fold.cc (type_has_padding_at_level_p): Fix up UNION_TYPE
handling, return also true for UNION_TYPE with no FIELD_DECLs
and non-zero size, handle QUAL_UNION_TYPE like UNION_TYPE.
* doc/invoke.texi (-fzero-init-padding-bits=@var{value}): Document.
gcc/c/
* c-parser.cc (c_parser_braced_init): Set CONSTRUCTOR_ZERO_PADDING_BITS
for flag_isoc23 empty initializers.
* c-typeck.cc (constructor_zero_padding_bits): New variable.
(struct constructor_stack): Add zero_padding_bits member.
(really_start_incremental_init): Save and clear
constructor_zero_padding_bits.
(push_init_level): Save constructor_zero_padding_bits. Or into it
CONSTRUCTOR_ZERO_PADDING_BITS from previous value if implicit.
(pop_init_level): Set CONSTRUCTOR_ZERO_PADDING_BITS if
constructor_zero_padding_bits and restore
constructor_zero_padding_bits.
gcc/testsuite/
* gcc.dg/plugin/infoleak-1.c (test_union_2b, test_union_4b): Expect
diagnostics.
* gcc.dg/c23-empty-init-5.c: New test.
* gcc.dg/gnu11-empty-init-1.c: New test.
* gcc.dg/gnu11-empty-init-2.c: New test.
* gcc.dg/gnu11-empty-init-3.c: New test.
* gcc.dg/gnu11-empty-init-4.c: New test.
Two things go wrong here: the wrong mask is used and the address pointers
for the stores are wrong.
This happens because the codegen loop in vectorizable_store is a nested
loop where the outer loop iterates over ncopies and the inner loop
iterates over vec_num.
For SLP ncopies == 1 and vec_num == SLP_NUM_STMS, but the loop mask is
determined only by the outer-loop index and the pointer address is only
updated in the outer loop.
As such, for SLP we always use the same predicate and the same memory
location.
This patch flattens the two loops, instead iterating over
ncopies * vec_num, and simplifies the indexing.
This does not fully fix the gcc_r miscompile in SPECCPU 2017, as the error
moves somewhere else. I will look at that next, but the patch fixes some
other libraries that also started failing.
gcc/ChangeLog:
PR tree-optimization/117557
* tree-vect-stmts.cc (vectorizable_store): Flatten the ncopies and
vec_num loops.
gcc/testsuite/ChangeLog:
PR tree-optimization/117557
* gcc.target/aarch64/pr117557.c: New test.
Jakub Jelinek [Thu, 28 Nov 2024 09:51:16 +0000 (10:51 +0100)]
gimple-fold: Avoid ICEs with bogus declarations like const attribute on snprintf [PR117358]
When one puts incorrect const or pure attributes on declarations of various
C APIs which have corresponding builtins (vs. what they actually do), we can
get tons of ICEs in gimple-fold.cc.
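For illustration, a hedged sketch of the kind of bogus declaration meant
above (not the exact reproducer from the PR): snprintf writes through its
first argument, so declaring it const is wrong, and folding such a call
could previously ICE:
typedef __SIZE_TYPE__ size_t;
extern int snprintf (char *, size_t, const char *, ...)
  __attribute__ ((const));   /* bogus: snprintf is not a const function */

char buf[32];

int
trigger (void)
{
  return snprintf (buf, sizeof buf, "%d", 42);
}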
The following patch fixes it by giving up on gimple_fold_builtin_* folding
if the call statements don't have a gimple_vdef (or, for pure functions
like bcmp/strchr/strstr, a gimple_vuse) when in SSA form (during
gimplification they will surely have both of those NULL even when declared
correctly, yet it is highly desirable to fold them).
Or shall I replace
!gimple_vdef (stmt) && gimple_in_ssa_p (cfun)
tests with
(gimple_call_flags (stmt) & (ECF_CONST | ECF_PURE | ECF_NOVOPS)) != 0
and
!gimple_vuse (stmt) && gimple_in_ssa_p (cfun)
with
(gimple_call_flags (stmt) & (ECF_CONST | ECF_NOVOPS)) != 0
?
2024-11-28 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/117358
* gimple-fold.cc (gimple_fold_builtin_memory_op): Punt if stmt has no
vdef in ssa form.
(gimple_fold_builtin_bcmp): Punt if stmt has no vuse in ssa form.
(gimple_fold_builtin_bcopy): Punt if stmt has no vdef in ssa form.
(gimple_fold_builtin_bzero): Likewise.
(gimple_fold_builtin_memset): Likewise. Use return false instead of
return NULL_TREE.
(gimple_fold_builtin_strcpy): Punt if stmt has no vdef in ssa form.
(gimple_fold_builtin_strncpy): Likewise.
(gimple_fold_builtin_strchr): Punt if stmt has no vuse in ssa form.
(gimple_fold_builtin_strstr): Likewise.
(gimple_fold_builtin_strcat): Punt if stmt has no vdef in ssa form.
(gimple_fold_builtin_strcat_chk): Likewise.
(gimple_fold_builtin_strncat): Likewise.
(gimple_fold_builtin_strncat_chk): Likewise.
(gimple_fold_builtin_string_compare): Likewise.
(gimple_fold_builtin_fputs): Likewise.
(gimple_fold_builtin_memory_chk): Likewise.
(gimple_fold_builtin_stxcpy_chk): Likewise.
(gimple_fold_builtin_stxncpy_chk): Likewise.
(gimple_fold_builtin_stpcpy): Likewise.
(gimple_fold_builtin_snprintf_chk): Likewise.
(gimple_fold_builtin_sprintf_chk): Likewise.
(gimple_fold_builtin_sprintf): Likewise.
(gimple_fold_builtin_snprintf): Likewise.
(gimple_fold_builtin_fprintf): Likewise.
(gimple_fold_builtin_printf): Likewise.
(gimple_fold_builtin_realloc): Likewise.
Jakub Jelinek [Thu, 28 Nov 2024 09:23:47 +0000 (10:23 +0100)]
builtins: Handle BITINT_TYPE in __builtin_iseqsig folding [PR117802]
In check_builtin_function_arguments in the _BitInt patchset I've changed
INTEGER_TYPE tests to INTEGER_TYPE or BITINT_TYPE, but haven't done the
same in fold_builtin_iseqsig, which now ICEs because of that.
The following patch fixes that.
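A hedged sketch of the shape of call that triggered the ICE (not the exact
reproducer from the PR), assuming a target with _BitInt support:
int
cmp (_BitInt(64) x)
{
  /* The FE accepts a _BitInt operand here, but fold_builtin_iseqsig
     only knew about INTEGER_TYPE and ICEd on BITINT_TYPE.  */
  return __builtin_iseqsig (x, 1.0);
}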
BTW, that TYPE_PRECISION (type0) >= TYPE_PRECISION (type1) test
for REAL_TYPE vs. REAL_TYPE looks pretty random and dangerous. I think
it would be useful to handle this builtin also in the C and C++ FEs:
if both arguments have REAL_TYPE, use the FE-specific routine to decide
which types to use, and error if a comparison between the types would be
erroneous (e.g. complain about _Decimal* vs. float/double/long
double/_Float*, pick the preferred type, complain about
__ibm128 vs. _Float128 in C++, etc.).
The FEs could then just promote one argument to the other in that case,
keeping fold_builtin_iseqsig as is for, say, Fortran and other FEs.
2024-11-28 Jakub Jelinek <jakub@redhat.com>
PR c/117802
* builtins.cc (fold_builtin_iseqsig): Handle BITINT_TYPE like
INTEGER_TYPE.
* gcc.dg/builtin-iseqsig-1.c: New test.
* gcc.dg/bitint-118.c: New test.
Joseph Myers [Thu, 28 Nov 2024 02:41:35 +0000 (02:41 +0000)]
c: Fix gimplification ICE for shifts with invalid redeclarations
As reported in bug 117757, there is a C gimplification ICE for shifts
involving a variable that was incompatibly redeclared (and thus had
its type changed to error_mark_node). Fix this with an appropriate
error_operand_p check.
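A hedged sketch of the construct in question (not the exact reproducer
from the PR): the incompatible redeclaration turns the variable's type
into error_mark_node, and a shift using it could then ICE during
gimplification instead of only being diagnosed:
int n;
long n;          /* error: conflicting types for 'n'; type becomes error_mark_node */

int
shift (int x)
{
  return x << n; /* gimplification of this shift needs the error_operand_p check */
}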
Note that this is not the same issue as any of the other bugs reported
for ICEs later in the gimplifier dealing with such erroneous
redeclarations (it is, however, the same as the *second* ICE reported
in bug 115644 - the test in comment#1 for that bug, not the one in the
original bug report).
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
PR c/117757
gcc/c-family/
* c-gimplify.cc (c_gimplify_expr): Check for error_operand_p
before calling TYPE_MAIN_VARIANT for shifts.
David Malcolm [Thu, 28 Nov 2024 00:21:15 +0000 (19:21 -0500)]
c-family: offer suggestions for missing command-line options [PR82892]
Some builtin macros are only defined when certain command-line options
are provided. Update the error messages for them so that we suggest
the pertinent option ('-fopenacc' for '_OPENACC', and '-fopenmp' for
'_OPENMP').
David Malcolm [Thu, 28 Nov 2024 00:21:15 +0000 (19:21 -0500)]
C/C++: add fix-it hints for missing '&' and '*' (v5) [PR87850]
This patch adds a note with a fix-it hint to various
pointer-vs-non-pointer diagnostics, suggesting the addition of
a leading '&' or '*'.
For example, note the ampersand fix-it hint in the following:
demo.c: In function 'int main()':
demo.c:5:22: error: invalid conversion from 'pthread_key_t' {aka 'unsigned int'}
to 'pthread_key_t*' {aka 'unsigned int*'} [-fpermissive]
5 | pthread_key_create(key, NULL);
| ^~~
| |
| pthread_key_t {aka unsigned int}
demo.c:5:22: note: possible fix: take the address with '&'
5 | pthread_key_create(key, NULL);
| ^~~
| &
In file included from demo.c:1:
/usr/include/pthread.h:1122:47: note: initializing argument 1 of
'int pthread_key_create(pthread_key_t*, void (*)(void*))'
1122 | extern int pthread_key_create (pthread_key_t *__key,
| ~~~~~~~~~~~~~~~^~~~~
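For reference, a hedged reconstruction of the demo.c behind the quoted
diagnostic (assumed, not taken from the patch; the output above comes from
compiling it as C++):
#include <pthread.h>

int
main (void)
{
  pthread_key_t key;
  pthread_key_create (key, NULL);   /* should be &key; the fix-it suggests '&' */
  return 0;
}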
gcc/c-family/ChangeLog:
PR c++/87850
* c-common.cc: Include "gcc-rich-location.h".
(maybe_emit_indirection_note): New function.
* c-common.h (maybe_emit_indirection_note): New decl.
(compatible_types_for_indirection_note_p): New decl.
gcc/c/ChangeLog:
PR c++/87850
* c-typeck.cc (compatible_types_for_indirection_note_p): New
function.
(convert_for_assignment): Call maybe_emit_indirection_note for
pointer vs non-pointer diagnostics.
gcc/cp/ChangeLog:
PR c++/87850
* call.cc (convert_like_real): Call maybe_emit_indirection_note
for "invalid conversion" diagnostic.
* typeck.cc (compatible_types_for_indirection_note_p): New
function.
gcc/testsuite/ChangeLog:
PR c++/87850
* c-c++-common/indirection-fixits.c: New test.
* g++.dg/template/error60.C: Add fix-it hint to expected output.
* g++.dg/template/error60a.C: Likewise.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Jan Hubicka [Wed, 27 Nov 2024 22:52:37 +0000 (23:52 +0100)]
optimize basic_string
Add __builtin_unreachable conditionals to declare value ranges of
basic_string::length(). Fix max_size() to return the actual max size
using logic similar to std::vector. Avoid use of size() in empty()
to save some compile-time overhead.
As discussed, the max_size() change is technically ABI breaking, but
hopefully this does not really matter in practice.
The change to length() breaks the empty-loop testcase, where we now
optimize the loop only after inlining, so the template is updated to
check cddce3 instead of cddce2. This is PR117764.
With these changes we now optimize out unused strings, as tested in
string-1.C.
libstdc++-v3/ChangeLog:
* include/bits/basic_string.h (basic_string::size(),
basic_string::length(), basic_string::capacity()): Add
__builtin_unreachable to declare value ranges.
(basic_string::empty()): Implement directly
(basic_string::max_size()): Account correctly the terminating 0
and limits implied by ptrdiff_t.
gcc/testsuite/ChangeLog:
* g++.dg/tree-ssa/empty-loop.C: xfail optimization at cddce2 and check
it happens at cddce3.
* g++.dg/tree-ssa/string-1.C: New test.
Joseph Myers [Wed, 27 Nov 2024 22:27:08 +0000 (22:27 +0000)]
c: Fix ICE using function name in parameter type in old-style function definition [PR91193]
As reported in bug 91193, if an old-style function definition
redeclares a typedef name as a function, then uses that function name
at the start of the first old-style parameter definition, then the
parser interprets that token as a typedef name (because lookahead
occurred before processing of the function declarator completed), but
when it is looked up in processing that parameter definition, what is
found is the redefinition, resulting in an ICE.
The function name's scope starts at the end of its declarator, so this
is similar to other cases where we call
c_parser_maybe_reclassify_token because lookahead might have
classified a token as being a typedef or not based on information from
the wrong scope; do so in this case as well, thus resulting in the
expected parse errors from using something that's no longer a typedef
name as if it were a typedef name, and eliminating the ICE.
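A hedged sketch of the construct described above (not the exact testcase
from the bug): the old-style definition redeclares the typedef name T as
the function being defined, and the first parameter definition then starts
with that name, which must no longer be treated as a typedef; after the
fix this yields ordinary parse errors instead of an ICE:
typedef int T;

int
T (x)        /* redeclares the typedef name T as a function */
  T x;       /* 'T' here is no longer a typedef name */
{
  return x;
}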
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
Jonathan Wakely [Wed, 27 Nov 2024 11:52:40 +0000 (11:52 +0000)]
libstdc++: Add cold attribute to assertion failure functions [PR117650]
This helps the compiler to split the cold path into a separate clone, so
that the hot path is a smaller function that uses less icache, and the
cold path is only fetched into the icache if actually executed.
Andrew Pinski [Tue, 26 Nov 2024 00:04:21 +0000 (16:04 -0800)]
match: Improve handling of double convert [PR117776]
For a double conversion, we will simplify it into a conversion
with an and if the outer and innermost precisions match and
the intermediate precision is smaller and unsigned. We should be able
to extend this to the case where the outer precision is larger too.
This is a good canonicalization too.
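A hedged example of the extended case (illustrative only): the
intermediate unsigned type is narrower, and the outer precision is now
allowed to be wider than the innermost one, so the pair of casts can still
become a mask plus a single conversion:
unsigned long long
double_convert (unsigned int x)
{
  /* Roughly equivalent to (unsigned long long) (x & 0xff).  */
  return (unsigned long long) (unsigned char) x;
}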
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/117776
gcc/ChangeLog:
* match.pd (nested int casts): Allow for the case
where the final prec is greater than the original
prec.
gcc/testsuite/ChangeLog:
* g++.dg/vect/pr117776.cc: New test.
* gcc.dg/tree-ssa/cast-3.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Patrick Palka [Wed, 27 Nov 2024 16:59:38 +0000 (11:59 -0500)]
libstdc++/ranges: make _RangeAdaptorClosure befriend operator|
This declares the range adaptor pipe operators a friend of the
_RangeAdaptorClosure base class so that the std module doesn't need to
export them for ADL to find them.
Note that we deliberately don't define these pipe operators as hidden
friends, see r14-3293-g4a6f3676e7dd9e.
Jakub Jelinek [Wed, 27 Nov 2024 16:29:28 +0000 (17:29 +0100)]
c: Fix sizeof error recovery [PR117745]
Compilation of the following testcase hangs forever after emitting the
first error. The problem is that in one place we just return
error_mark_node directly rather than going through c_expr_sizeof_expr or
c_expr_sizeof_type.
The parsing of the expression could have called record_maybe_used_decl,
though, but nothing calls pop_maybe_used, which needs to be called after
parsing of every sizeof/typeof, successful or not.
At the end of the toplevel declaration we free the parser_obstack, and in
another function record_maybe_used_decl is called again and, due to the
missing pop_maybe_used, we end up with a cycle in the chain.
The following patch fixes it by just setting the error and jumping to
sizeof_expr:
c_inhibit_evaluation_warnings--;
in_sizeof--;
mark_exp_read (expr.value);
if (TREE_CODE (expr.value) == COMPONENT_REF
&& DECL_C_BIT_FIELD (TREE_OPERAND (expr.value, 1)))
error_at (expr_loc, "%<sizeof%> applied to a bit-field");
result = c_expr_sizeof_expr (expr_loc, expr);
where c_expr_sizeof_expr will do:
struct c_expr ret;
if (expr.value == error_mark_node)
{
ret.value = error_mark_node;
ret.original_code = ERROR_MARK;
ret.original_type = NULL;
ret.m_decimal = 0;
pop_maybe_used (false);
}
...
return ret;
which is exactly what the old code did manually except for the missing
pop_maybe_used call. mark_exp_read does nothing on error_mark_node and
error_mark_node doesn't have COMPONENT_REF tree_code.
2024-11-27 Jakub Jelinek <jakub@redhat.com>
PR c/117745
* c-parser.cc (c_parser_sizeof_expression): If type_name is NULL,
just expr.set_error () and goto sizeof_expr instead of doing error
recovery manually.
Pan Li [Fri, 22 Nov 2024 03:48:26 +0000 (11:48 +0800)]
I386: Add more testcases for unsigned SAT_ADD vector pattern
Some forms like the one below failed to be recognized as a SAT_ADD
pattern for target i386. It was related to some match pattern
extraction but got fixed after the refactor of the SAT_ADD
pattern. Thus, add testcases to ensure we won't have similar
issues in the future.
#define DEF_SAT_ADD(T) \
T sat_add_##T (T x, T y) \
{ \
T res; \
res = x + y; \
res |= -(T)(res < x); \
return res; \
}
#define VEC_DEF_SAT_ADD(T) \
void vec_sat_add(T * restrict a, T * restrict b) \
{ \
for (int i = 0; i < 8; i++) \
b[i] = sat_add_##T (a[i], b[i]); \
}
DEF_SAT_ADD (uint32_t)
VEC_DEF_SAT_ADD (uint32_t)
The below test suites are passed for this patch.
make -k check-gcc RUNTESTFLAGS="--target_board=unix\{,-m32\} i386.exp=pr112600-5a-*.c"
PR target/112600
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr112600-5-u16.c: New test.
* gcc.target/i386/pr112600-5-u32.c: New test.
* gcc.target/i386/pr112600-5-u64.c: New test.
* gcc.target/i386/pr112600-5-u8.c: New test.
* gcc.target/i386/pr112600-5.h: New test.
Pan Li [Mon, 25 Nov 2024 01:21:24 +0000 (09:21 +0800)]
Match: Refactor the unsigned SAT_ADD match ADD_OVERFLOW pattern [NFC]
This patch refactors the unsigned SAT_ADD patterns that
leverage the IFN ADD_OVERFLOW, namely:
* Extract type check outside.
* Re-arrange the related match pattern forms together.
* Remove unnecessary helper pattern matches.
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.
gcc/ChangeLog:
* match.pd: Refactor sorts of unsigned SAT_ADD match pattern for
IFN ADD_OVERFLOW.
Joseph Myers [Wed, 27 Nov 2024 14:10:37 +0000 (14:10 +0000)]
c: Do not remove _Atomic from array element type for typeof_unqual [PR117781]
As reported in bug 117781, my fix for bug 112841 broke the case of
typeof_unqual applied to an array of _Atomic elements, which should
not have _Atomic removed since only the element type is atomic, not
the array type. Fix with logic to ensure that atomic element types
are preserved as such, while other qualifiers (i.e. those that are
semantically rather than only syntactically such in C) are removed.
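A hedged sketch of the distinction (requires -std=c23; not the testcase
from the PR):
const _Atomic int a[3];
typeof_unqual (a) b;   /* b is an array of _Atomic int: const is dropped,
                          the elements' _Atomic is kept */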
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
PR c/117781
gcc/c/
* c-parser.cc (c_parser_typeof_specifier): Do not remove _Atomic
from array element type for typeof_unqual.
Jakub Jelinek [Wed, 27 Nov 2024 13:33:16 +0000 (14:33 +0100)]
builtins: Emit __sync_lock_release_{8,16} call as last resort instead of doing nothing [PR117642]
As the following testcases show, in the case of a multi-word
__sync_lock_test_and_set where we don't actually support atomics for that
size (__int128 for x86_64 lp64 with -mno-cx16, long long for ia32 with
-march=i{3,4}86), as the last fallback, if we don't know anything else, we
just emit calls to __sync_lock_test_and_set_{8,16}. Those aren't defined
in libatomic, but perhaps users could define them themselves.
__sync_lock_release, on the other hand, if it gives up and has no way to
emit the atomic store, just does nothing at all, so there is no clear sign
to users that something went wrong and that the code will not do what they
expected.
This regressed when __atomic_* support was introduced; previously we
would just emit those calls even in this case.
The patch just emits the call as the last fallback.
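A hedged sketch of the scenario (assuming x86_64 with -mno-cx16, so
16-byte atomics are not available inline):
__int128 lock;

void
unlock (void)
{
  /* Previously this expanded to nothing at all; with the patch it falls
     back to calling __sync_lock_release_16 as the last resort.  */
  __sync_lock_release (&lock);
}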
2024-11-27 Jakub Jelinek <jakub@redhat.com>
PR target/117642
* builtins.cc (expand_builtin_sync_lock_release): Change return type
from void to rtx, return result of expand_atomic_store.
(expand_builtin) <case BUILT_IN_SYNC_LOCK_RELEASE_16>: If
expand_builtin_sync_lock_release returns NULL, do a break rather
than return const0_rtx.
* gcc.target/i386/pr117642-1.c: New test.
* gcc.target/i386/pr117642-2.c: New test.