middle-end: Fix stalled swapped condition code value [PR115836]
emit_store_flag_1 calculates scode (swapped condition code) at the
beginning of the function from the value of code variable. However,
code variable may change before scode usage site, resulting in
invalid stalled scode value.
Move calculation of scode value just before its only usage site to
avoid stalled scode value.
PR middle-end/115836
gcc/ChangeLog:
* expmed.cc (emit_store_flag_1): Move calculation of
scode just before its only usage site.
Jonathan Wakely [Sun, 7 Jul 2024 11:22:42 +0000 (12:22 +0100)]
libstdc++: Fix _Atomic(T) macro in <stdatomic.h> [PR115807]
The definition of the _Atomic(T) macro needs to refer to ::std::atomic,
not some other std::atomic relative to the current namespace.
libstdc++-v3/ChangeLog:
PR libstdc++/115807
* include/c_compatibility/stdatomic.h (_Atomic): Ensure it
refers to std::atomic in the global namespace.
* testsuite/29_atomics/headers/stdatomic.h/115807.cc: New test.
The cpymemdi/setmemdi implementation doesn't fully support strict alignment.
Block the expansion if the alignment is less than 16 with STRICT_ALIGNMENT.
Clean up the condition when to use MOPS.
gcc/ChangeLog/
PR target/103100
* config/aarch64/aarch64.md (cpymemdi): Remove pattern condition.
(setmemdi): Likewise.
* config/aarch64/aarch64.cc (aarch64_expand_cpymem): Support
strict-align. Cleanup condition for using MOPS.
(aarch64_expand_setmem): Likewise.
AVR: target/87376 - Use nop_general_operand for DImode inputs.
The avr-dimode.md expanders have code like emit_move_insn(acc_a, operands[1])
where acc_a is a hard register and operands[1] might be a non-generic
address-space memory reference. Such loads may clobber hard regs since
some of them are implemented as libgcc calls /and/ 64-moves are
expanded as eight byte-moves, so that acc_a or acc_b might be clobbered
by such a load.
This patch simply denies non-generic address-space references by using
nop_general_operand for all avr-dimode.md input predicates.
With the patch, all memory loads that require library calls are issued
before the expander codes from avr-dimode.md are run.
PR target/87376
gcc/
* config/avr/avr-dimode.md: Use "nop_general_operand" instead
of "general_operand" as predicate for all input operands.
gcc/testsuite/
* gcc.target/avr/torture/pr87376.c: New test.
The ACLE requires __ARM_FEATURE_SVE_BF16 to be enabled when SVE and BF16
and the associated intrinsics are available.
GCC does support the required intrinsics for TARGET_SVE_BF16 so define
this macro too.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/
PR target/115475
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
Define __ARM_FEATURE_SVE_BF16 for TARGET_SVE_BF16.
gcc/testsuite/
PR target/115475
* gcc.target/aarch64/acle/bf16_sve_feature.c: New test.
The ACLE asks the user to test for __ARM_FEATURE_BF16 before using the
<arm_bf16.h> header but GCC doesn't set this up.
LLVM does, so this is an inconsistency between the compilers.
This patch enables that macro for TARGET_BF16_FP.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/
PR target/115457
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
Define __ARM_FEATURE_BF16 for TARGET_BF16_FP.
gcc/testsuite/
PR target/115457
* gcc.target/aarch64/acle/bf16_feature.c: New test.
AVR: target/98762 - Handle partial clobber in movqi output.
PR target/98762
gcc/
* config/avr/avr.cc (avr_out_movqi_r_mr_reg_disp_tiny): Properly
restore the base register when it is partially clobbered.
gcc/testsuite/
* gcc.target/avr/torture/pr98762.c: New test.
Kewen Lin [Wed, 26 Jun 2024 07:16:17 +0000 (02:16 -0500)]
rs6000: Fix wrong RTL patterns for vector merge high/low short on LE
Commit r12-4496 changes some define_expands and define_insns
for vector merge high/low short, which are altivec_vmrg[hl]h.
These defines are mainly for built-in function vec_merge{h,l}
and some internal gen function needs. These functions should
consider endianness, taking vec_mergeh as example, as PVIPR
defines, vec_mergeh "Merges the first halves (in element order)
of two vectors", it does note it's in element order. So it's
mapped into vmrghh on BE while vmrglh on LE respectively.
Although the mapped insns are different, as the discussion in
PR106069, the RTL pattern should be still the same, it is
conformed before commit r12-4496, but gets changed into
different patterns on BE and LE starting from commit r12-4496.
Similar to 32-bit element case in commit log of r15-1504, this
16-bit element pattern on LE doesn't actually match what the
underlying insn is intended to represent, once some optimization
like combine does some changes basing on it, it would cause
the unexpected consequence. The newly constructed test case
pr106069-2.c is a typical example for this issue on element type
short.
So this patch is to fix the wrong RTL pattern, ensure the
associated RTL patterns become the same as before which can
have the same semantic as their mapped insns. With the
proposed patch, the expanders like altivec_vmrghh expands
into altivec_vmrghh_direct_be or altivec_vmrglh_direct_le
depending on endianness, "direct" can easily show which
insn would be generated, _be and _le are mainly for the
different RTL patterns as endianness.
Co-authored-by: Xionghu Luo <xionghuluo@tencent.com>
PR target/106069
PR target/115355
gcc/ChangeLog:
* config/rs6000/altivec.md (altivec_vmrghh_direct): Rename to ...
(altivec_vmrghh_direct_be): ... this. Add condition BYTES_BIG_ENDIAN.
(altivec_vmrghh_direct_le): New define_insn.
(altivec_vmrglh_direct): Rename to ...
(altivec_vmrglh_direct_be): ... this. Add condition BYTES_BIG_ENDIAN.
(altivec_vmrglh_direct_le): New define_insn.
(altivec_vmrghh): Adjust by calling gen_altivec_vmrghh_direct_be
for BE and gen_altivec_vmrglh_direct_le for LE.
(altivec_vmrglh): Adjust by calling gen_altivec_vmrglh_direct_be
for BE and gen_altivec_vmrghh_direct_le for LE.
(vec_widen_umult_hi_v16qi): Adjust the call to
gen_altivec_vmrghh_direct by gen_altivec_vmrghh for BE
and by gen_altivec_vmrglh for LE.
(vec_widen_smult_hi_v16qi): Likewise.
(vec_widen_umult_lo_v16qi): Adjust the call to
gen_altivec_vmrglh_direct by gen_altivec_vmrglh for BE
and by gen_altivec_vmrghh for LE.
(vec_widen_smult_lo_v16qi): Likewise.
* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Replace
CODE_FOR_altivec_vmrghh_direct by
CODE_FOR_altivec_vmrghh_direct_be for BE and
CODE_FOR_altivec_vmrghh_direct_le for LE. And replace
CODE_FOR_altivec_vmrglh_direct by
CODE_FOR_altivec_vmrglh_direct_be for BE and
CODE_FOR_altivec_vmrglh_direct_le for LE.
Kewen Lin [Wed, 26 Jun 2024 07:16:17 +0000 (02:16 -0500)]
rs6000: Fix wrong RTL patterns for vector merge high/low char on LE
Commit r12-4496 changes some define_expands and define_insns
for vector merge high/low char, which are altivec_vmrg[hl]b.
These defines are mainly for built-in function vec_merge{h,l}
and some internal gen function needs. These functions should
consider endianness, taking vec_mergeh as example, as PVIPR
defines, vec_mergeh "Merges the first halves (in element order)
of two vectors", it does note it's in element order. So it's
mapped into vmrghb on BE while vmrglb on LE respectively.
Although the mapped insns are different, as the discussion in
PR106069, the RTL pattern should be still the same, it is
conformed before commit r12-4496, but gets changed into
different patterns on BE and LE starting from commit r12-4496.
Similar to 32-bit element case in commit log of r15-1504, this
8-bit element pattern on LE doesn't actually match what the
underlying insn is intended to represent, once some optimization
like combine does some changes basing on it, it would cause
the unexpected consequence. The newly constructed test case
pr106069-1.c is a typical example for this issue.
So this patch is to fix the wrong RTL pattern, ensure the
associated RTL patterns become the same as before which can
have the same semantic as their mapped insns. With the
proposed patch, the expanders like altivec_vmrghb expands
into altivec_vmrghb_direct_be or altivec_vmrglb_direct_le
depending on endianness, "direct" can easily show which
insn would be generated, _be and _le are mainly for the
different RTL patterns as endianness.
Co-authored-by: Xionghu Luo <xionghuluo@tencent.com>
PR target/106069
PR target/115355
gcc/ChangeLog:
* config/rs6000/altivec.md (altivec_vmrghb_direct): Rename to ...
(altivec_vmrghb_direct_be): ... this. Add condition BYTES_BIG_ENDIAN.
(altivec_vmrghb_direct_le): New define_insn.
(altivec_vmrglb_direct): Rename to ...
(altivec_vmrglb_direct_be): ... this. Add condition BYTES_BIG_ENDIAN.
(altivec_vmrglb_direct_le): New define_insn.
(altivec_vmrghb): Adjust by calling gen_altivec_vmrghb_direct_be
for BE and gen_altivec_vmrglb_direct_le for LE.
(altivec_vmrglb): Adjust by calling gen_altivec_vmrglb_direct_be
for BE and gen_altivec_vmrghb_direct_le for LE.
* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Replace
CODE_FOR_altivec_vmrghb_direct by
CODE_FOR_altivec_vmrghb_direct_be for BE and
CODE_FOR_altivec_vmrghb_direct_le for LE. And replace
CODE_FOR_altivec_vmrglb_direct by
CODE_FOR_altivec_vmrglb_direct_be for BE and
CODE_FOR_altivec_vmrglb_direct_le for LE.
PR target/88236
PR target/115726
gcc/
* config/avr/avr.md (mov<mode>) [avr_mem_memx_p]: Expand in such a
way that the destination does not overlap with any hard register
clobbered / used by xload8qi_A resp. xload<mode>_A.
* config/avr/avr.cc (avr_out_xload): Avoid early-clobber
situation for Z by executing just one load when the output register
overlaps with Z.
gcc/testsuite/
* gcc.target/avr/torture/pr88236-pr115726.c: New test.
Kewen Lin [Fri, 21 Jun 2024 01:23:56 +0000 (20:23 -0500)]
rs6000: Fix wrong RTL patterns for vector merge high/low word on LE
Commit r12-4496 changes some define_expands and define_insns
for vector merge high/low word, which are altivec_vmrg[hl]w,
vsx_xxmrg[hl]w_<VSX_W:mode>. These defines are mainly for
built-in function vec_merge{h,l}, __builtin_vsx_xxmrghw,
__builtin_vsx_xxmrghw_4si and some internal gen function
needs. These functions should consider endianness, taking
vec_mergeh as example, as PVIPR defines, vec_mergeh "Merges
the first halves (in element order) of two vectors", it does
note it's in element order. So it's mapped into vmrghw on
BE while vmrglw on LE respectively. Although the mapped
insns are different, as the discussion in PR106069, the RTL
pattern should be still the same, it is conformed before
commit r12-4496, define_expand altivec_vmrghw got expanded
into:
on LE, although the mapped insn are still vmrghw on BE and
vmrglw on LE, the associated RTL pattern is completely
wrong and inconsistent with the mapped insn. If optimization
passes leave this pattern alone, even if its pattern doesn't
represent its mapped insn, it's still fine, that's why simple
testing on bif doesn't expose this issue. But once some
optimization pass such as combine does some changes basing
on this wrong pattern, because the pattern doesn't match the
semantics that the expanded insn is intended to represent,
it would cause the unexpected result.
So this patch is to fix the wrong RTL pattern, ensure the
associated RTL patterns become the same as before which can
have the same semantic as their mapped insns. With the
proposed patch, the expanders like altivec_vmrghw expands
into altivec_vmrghb_direct_be or altivec_vmrglb_direct_le
depending on endianness, "direct" can easily show which
insn would be generated, _be and _le are mainly for the
different RTL patterns as endianness.
Co-authored-by: Xionghu Luo <xionghuluo@tencent.com>
PR target/106069
PR target/115355
gcc/ChangeLog:
* config/rs6000/altivec.md (altivec_vmrghw_direct_<VSX_W:mode>): Rename
to ...
(altivec_vmrghw_direct_<VSX_W:mode>_be): ... this. Add the condition
BYTES_BIG_ENDIAN.
(altivec_vmrghw_direct_<VSX_W:mode>_le): New define_insn.
(altivec_vmrglw_direct_<VSX_W:mode>): Rename to ...
(altivec_vmrglw_direct_<VSX_W:mode>_be): ... this. Add the condition
BYTES_BIG_ENDIAN.
(altivec_vmrglw_direct_<VSX_W:mode>_le): New define_insn.
(altivec_vmrghw): Adjust by calling gen_altivec_vmrghw_direct_v4si_be
for BE and gen_altivec_vmrglw_direct_v4si_le for LE.
(altivec_vmrglw): Adjust by calling gen_altivec_vmrglw_direct_v4si_be
for BE and gen_altivec_vmrghw_direct_v4si_le for LE.
(vec_widen_umult_hi_v8hi): Adjust the call to
gen_altivec_vmrghw_direct_v4si by gen_altivec_vmrghw for BE
and by gen_altivec_vmrglw for LE.
(vec_widen_smult_hi_v8hi): Likewise.
(vec_widen_umult_lo_v8hi): Adjust the call to
gen_altivec_vmrglw_direct_v4si by gen_altivec_vmrglw for BE
and by gen_altivec_vmrghw for LE
(vec_widen_smult_lo_v8hi): Likewise.
* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Replace
CODE_FOR_altivec_vmrghw_direct_v4si by
CODE_FOR_altivec_vmrghw_direct_v4si_be for BE and
CODE_FOR_altivec_vmrghw_direct_v4si_le for LE. And replace
CODE_FOR_altivec_vmrglw_direct_v4si by
CODE_FOR_altivec_vmrglw_direct_v4si_be for BE and
CODE_FOR_altivec_vmrglw_direct_v4si_le for LE.
* config/rs6000/vsx.md (vsx_xxmrghw_<VSX_W:mode>): Adjust by calling
gen_altivec_vmrghw_direct_v4si_be for BE and
gen_altivec_vmrglw_direct_v4si_le for LE.
(vsx_xxmrglw_<VSX_W:mode>): Adjust by calling
gen_altivec_vmrglw_direct_v4si_be for BE and
gen_altivec_vmrghw_direct_v4si_le for LE.
gcc/testsuite/ChangeLog:
* g++.target/powerpc/pr106069.C: New test.
* gcc.target/powerpc/pr115355.c: New test.
Alexandre Oliva [Thu, 27 Jun 2024 11:44:54 +0000 (08:44 -0300)]
[libstdc++] [testsuite] defer to check_vect_support* [PR115454]
The newly-added testcase overrides the default dg-do action set by
check_vect_support_and_set_flags (in libstdc++-dg/conformance.exp), so
it attempts to run the test even if runtime vector support is not
available.
Remove the explicit dg-do directive, so that the default is honored,
and the test is run if vector support is found, and only compiled
otherwise.
for libstdc++-v3/ChangeLog
PR libstdc++/115454
* testsuite/experimental/simd/pr115454_find_last_set.cc: Defer
to check_vect_support_and_set_flags's default dg-do action.
Kyrylo Tkachov [Wed, 19 Jun 2024 09:26:02 +0000 (14:56 +0530)]
Add support for -mcpu=grace
This adds support for the NVIDIA Grace CPU to aarch64.
We reuse the tuning decisions for the Neoverse V2 core, but include a
number of architecture features that are not enabled by default in
-mcpu=neoverse-v2.
This allows Grace users to more simply target the CPU with -mcpu=grace
rather than remembering what extensions to tag on top of
-mcpu=neoverse-v2.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/
* config/aarch64/aarch64-cores.def (grace): New entry.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi (AArch64 Options): Document the above.
Jonathan Wakely [Tue, 25 Jun 2024 22:25:54 +0000 (23:25 +0100)]
libstdc++: Remove confusing text from status tables for release branch
When I tried to make the release branch versions of these docs refer to
the release branch instead of "mainline GCC", for some reason I left the
text "not any particular release" there. That's just confusing, because
the docs are for a particular release, the latest on that branch. Remove
that confusing text in several places.
libstdc++-v3/ChangeLog:
* doc/xml/manual/status_cxx2023.xml: Change reference from
mainline GCC to the release branch.
* doc/xml/manual/status_cxx1998.xml: Remove confusing "not in
any particular release" text.
* doc/xml/manual/status_cxx2011.xml: Likewise.
* doc/xml/manual/status_cxx2014.xml: Likewise.
* doc/xml/manual/status_cxx2017.xml: Likewise.
* doc/xml/manual/status_cxx2020.xml: Likewise.
* doc/xml/manual/status_cxxtr1.xml: Likewise.
* doc/xml/manual/status_cxxtr24733.xml: Likewise.
* doc/html/manual/status.html: Regenerate.
Kewen Lin [Wed, 29 May 2024 02:13:40 +0000 (21:13 -0500)]
rs6000: Don't clobber return value when eh_return called [PR114846]
As the associated test case in PR114846 shows, currently
with eh_return involved some register restoring for EH
RETURN DATA in epilogue can clobber the one which holding
the return value. Referring to the existing handlings in
some other targets, this patch makes eh_return expander
call one new define_insn_and_split eh_return_internal which
directly calls rs6000_emit_epilogue with epilogue_type
EPILOGUE_TYPE_EH_RETURN instead of the previous treating
normal return with crtl->calls_eh_return specially.
PR target/114846
gcc/ChangeLog:
* config/rs6000/rs6000-logue.cc (rs6000_emit_epilogue): As
EPILOGUE_TYPE_EH_RETURN would be passed as epilogue_type directly
now, adjust the relevant handlings on it.
* config/rs6000/rs6000.md (eh_return expander): Append by calling
gen_eh_return_internal and emit_barrier.
(eh_return_internal): New define_insn_and_split, call function
rs6000_emit_epilogue with epilogue type EPILOGUE_TYPE_EH_RETURN.
Matthias Kretz [Fri, 21 Jun 2024 14:22:22 +0000 (16:22 +0200)]
libstdc++: Fix test on x86_64 and non-simd targets
* Running a test compiled with AVX512 instructions requires
avx512f_runtime not just avx512f.
* The 'reduce2' test violated an invariant of fixed_size_simd_mask and
thus failed on all targets without 16-Byte vector builtins enabled (in
bits/simd.h).
Matthias Kretz [Fri, 14 Jun 2024 13:11:25 +0000 (15:11 +0200)]
libstdc++: Fix find_last_set(simd_mask) to ignore padding bits
With the change to the AVX512 find_last_set implementation, the change
to AVX512 operator!= is unnecessary. However, the latter was not
producing optimal code and unnecessarily set the padding bits. In
theory, the compiler could determine that with the new !=
implementation, the bit operation for clearing the padding bits is a
no-op and can be elided.
PR libstdc++/115454
* include/experimental/bits/simd_x86.h (_S_not_equal_to): Use
neq comparison instead of bitwise negation after eq.
(_S_find_last_set): Clear unused high bits before computing
bit_width.
* testsuite/experimental/simd/pr115454_find_last_set.cc: New
test.
Matthias Kretz [Mon, 3 Jun 2024 10:02:07 +0000 (12:02 +0200)]
libstdc++: Fix simd<char> conversion for -fno-signed-char for Clang
The special case for Clang in the trait producing a signed integer type
lead to the trait returning 'char' where it should have been 'signed
char'. This workaround was introduced because on Clang the return type
of vector compares was not convertible to '_SimdWrapper<
__int_for_sizeof_t<...' unless '__int_for_sizeof_t<char>' was an alias
for 'char'. In order to not rewrite the complete mask type code (there
is code scattered around the implementation assuming signed integers),
this needs to be 'signed char'; so the special case for Clang needs to
be removed.
The conversion issue is now solved in _SimdWrapper, which now
additionally allows conversion from vector types with compatible
integral type.
PR libstdc++/115308
* include/experimental/bits/simd.h (__int_for_sizeof): Remove
special cases for __clang__.
(_SimdWrapper): Change constructor overload set to allow
conversion from vector types with integral conversions via bit
reinterpretation.
PR libstdc++/115247
* include/experimental/bits/simd.h (__as_vector): Don't use
vector_size(8) on __i386__.
(__vec_shuffle): Never return MMX vectors, widen to 16 bytes
instead.
(concat): Fix padding calculation to pick up widening logic from
__as_vector.
PR libstdc++/114958
* include/experimental/bits/simd.h (__as_vector): Return scalar
simd as one-element vector. Return vector from single-vector
fixed_size simd.
(__vec_shuffle): New.
(__extract_part): Adjust return type signature.
(split): Use __extract_part for any split into non-fixed_size
simds.
(concat): If the return type stores a single vector, use
__vec_shuffle (which calls __builtin_shufflevector) to produce
the return value.
* include/experimental/bits/simd_builtin.h
(__shift_elements_right): Removed.
(__extract_part): Return single elements directly. Use
__vec_shuffle (which calls __builtin_shufflevector) to for all
non-trivial cases.
* include/experimental/bits/simd_fixed_size.h (__extract_part):
Return single elements directly.
* testsuite/experimental/simd/pr114958.cc: New test.
The option_map array for most entries contains just non-NULL opt0
{ "-Wno-", NULL, "-W", false, true },
{ "-fno-", NULL, "-f", false, true },
{ "-gno-", NULL, "-g", false, true },
{ "-mno-", NULL, "-m", false, true },
{ "--debug=", NULL, "-g", false, false },
{ "--machine-", NULL, "-m", true, false },
{ "--machine-no-", NULL, "-m", false, true },
{ "--machine=", NULL, "-m", false, false },
{ "--machine=no-", NULL, "-m", false, true },
{ "--machine", "", "-m", false, false },
{ "--machine", "no-", "-m", false, true },
{ "--optimize=", NULL, "-O", false, false },
{ "--std=", NULL, "-std=", false, false },
{ "--std", "", "-std=", false, false },
{ "--warn-", NULL, "-W", true, false },
{ "--warn-no-", NULL, "-W", false, true },
{ "--", NULL, "-f", true, false },
{ "--no-", NULL, "-f", false, true }
and so add_misspelling_candidates works correctly for it, but 3 out of
these,
{ "--machine", "", "-m", false, false },
{ "--machine", "no-", "-m", false, true },
and
{ "--std", "", "-std=", false, false },
use non-NULL opt1. That says that
--machine foo
should map to
-mfoo
and
--machine no-foo
should map to
-mno-foo
and
--std c++17
should map to
-std=c++17
add_misspelling_canidates was not handling this, so it hapilly
registered say
--stdc++17
or
--machineavx512
(twice) as spelling alternatives, when those options aren't recognized.
Instead we support
--std c++17
or
--machine avx512
--machine no-avx512
The following patch fixes that. On this particular testcase, we no longer
suggest anything, even when among the suggestion is say that
--std c++17
or
-std=c++17
etc.
2024-06-17 Jakub Jelinek <jakub@redhat.com>
PR driver/115440
* opts-common.cc (add_misspelling_candidates): If opt1 is non-NULL,
add a space and opt1 to the alternative suggestion text.
The warning code uses %D to print the ARRAY_REF first operands.
That works in the most common case where those operands are decls, but
as can be seen on the following testcase, they can be other expressions
with array type.
Just changing %D to %E isn't enough, because then the diagnostics can
suggest something like
note: use '&(x) != 0 ? (int (*)[32])&a : (int (*)[32])&b[0] == &(y) != 0 ? (int (*)[32])&a : (int (*)[32])&b[0]' to compare the addresses
which is a bad suggestion, the %E printing doesn't know that the
warning code will want to add & before it and [0] after it.
So, the following patch adds ()s around the operand as well, but does
that only for non-decls, for decls keeps it as &arr[0] like before.
2024-06-17 Jakub Jelinek <jakub@redhat.com>
PR c/115290
* c-warn.cc (do_warn_array_compare): Use %E rather than %D for
printing op0 and op1; if those operands aren't decls, also print
parens around them.
Andre Vieira [Thu, 6 Jun 2024 15:02:50 +0000 (16:02 +0100)]
arm: Add .type and .size to __gnu_cmse_nonsecure_call [PR115360]
This patch adds missing assembly directives to the CMSE library wrapper to call
functions with attribute cmse_nonsecure_call. Without the .type directive the
linker will fail to produce the correct veneer if a call to this wrapper
function is to far from the wrapper itself. The .size was added for
completeness, though we don't necessarily have a usecase for it.
libgcc/ChangeLog:
PR target/115360
* config/arm/cmse_nonsecure_call.S: Add .type and .size directives.
Alex Coplan [Fri, 3 May 2024 08:23:59 +0000 (09:23 +0100)]
cfgrtl: Fix MEM_EXPR update in duplicate_insn_chain [PR114924]
The PR shows that when cfgrtl.cc:duplicate_insn_chain attempts to
update the MR_DEPENDENCE_CLIQUE information for a MEM_EXPR we can end up
accidentally dropping (e.g.) an ARRAY_REF from the MEM_EXPR and end up
replacing it with the underlying MEM_REF. This leads to an
inconsistency in the MEM_EXPR information, and could lead to wrong code.
While the walk down to the MEM_REF is necessary to update
MR_DEPENDENCE_CLIQUE, we should use the outer tree expression for the
MEM_EXPR. This patch does that.
gcc/ChangeLog:
PR rtl-optimization/114924
* cfgrtl.cc (duplicate_insn_chain): When updating MEM_EXPRs,
don't strip (e.g.) ARRAY_REFs from the final MEM_EXPR.
When we substitute the equivalence and it becomes shared, we can fail
to correctly update reg info used by LRA. This can result in wrong
code generation, e.g. because of incorrect live analysis. It can also
result in compiler crash as the pseudo survives RA. This is what
exactly happened for the PR. This patch solves this problem by
unsharing substituted equivalences.
The following fixes an issue where SSA update loses PHI argument
locations when updating PHI nodes it didn't create as part of the
SSA update. For the case where the reaching def is the same as
the current argument opt to do nothing and for the case where the
PHI argument already has a location keep that (that's an indication
the PHI node wasn't created as part of the update SSA process).
PR middle-end/40635
* tree-into-ssa.cc (rewrite_update_phi_arguments): Only
update the argument when the reaching definition is different
from the current argument. Keep an existing argument
location.
The following tries to address the PHI insertion compile-time hog in
RTL fwprop observed with the PR54052 testcase where the loop computing
the "unfiltered" set of variables possibly needing PHI nodes for each
block exhibits quadratic compile-time and memory-use.
It does so by pruning the local DEFs with LR_OUT of the block, removing
regs that can never be LR_IN (defined by this block) in the dominance
frontier.
PR rtl-optimization/54052
* rtl-ssa/blocks.cc (function_info::place_phis): Filter
local defs by LR_OUT.
arm: Zero/Sign extends for CMSE security on Armv8-M.baseline [PR115253]
Properly handle zero and sign extension for Armv8-M.baseline as
Cortex-M23 can have the security extension active.
Currently, there is an internal compiler error on Cortex-M23 for the
epilog processing of sign extension.
This patch addresses the following CVE-2024-0151 for Armv8-M.baseline.
Include safe-ctype.h after C++ standard headers, to avoid over-poisoning
When building gcc's C++ sources against recent libc++, the poisoning of
the ctype macros due to including safe-ctype.h before including C++
standard headers such as <list>, <map>, etc, causes many compilation
errors, similar to:
In file included from /home/dim/src/gcc/master/gcc/gensupport.cc:23:
In file included from /home/dim/src/gcc/master/gcc/system.h:233:
In file included from /usr/include/c++/v1/vector:321:
In file included from
/usr/include/c++/v1/__format/formatter_bool.h:20:
In file included from
/usr/include/c++/v1/__format/formatter_integral.h:32:
In file included from /usr/include/c++/v1/locale:202:
/usr/include/c++/v1/__locale:546:5: error: '__abi_tag__' attribute
only applies to structs, variables, functions, and namespaces
546 | _LIBCPP_INLINE_VISIBILITY
| ^
/usr/include/c++/v1/__config:813:37: note: expanded from macro
'_LIBCPP_INLINE_VISIBILITY'
813 | # define _LIBCPP_INLINE_VISIBILITY _LIBCPP_HIDE_FROM_ABI
| ^
/usr/include/c++/v1/__config:792:26: note: expanded from macro
'_LIBCPP_HIDE_FROM_ABI'
792 |
__attribute__((__abi_tag__(_LIBCPP_TOSTRING(
_LIBCPP_VERSIONED_IDENTIFIER))))
| ^
In file included from /home/dim/src/gcc/master/gcc/gensupport.cc:23:
In file included from /home/dim/src/gcc/master/gcc/system.h:233:
In file included from /usr/include/c++/v1/vector:321:
In file included from
/usr/include/c++/v1/__format/formatter_bool.h:20:
In file included from
/usr/include/c++/v1/__format/formatter_integral.h:32:
In file included from /usr/include/c++/v1/locale:202:
/usr/include/c++/v1/__locale:547:37: error: expected ';' at end of
declaration list
547 | char_type toupper(char_type __c) const
| ^
/usr/include/c++/v1/__locale:553:48: error: too many arguments
provided to function-like macro invocation
553 | const char_type* toupper(char_type* __low, const
char_type* __high) const
| ^
/home/dim/src/gcc/master/gcc/../include/safe-ctype.h:146:9: note:
macro 'toupper' defined here
146 | #define toupper(c) do_not_use_toupper_with_safe_ctype
| ^
This is because libc++ uses different transitive includes than
libstdc++, and some of those transitive includes pull in various ctype
declarations (typically via <locale>).
There was already a special case for including <string> before
safe-ctype.h, so move the rest of the C++ standard header includes to
the same location, to fix the problem.
PR middle-end/111632
gcc/ChangeLog:
* system.h: Include safe-ctype.h after C++ standard headers.
Andrew Pinski [Sat, 18 May 2024 18:55:58 +0000 (11:55 -0700)]
PHIOPT: Don't transform minmax if middle bb contains a phi [PR115143]
The problem here is even if last_and_only_stmt returns a statement,
the bb might still contain a phi node which defines a ssa name
which is used in that statement so we need to add a check to make sure
that the phi nodes are empty for the middle bbs in both the
`CMP?MINMAX:MINMAX` case and the `CMP?MINMAX:B` cases.
Bootstrapped and tested on x86_64_linux-gnu with no regressions.
PR tree-optimization/115143
gcc/ChangeLog:
* tree-ssa-phiopt.cc (minmax_replacement): Check for empty
phi nodes for middle bbs for the case where middle bb is not empty.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/pr115143-1.c: New test.
* gcc.c-torture/compile/pr115143-2.c: New test.
* gcc.c-torture/compile/pr115143-3.c: New test.
Jonathan Wakely [Fri, 10 Feb 2023 23:16:15 +0000 (23:16 +0000)]
libstdc++: Fix uses of non-reserved names in headers
The non-reserved names 'val' and 'dest' were being used in our headers
but haven't been added to the 17_intro/names.cc test. That's because
they are used by <asm-generic/posix_types.h> and <netinet/tcp.h>
respecitvely on glibc-based systems.
Jonathan Wakely [Thu, 1 Feb 2024 10:08:05 +0000 (10:08 +0000)]
libstdc++: Implement some missing functions for net::ip::network_v6
libstdc++-v3/ChangeLog:
* include/experimental/internet (network_v6::network): Define.
(network_v6::hosts): Finish implementing.
(network_v6::to_string): Do not concatenate std::string to
arbitrary std::basic_string specialization.
* testsuite/experimental/net/internet/network/v6/cons.cc: New
test.
Jonathan Wakely [Fri, 9 Jun 2023 11:15:21 +0000 (12:15 +0100)]
libstdc++: Add preprocessor checks to <experimental/internet> [PR100285]
We can't define endpoints and resolvers without the relevant OS support.
If IPPROTO_TCP and IPPROTO_UDP are both undefined then we won't need
basic_endpoint and basic_resolver anyway, so make them depend on those
macros.
libstdc++-v3/ChangeLog:
PR libstdc++/100285
* include/experimental/internet [IPPROTO_TCP || IPPROTO_UDP]
(basic_endpoint, basic_resolver_entry, resolver_base)
(basic_resolver_results, basic_resolver): Only define if the tcp
or udp protocols will be defined.
Jonathan Wakely [Wed, 10 Apr 2024 12:24:51 +0000 (13:24 +0100)]
libstdc++: Adjust expected locale-dependent date formats in tests
The test 27_io/manipulators/extended/get_time/char/2.cc expects that
%a in the de_DE locale is "Di" but on FreeBSD it's "Di." with a trailing
period. Adjust the input string to be "1971 Di." instead of "Di 1971"
and that way if %a doesn't expect the trailing '.' it simply won't
extract it from the stream.
This fixes:
FAIL: 27_io/manipulators/extended/get_time/char/2.cc -std=gnu++17 execution test
libstdc++-v3/ChangeLog:
* testsuite/27_io/manipulators/extended/get_time/char/2.cc:
Adjust input string so that it matches %a with or without a
trailing period.
Jonathan Wakely [Mon, 18 Mar 2024 13:22:17 +0000 (13:22 +0000)]
libstdc++: Fix infinite loop in std::binomial_distribution [PR114359]
The multiplication (4 * _M_t * __1p) can wraparound to zero if _M_t is
unsigned and 4 * _M_t wraps to zero. The third operand has type double,
so do the second multiplication first, so that we aren't multiplying
integers.
libstdc++-v3/ChangeLog:
PR libstdc++/114359
* include/bits/random.tcc (binomial_distribution::param_type):
Ensure arithmetic is done as type double.
* testsuite/26_numerics/random/binomial_distribution/114359.cc: New test.
Jonathan Wakely [Mon, 18 Mar 2024 13:00:17 +0000 (13:00 +0000)]
libstdc++: Begin lifetime of storage in std::vector<bool> [PR114367]
This doesn't cause a problem with GCC, but Clang correctly diagnoses a
bug in the code. The objects in the allocated storage need to begin
their lifetime before we start using them.
This change uses the allocator's construct function instead of using
std::construct_at directly, in order to support fancy pointers.
libstdc++-v3/ChangeLog:
PR libstdc++/114367
* include/bits/stl_bvector.h (_M_allocate): Use allocator's
construct function to begin lifetime of words.
Jonathan Wakely [Thu, 21 Mar 2024 13:25:15 +0000 (13:25 +0000)]
libstdc++: Destroy allocators in re-inserted container nodes [PR114401]
The allocator objects in container node handles were not being destroyed
after the node was re-inserted into a container. They are stored in a
union and so need to be explicitly destroyed when the node becomes
empty. The containers were zeroing the node handle's pointer, which
makes it empty, causing the handle's destructor to think there's nothing
to clean up.
Add a new member function to the node handle which destroys the
allocator and zeros the pointer. Change the containers to call that
instead of just changing the pointer manually.
We can also remove the _M_empty member of the union which is not
necessary.
libstdc++-v3/ChangeLog:
PR libstdc++/114401
* include/bits/hashtable.h (_Hashtable::_M_reinsert_node): Call
release() on node handle instead of just zeroing its pointer.
(_Hashtable::_M_reinsert_node_multi): Likewise.
(_Hashtable::_M_merge_unique): Likewise.
(_Hashtable::_M_merge_multi): Likewise.
* include/bits/node_handle.h (_Node_handle_common::release()):
New member function.
(_Node_handle_common::_Optional_alloc::_M_empty): Remove
unnecessary union member.
(_Node_handle_common): Declare _Hashtable as a friend.
* include/bits/stl_tree.h (_Rb_tree::_M_reinsert_node_unique):
Call release() on node handle instead of just zeroing its
pointer.
(_Rb_tree::_M_reinsert_node_equal): Likewise.
(_Rb_tree::_M_reinsert_node_hint_unique): Likewise.
(_Rb_tree::_M_reinsert_node_hint_equal): Likewise.
* testsuite/23_containers/multiset/modifiers/114401.cc: New test.
* testsuite/23_containers/set/modifiers/114401.cc: New test.
* testsuite/23_containers/unordered_multiset/modifiers/114401.cc: New test.
* testsuite/23_containers/unordered_set/modifiers/114401.cc: New test.
Jakub Jelinek [Thu, 6 Jun 2024 20:12:11 +0000 (22:12 +0200)]
c: Fix up pointer types to may_alias structures [PR114493]
The following testcase ICEs in ipa-free-lang, because the
fld_incomplete_type_of
gcc_assert (TYPE_CANONICAL (t2) != t2
&& TYPE_CANONICAL (t2) == TYPE_CANONICAL (TREE_TYPE (t)));
assertion doesn't hold.
This is because t is a struct S * type which was created while struct S
was still incomplete and without the may_alias attribute (and TYPE_CANONICAL
of a pointer type is a type created with can_alias_all = false argument),
while later on on the struct definition may_alias attribute was used.
fld_incomplete_type_of then creates an incomplete distinct copy of the
structure (but with the original attributes) but pointers created for it
are because of the "may_alias" attribute TYPE_REF_CAN_ALIAS_ALL, including
their TYPE_CANONICAL, because while that is created with !can_alias_all
argument, we later set it because of the "may_alias" attribute on the
to_type.
This doesn't ICE with C++ since PR70512 fix because the C++ FE sets
TYPE_REF_CAN_ALIAS_ALL on all pointer types to the class type (and its
variants) when the may_alias is added.
The following patch does that in the C FE as well.
2024-06-06 Jakub Jelinek <jakub@redhat.com>
PR c/114493
* c-decl.cc (c_fixup_may_alias): New function.
(finish_struct): Call it if "may_alias" attribute is
specified.
* gcc.dg/pr114493-1.c: New test.
* gcc.dg/pr114493-2.c: New test.
Jakub Jelinek [Tue, 4 Jun 2024 13:49:41 +0000 (15:49 +0200)]
fold-const: Fix up CLZ handling in tree_call_nonnegative_warnv_p [PR115337]
The function currently incorrectly assumes all the __builtin_clz* and .CLZ
calls have non-negative result. That is the case of the former which is UB
on zero and has [0, prec-1] return value otherwise, and is the case of the
single argument .CLZ as well (again, UB on zero), but for two argument
.CLZ is the case only if the second argument is also nonnegative (or if we
know the argument can't be zero, but let's do that just in the ranger IMHO).
The following patch does that.
2024-06-04 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/115337
* fold-const.cc (tree_call_nonnegative_warnv_p) <CASE_CFN_CLZ>:
If fn is CFN_CLZ, use CLZ_DEFINED_VALUE_AT.
Jakub Jelinek [Tue, 4 Jun 2024 10:28:01 +0000 (12:28 +0200)]
builtins: Force SAVE_EXPR for __builtin_{add,sub,mul}_overflow [PR108789]
The following testcase is miscompiled, because we use save_expr
on the .{ADD,SUB,MUL}_OVERFLOW call we are creating, but if the first
two operands are not INTEGER_CSTs (in that case we just fold it right away)
but are TREE_READONLY/!TREE_SIDE_EFFECTS, save_expr doesn't actually
create a SAVE_EXPR at all and so we lower it to
*arg2 = REALPART_EXPR (.ADD_OVERFLOW (arg0, arg1)), \
IMAGPART_EXPR (.ADD_OVERFLOW (arg0, arg1))
which evaluates the ifn twice and just hope it will be CSEd back.
As *arg2 aliases *arg0, that is not the case.
The builtins are really never const/pure as they store into what
the third arguments points to, so after handling the INTEGER_CST+INTEGER_CST
case, I think we should just always use SAVE_EXPR. Just building SAVE_EXPR
by hand and setting TREE_SIDE_EFFECTS on it doesn't work, because
c_fully_fold optimizes it away again, so the following patch marks the
ifn calls as TREE_SIDE_EFFECTS (but doesn't do it for the
__builtin_{add,sub,mul}_overflow_p case which were designed for use
especially in constant expressions and don't really evaluate the
realpart side, so we don't really need a SAVE_EXPR in that case).
2024-06-04 Jakub Jelinek <jakub@redhat.com>
PR middle-end/108789
* builtins.cc (fold_builtin_arith_overflow): For ovf_only,
don't call save_expr and don't build REALPART_EXPR, otherwise
set TREE_SIDE_EFFECTS on call before calling save_expr.
Jakub Jelinek [Mon, 3 Jun 2024 21:11:06 +0000 (23:11 +0200)]
rs6000: Fix up PCH in --enable-host-pie builds [PR115324]
PCH doesn't work properly in --enable-host-pie configurations on
powerpc*-linux*.
The problem is that the rs6000_builtin_info and rs6000_instance_info
arrays mix pointers to .rodata/.data (bifname and attr_string point
to string literals in .rodata section, and the next member is either NULL
or &rs6000_instance_info[XXX]) and GC member (tree fntype).
Now, for normal GC this works just fine, we emit
{
&rs6000_instance_info[0].fntype,
1 * (RS6000_INST_MAX),
sizeof (rs6000_instance_info[0]),
>_ggc_mx_tree_node,
>_pch_nx_tree_node
},
{
&rs6000_builtin_info[0].fntype,
1 * (RS6000_BIF_MAX),
sizeof (rs6000_builtin_info[0]),
>_ggc_mx_tree_node,
>_pch_nx_tree_node
},
GC roots which are strided and thus cover only the fntype members of all
the elements of the two arrays.
For PCH though it actually results in saving those huge arrays (one is
130832 bytes, another 81568 bytes) into the .gch files and loading them back
in full. While the bifname and attr_string and next pointers are marked as
GTY((skip)), they are actually saved to point to the .rodata and .data
sections of the process which writes the PCH, but because cc1/cc1plus etc.
are position independent executables with --enable-host-pie, when it is
loaded from the PCH file, it can point in a completely different addresses
where nothing is mapped at all or some random different thing appears at.
While gengtype supports the callback option, that one is meant for
relocatable function pointers and doesn't work in the case of GTY arrays
inside of .data section anyway.
So, either we'd need to add some further GTY extensions, or the following
patch instead reworks it such that the fntype members which were the only
reason for PCH in those arrays are moved to separate arrays.
where previously we saved/restored for PCH those 130832+81568 bytes, now we
save/restore just 10064+20392 bytes, so this change is beneficial for the
data section size.
Unfortunately, it grows the size of the rs6000_init_generated_builtins
function, vanilla had 218328 bytes, patched has 228668.
When I applied
void
rs6000_init_generated_builtins ()
{
+ bifdata *rs6000_builtin_info_p;
+ tree *rs6000_builtin_info_fntype_p;
+ ovlddata *rs6000_instance_info_p;
+ tree *rs6000_instance_info_fntype_p;
+ ovldrecord *rs6000_overload_info_p;
+ __asm ("" : "=r" (rs6000_builtin_info_p) : "0" (rs6000_builtin_info));
+ __asm ("" : "=r" (rs6000_builtin_info_fntype_p) : "0" (rs6000_builtin_info_fntype));
+ __asm ("" : "=r" (rs6000_instance_info_p) : "0" (rs6000_instance_info));
+ __asm ("" : "=r" (rs6000_instance_info_fntype_p) : "0" (rs6000_instance_info_fntype));
+ __asm ("" : "=r" (rs6000_overload_info_p) : "0" (rs6000_overload_info));
+ #define rs6000_builtin_info rs6000_builtin_info_p
+ #define rs6000_builtin_info_fntype rs6000_builtin_info_fntype_p
+ #define rs6000_instance_info rs6000_instance_info_p
+ #define rs6000_instance_info_fntype rs6000_instance_info_fntype_p
+ #define rs6000_overload_info rs6000_overload_info_p
+
hack by hand, the size of the function is 209700 though, so if really
wanted, we could add __attribute__((__noipa__)) to the function when
building with recent enough GCC and pass pointers to the first elements
of the 5 arrays to the function as arguments. If you want such a change,
could that be done incrementally?
2024-06-03 Jakub Jelinek <jakub@redhat.com>
PR target/115324
* config/rs6000/rs6000-gen-builtins.cc (write_decls): Remove
GTY markup from struct bifdata and struct ovlddata and remove their
fntype members. Change next member in struct ovlddata and
first_instance member of struct ovldrecord to have int type rather
than struct ovlddata *. Remove GTY markup from rs6000_builtin_info
and rs6000_instance_info arrays, declare new
rs6000_builtin_info_fntype and rs6000_instance_info_fntype arrays,
which have GTY markup.
(write_bif_static_init): Adjust for the above changes.
(write_ovld_static_init): Likewise.
(write_init_bif_table): Likewise.
(write_init_ovld_table): Likewise.
* config/rs6000/rs6000-builtin.cc (rs6000_init_builtins): Likewise.
* config/rs6000/rs6000-c.cc (find_instance): Likewise. Make static.
(altivec_resolve_overloaded_builtin): Adjust for the above changes.
Jakub Jelinek [Wed, 15 May 2024 16:37:17 +0000 (18:37 +0200)]
combine: Fix up simplify_compare_const [PR115092]
The following testcases are miscompiled (with tons of GIMPLE
optimization disabled) because combine sees GE comparison of
1-bit sign_extract (i.e. something with [-1, 0] value range)
with (const_int -1) (which is always true) and optimizes it into
NE comparison of 1-bit zero_extract ([0, 1] value range) against
(const_int 0).
The reason is that simplify_compare_const first (correctly)
simplifies the comparison to
GE (ashift:SI something (const_int 31)) (const_int -2147483648)
and then an optimization for when the second operand is power of 2
triggers. That optimization is fine for power of 2s which aren't
the signed minimum of the mode, or if it is NE, EQ, GEU or LTU
against the signed minimum of the mode, but for GE or LT optimizing
it into NE (or EQ) against const0_rtx is wrong, those cases
are always true or always false (but the function doesn't have
a standardized way to tell callers the comparison is now unconditional).
The following patch just disables the optimization in that case.
2024-05-15 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/114902
PR rtl-optimization/115092
* combine.cc (simplify_compare_const): Don't optimize
GE op0 SIGNED_MIN or LT op0 SIGNED_MIN into NE op0 const0_rtx or
EQ op0 const0_rtx.
* gcc.dg/pr114902.c: New test.
* gcc.dg/pr115092.c: New test.
Jakub Jelinek [Tue, 7 May 2024 19:29:14 +0000 (21:29 +0200)]
tree-inline: Remove .ASAN_MARK calls when inlining functions into no_sanitize callers [PR114956]
In r9-5742 we've started allowing to inline always_inline functions into
functions which have disabled e.g. address sanitization even when the
always_inline function is implicitly from command line options sanitized.
This mostly works fine because most of the asan instrumentation is done only
late after ipa, but as the following testcase the .ASAN_MARK ifn calls
gimplifier adds can result in ICEs.
Fixed by dropping those during inlining, similarly to how we drop
.TSAN_FUNC_EXIT calls.
2024-05-07 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/114956
* tree-inline.cc: Include asan.h.
(copy_bb): Remove also .ASAN_MARK calls if id->dst_fn has asan/hwasan
sanitization disabled.
Jakub Jelinek [Tue, 30 Apr 2024 09:22:32 +0000 (11:22 +0200)]
gimple-ssa-sprintf: Use [0, 1] range for %lc with (wint_t) 0 argument [PR114876]
Seems when Martin S. implemented this, he coded there strict reading
of the standard, which said that %lc with (wint_t) 0 argument is handled
as wchar_t[2] temp = { arg, 0 }; %ls with temp arg and so shouldn't print
any values. But, most of the libc implementations actually handled that
case like %c with '\0' argument, adding a single NUL character, the only
known exception is musl.
Recently, C23 changed this in response to GB-141 and POSIX in
https://austingroupbugs.net/view.php?id=1647
so that it should have the same behavior as %c with '\0'.
Because there is implementation divergence, the following patch uses
a range rather than hardcoding it to all 1s (i.e. the %c behavior),
though the likely case is still 1 (forward looking plus most of
implementations).
The res.knownrange = true; assignment removed is redundant due to
the same assignment done unconditionally before the if statement,
rest is formatting fixes.
I don't think the min >= 0 && min < 128 case is right either, I'd think
it should be min >= 0 && max < 128, otherwise it is just some possible
inputs are (maybe) ASCII and there can be others, but this code is a total
mess anyway, with the min, max, likely (somewhere in [min, max]?) and then
unlikely possibly larger than max, dunno, perhaps for at least some chars
in the ASCII range the likely case could be for the ascii case; so perhaps
just the one_2_one_ascii shouldn't set max to 1 and mayfail should be true
for max >= 128. Anyway, didn't feel I should touch that right now.
2024-04-30 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/114876
* gimple-ssa-sprintf.cc (format_character): For min == 0 && max == 0,
set max, likely and unlikely members to 1 rather than 0. Remove
useless res.knownrange = true;. Formatting fixes.
* gcc.dg/pr114876.c: New test.
* gcc.dg/tree-ssa/builtin-sprintf-warn-1.c: Adjust expected
diagnostics.
Jakub Jelinek [Thu, 25 Apr 2024 18:09:35 +0000 (20:09 +0200)]
openmp: Copy DECL_LANG_SPECIFIC and DECL_LANG_FLAG_? to tree-nested decl copy [PR114825]
tree-nested.cc creates in 2 spots artificial VAR_DECLs, one of them is used
both for debug info and OpenMP/OpenACC lowering purposes, the other solely for
OpenMP/OpenACC lowering purposes.
When the decls are used in OpenMP/OpenACC lowering, the OMP langhooks (mostly
Fortran, C just a little and C++ doesn't have nested functions) then inspect
the flags on the vars and based on that decide how to lower the corresponding
clauses.
Unfortunately we weren't copying DECL_LANG_SPECIFIC and DECL_LANG_FLAG_?, so
the langhooks made decisions on the default flags on those instead.
As the original decl isn't necessarily a VAR_DECL, could be e.g. PARM_DECL,
using copy_node wouldn't work properly, so this patch just copies those
flags in addition to other flags it was copying already. And I've removed
code duplication by introducing a helper function which does copying common
to both uses.
2024-04-25 Jakub Jelinek <jakub@redhat.com>
PR fortran/114825
* tree-nested.cc (get_debug_decl): New function.
(get_nonlocal_debug_decl): Use it.
(get_local_debug_decl): Likewise.
Jakub Jelinek [Fri, 19 Apr 2024 06:47:53 +0000 (08:47 +0200)]
rtlanal: Fix set_noop_p for volatile loads or stores [PR114768]
On the following testcase, combine propagates the mem/v load into mem store
with the same address and then removes it, because noop_move_p says it is a
no-op move. If it was the other way around, i.e. mem/v store and mem load,
or both would be mem/v, it would be kept.
The problem is that rtx_equal_p never checks any kind of flags on the rtxes
(and I think it would be quite dangerous to change it at this point), and
set_noop_p checks side_effects_p on just one of the operands, not both.
In the MEM <- MEM set, it only checks it on the destination, in
store to ZERO_EXTRACT only checks it on the source.
The following patch adds the missing side_effects_p checks.
2024-04-19 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/114768
* rtlanal.cc (set_noop_p): Don't return true for MEM <- MEM
sets if src has side-effects or for stores into ZERO_EXTRACT
if ZERO_EXTRACT operand has side-effects.
Jakub Jelinek [Thu, 18 Apr 2024 07:45:14 +0000 (09:45 +0200)]
internal-fn: Temporarily disable flag_trapv during .{ADD,SUB,MUL}_OVERFLOW etc. expansion [PR114753]
__builtin_{add,sub,mul}_overflow{,_p} builtins are well defined
for all inputs even for -ftrapv, and the -fsanitize=signed-integer-overflow
ifns shouldn't abort in libgcc but emit the desired ubsan diagnostics
or abort depending on -fsanitize* setting regardless of -ftrapv.
The expansion of these internal functions uses expand_expr* in various
places (e.g. MULT_EXPR at least in 2 spots), so temporarily disabling
flag_trapv in all those spots would be hard.
The following patch disables it around the bodies of 3 functions
which can do the expand_expr calls.
If it was in the C++ FE, I'd use some RAII sentinel, but I don't think
we have one in the middle-end.
2024-04-18 Jakub Jelinek <jakub@redhat.com>
PR middle-end/114753
* internal-fn.cc (expand_mul_overflow): Save flag_trapv and
temporarily clear it for the duration of the function, then
restore previous value.
(expand_vector_ubsan_overflow): Likewise.
(expand_arith_overflow): Likewise.
Jakub Jelinek [Mon, 15 Apr 2024 08:25:22 +0000 (10:25 +0200)]
attribs: Don't crash on NULL TREE_TYPE in diag_attr_exclusions [PR114634]
The enumerator still doesn't have TREE_TYPE set but diag_attr_exclusions
assumes that all decls must have types.
I think it is better in something as unimportant as diag_attr_exclusions
to be more robust, if there is no type, it can just diagnose exclusions
on the DECL_ATTRIBUTES, like for types it only diagnoses it on
TYPE_ATTRIBUTES.
2024-04-15 Jakub Jelinek <jakub@redhat.com>
PR c++/114634
* attribs.cc (diag_attr_exclusions): Set attrs[1] to NULL_TREE for
decls with NULL TREE_TYPE.
Jakub Jelinek [Fri, 12 Apr 2024 18:53:10 +0000 (20:53 +0200)]
c++: Fix bogus warnings about ignored annotations [PR114691]
The middle-end warns about the ANNOTATE_EXPR added for while/for loops
if they declare a var inside of the loop condition.
This is because the assumption is that ANNOTATE_EXPR argument is used
immediately in a COND_EXPR (later GIMPLE_COND), but simplify_loop_decl_cond
wraps the ANNOTATE_EXPR inside of a TRUTH_NOT_EXPR, so it no longer
holds.
The following patch fixes that by adding the TRUTH_NOT_EXPR inside of the
ANNOTATE_EXPR argument if any.
2024-04-12 Jakub Jelinek <jakub@redhat.com>
PR c++/114691
* semantics.cc (simplify_loop_decl_cond): Use cp_build_unary_op with
TRUTH_NOT_EXPR on ANNOTATE_EXPR argument (if any) rather than
ANNOTATE_EXPR itself.
Jakub Jelinek [Thu, 11 Apr 2024 09:12:11 +0000 (11:12 +0200)]
asan, v3: Fix up handling of > 32 byte aligned variables with -fsanitize=address -fstack-protector* [PR110027]
On Tue, Mar 26, 2024 at 02:08:02PM +0800, liuhongt wrote:
> > > So, try to add some other variable with larger size and smaller alignment
> > > to the frame (and make sure it isn't optimized away).
> > >
> > > alignb above is the alignment of the first partition's var, if
> > > align_frame_offset really needs to depend on the var alignment, it probably
> > > should be the maximum alignment of all the vars with alignment
> > > alignb * BITS_PER_UNIT <=3D MAX_SUPPORTED_STACK_ALIGNMENT
> > >
>
> In asan_emit_stack_protection, when it allocated fake stack, it assume
> bottom of stack is also aligned to alignb. And the place violated this
> is the first var partition. which is 32 bytes offsets, it should be
> BIGGEST_ALIGNMENT / BITS_PER_UNIT.
> So I think we need to use MAX (BIGGEST_ALIGNMENT /
> BITS_PER_UNIT, ASAN_RED_ZONE_SIZE) for the first var partition.
Your first patch aligned offsets[0] to maximum of alignb and
ASAN_RED_ZONE_SIZE. But as I wrote in the reply to that mail, alignb there
is the alignment of just a single variable which is the first one to appear
in the sorted list and is placed in the highest spot in the stack frame.
That is not necessarily the largest alignment, the sorting ensures that it
is a variable with the largest size in the frame (and only if several of
them have equal size, largest alignment from the same sized ones). Your
second patch used maximum of BIGGEST_ALIGNMENT / BITS_PER_UNIT and
ASAN_RED_ZONE_SIZE. That doesn't change anything at all when using
-mno-avx512f - offsets[0] is still just 32-byte aligned in that case
relative to top of frame, just changes the -mavx512f case to be 64-byte
aligned offsets[0] (aka offsets[0] is then either 0 or -64 instead of either
0 or -32). That will not help if any variable in the frame needs 128-byte,
256-byte, 512-byte ... 4096-byte alignment. If you want to fix the bug in
the spot you've touched, you'd need to walk all the
stack_vars[stack_vars_sorted[si2]] for si2 [si + 1, n - 1] and for those
where the loop would do anything (i.e.
stack_vars[i2].representative == i2
&& TREE_CODE (decl2) == SSA_NAME
? SA.partition_to_pseudo[var_to_partition (SA.map, decl2)] == NULL_RTX
: DECL_RTL (decl2) == pc_rtx
and the pred applies (but that means also walking the earlier ones!
because with -fstack-protector* the vars can be processed in several calls) and
alignb2 * BITS_PER_UNIT <= MAX_SUPPORTED_STACK_ALIGNMENT
and compute maximum of those alignments.
That maximum is already computed,
data->asan_alignb = MAX (data->asan_alignb, alignb);
computes that, but you get the final result only after you do all the
expand_stack_vars calls. You'd need to compute it before.
Though, that change would be still in the wrong place.
The thing is, it would be a waste of the precious stack space when it isn't
needed at all (e.g. when asan will not at compile time do the use after
return checking, or if it won't do it at runtime, or even if it will do at
runtime it will waste the space on the stack).
The following patch fixes it solely for the __asan_stack_malloc_N
allocations, doesn't enlarge unnecessarily further the actual stack frame.
Because asan is only supported on FRAME_GROWS_DOWNWARD architectures
(mips, rs6000 and xtensa are conditional FRAME_GROWS_DOWNWARD arches, which
for -fsanitize=address or -fstack-protector* use FRAME_GROWS_DOWNWARD 1,
otherwise 0, others supporting asan always just use 1), the assumption for
the dynamic stack realignment is that the top of the stack frame (aka offset
0) is aligned to alignb passed to the function (which is the maximum of alignb
of all the vars in the frame). As checked by the assertion in the patch,
offsets[0] is 0 most of the time and so that assumption is correct, the only
case when it is not 0 is if -fstack-protector* is on together with
-fsanitize=address and cfgexpand.cc (create_stack_guard) created a stack
guard. That is the only variable which is allocated in the stack frame
right away, for all others with -fsanitize=address defer_stack_allocation
(or -fstack-protector*) returns true and so they aren't allocated
immediately but handled during the frame layout phases. So, the original
frame_offset of 0 is changed because of the stack guard to
-pointer_size_in_bytes and later at the
if (data->asan_vec.is_empty ())
{
align_frame_offset (ASAN_RED_ZONE_SIZE);
prev_offset = frame_offset.to_constant ();
}
to -ASAN_RED_ZONE_SIZE. The asan_emit_stack_protection code wasn't
taking this into account though, so essentially assumed in the
__asan_stack_malloc_N allocated memory it needs to align it such that
pointer corresponding to offsets[0] is alignb aligned. But that isn't
correct if alignb > ASAN_RED_ZONE_SIZE, in that case it needs to ensure that
pointer corresponding to frame offset 0 is alignb aligned.
The following patch fixes that. Unlike the previous case where
we knew that asan_frame_size + base_align_bias falls into the same bucket
as asan_frame_size, this isn't in some cases true anymore, so the patch
recomputes which bucket to use and if going to bucket 11 (because there is
no __asan_stack_malloc_11 function in the library) disables the after return
sanitization.
2024-04-11 Jakub Jelinek <jakub@redhat.com>
PR middle-end/110027
* asan.cc (asan_emit_stack_protection): Assert offsets[0] is
zero if there is no stack protect guard, otherwise
-ASAN_RED_ZONE_SIZE. If alignb > ASAN_RED_ZONE_SIZE and there is
stack pointer guard, take the ASAN_RED_ZONE_SIZE bytes allocated at
the top of the stack into account when computing base_align_bias.
Recompute use_after_return_class from asan_frame_size + base_align_bias
and set to -1 if that would overflow to 11.
Jakub Jelinek [Tue, 9 Apr 2024 07:31:42 +0000 (09:31 +0200)]
c++: Fix up maybe_warn_for_constant_evaluated calls [PR114580]
When looking at maybe_warn_for_constant_evaluated for the trivial
infinite loops patch, I've noticed that it can emit weird diagnostics
for if constexpr in templates, first warn that std::is_constant_evaluted()
always evaluates to false (because the function template is not constexpr)
and then during instantiation warn that std::is_constant_evaluted()
always evaluates to true (because it is used in if constexpr condition).
Now, only the latter is actually true, even when the if constexpr
is in a non-constexpr function, it will still always evaluate to true.
So, the following patch fixes it to call maybe_warn_for_constant_evaluated
always with IF_STMT_CONSTEXPR_P (if_stmt) as the second argument rather than
true if it is if constexpr with non-dependent condition etc.
2024-04-09 Jakub Jelinek <jakub@redhat.com>
PR c++/114580
* semantics.cc (finish_if_stmt_cond): Call
maybe_warn_for_constant_evaluated with IF_STMT_CONSTEXPR_P (if_stmt)
as the second argument, rather than true/false depending on if
it is if constexpr with non-dependent constant expression with
bool type.
* g++.dg/cpp2a/is-constant-evaluated15.C: New test.
Jakub Jelinek [Fri, 5 Apr 2024 12:56:14 +0000 (14:56 +0200)]
vect: Don't clear base_misaligned in update_epilogue_loop_vinfo [PR114566]
The following testcase is miscompiled, because in the vectorized
epilogue the vectorizer assumes it can use aligned loads/stores
(if the base decl gets alignment increased), but it actually doesn't
increase that.
This is because r10-4203-g97c1460367 added the hunk following
patch removes. The explanation feels reasonable, but actually it
is not true as the testcase proves.
The thing is, we vectorize the main loop with 64-byte vectors
and the corresponding data refs have base_alignment 16 (the
a array has DECL_ALIGN 128) and offset_alignment 32. Now, because
of the offset_alignment 32 rather than 64, we need to use unaligned
loads/stores in the main loop (and ditto in the first load/store
in vectorized epilogue). But the second load/store in the vectorized
epilogue uses only 32-byte vectors and because it is a multiple
of offset_alignment, it checks if we could increase alignment of the
a VAR_DECL, the function returns true, sets base_misaligned = true
and says the access is then aligned.
But when update_epilogue_loop_vinfo clears base_misaligned with the
assumption that the var had to have the alignment increased already,
the update of DECL_ALIGN doesn't happen anymore.
Now, I'd think this base_alignment = false was needed before r10-4030-gd2db7f7901 change was committed where it incorrectly
overwrote DECL_ALIGN even if it was already larger, rather than
just always increasing it. But with that change in, it doesn't
make sense to me anymore.
Note, the testcase is latent on the trunk, but reproduces on the 13
branch.
Jakub Jelinek [Fri, 5 Apr 2024 07:31:28 +0000 (09:31 +0200)]
c++: Fix ICE with weird copy assignment operator [PR114572]
While ctors/dtors don't return anything (undeclared void or this pointer
on arm) and copy assignment operators normally return a reference to *this,
it isn't invalid to return uselessly some class object which might need
destructing, but the OpenMP clause handling code wasn't expecting that.
The following patch fixes that.
2024-04-05 Jakub Jelinek <jakub@redhat.com>
PR c++/114572
* cp-gimplify.cc (cxx_omp_clause_apply_fn): Call build_cplus_new
on build_call_a result if it has class type.
Jakub Jelinek [Thu, 4 Apr 2024 08:47:52 +0000 (10:47 +0200)]
fold-const: Handle NON_LVALUE_EXPR in native_encode_initializer [PR114537]
The following testcase is incorrectly rejected. The problem is that
for bit-fields native_encode_initializer expects the corresponding
CONSTRUCTOR elt value must be INTEGER_CST, but that isn't the case
here, it is wrapped into NON_LVALUE_EXPR by maybe_wrap_with_location.
We could STRIP_ANY_LOCATION_WRAPPER as well, but as all we are looking for
is INTEGER_CST inside, just looking through NON_LVALUE_EXPR seems easier.
2024-04-04 Jakub Jelinek <jakub@redhat.com>
PR c++/114537
* fold-const.cc (native_encode_initializer): Look through
NON_LVALUE_EXPR if val is INTEGER_CST.