Richard Biener [Thu, 13 Apr 2023 12:09:30 +0000 (14:09 +0200)]
tree-optimization/109491 - ICE in expressions_equal_p
At some point I elided the NULL pointer check in expressions_equal_p
because it shouldn't be necessary, not realizing that, for example,
TARGET_MEM_REF has optional operands we cannot substitute with
something non-NULL with the same semantics. The following does the
simple thing and restores the check removed in r11-4982.
PR tree-optimization/109491
* tree-ssa-sccvn.cc (expressions_equal_p): Restore the
NULL operands test.
Richard Biener [Wed, 12 Apr 2023 08:22:08 +0000 (10:22 +0200)]
tree-optimization/109473 - ICE with reduction epilog adjustment op
The following makes sure to carry out the reduction epilog adjustment
in the original computation type which for pointers is an unsigned
integer type. There's a similar issue with signed vs. unsigned ops
and overflow which is fixed by this as well.
PR tree-optimization/109473
* tree-vect-loop.cc (vect_create_epilog_for_reduction):
Convert scalar result to the computation type before performing
the reduction adjustment.
Richard Biener [Thu, 23 Mar 2023 15:56:53 +0000 (16:56 +0100)]
lto/109263 - lto-wrapper and -g0 -ggdb
The following makes lto-wrapper deal with non-combined debug
disabling / enabling option combinations properly. Interestingly
-gno-dwarf also enables debug.
PR lto/109263
* lto-wrapper.cc (run_gcc): Parse alternate debug options
as well, they always enable debug.
Richard Biener [Tue, 21 Mar 2023 11:49:36 +0000 (12:49 +0100)]
tree-optimization/109219 - avoid looking at STMT_SLP_TYPE
The following avoids looking at STMT_SLP_TYPE apart from the only
place needing it - transform and analysis of non-SLP loop stmts.
In particular it doesn't have a reliable meaning on SLP representatives
which are also passed as stmt_vinfo to vectorizable_* routines. The
proper way to check in those is to look for the slp_node argument
instead.
PR tree-optimization/109219
* tree-vect-loop.cc (vectorizable_reduction): Check
slp_node, not STMT_SLP_TYPE.
* tree-vect-stmts.cc (vectorizable_condition): Likewise.
* tree-vect-slp.cc (vect_slp_analyze_node_operations_1):
Remove assertion on STMT_SLP_TYPE.
Richard Biener [Mon, 27 Mar 2023 14:40:15 +0000 (16:40 +0200)]
ipa/106124 - ICE with -fkeep-inline-functions and OpenMP
The testcases in this bug reveal cases where an early generated
type is collected because it was unused, but an attempt is made to
re-create it later when a late DIE for a function (an OpenMP
reduction) is created. That's unexpected and possibly the fault
of OpenMP but the following allows the re-creation of the context
type to succeed.
PR ipa/106124
* dwarf2out.cc (lookup_type_die): Reset TREE_ASM_WRITTEN
so we can re-create the DIE for the type if required.
Richard Biener [Wed, 7 Dec 2022 09:26:01 +0000 (10:26 +0100)]
ipa/105676 - pure attribute suggestion for const function
When a function is declared const (even though it technically
accesses memory), ipa-modref discovering pureness shouldn't end
up suggesting that attribute. The following thus exempts
'const' functions from ipa_make_function_pure handling.
PR ipa/105676
* ipa-pure-const.cc (ipa_make_function_pure): Skip also
for functions already being const.
Kewen Lin [Tue, 4 Apr 2023 02:47:44 +0000 (21:47 -0500)]
rs6000: Fix vector parity support [PR108699]
The failures on the original failing case builtin-bitops-1.c
and the associated test case pr108699.c here show that the
current support for parity on vector modes is wrong on Power.
The hardware insns vprtyb[wdq] operate only on the least
significant bit of each byte per element, which doesn't match
what the RTL opcode parity needs, but the current implementation
wrongly expands parity with them.
This patch fixes the handling by using one more insn, vpopcntb.
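To see why the byte-parity insn alone is not enough, here is a scalar
sketch for a single 32-bit element. The helper names are hypothetical
and the code only models the semantics described above; it is not the
rs6000 expander. prtyb_model XORs the least significant bit of each
byte, and parity_model first replaces each byte with its population
count (the role vpopcntb plays) before applying the same step.

```cpp
#include <cstdint>

// Models the per-element behaviour described for vprtybw: XOR together
// the least significant bit of each byte of the element.
inline std::uint32_t prtyb_model(std::uint32_t x) {
  std::uint32_t result = 0;
  for (int byte = 0; byte < 4; ++byte)
    result ^= (x >> (8 * byte)) & 1u;
  return result;
}

// Models a per-byte population count: each byte of the result holds the
// number of set bits in the corresponding input byte (0..8).
inline std::uint32_t popcntb_model(std::uint32_t x) {
  std::uint32_t result = 0;
  for (int byte = 0; byte < 4; ++byte) {
    std::uint32_t b = (x >> (8 * byte)) & 0xffu;
    std::uint32_t count = 0;
    for (; b != 0; b >>= 1)
      count += b & 1u;
    result |= count << (8 * byte);
  }
  return result;
}

// Full parity of the element: the low bit of each byte count is the
// parity of that byte, so XORing those low bits gives the parity of
// all 32 bits.
inline std::uint32_t parity_model(std::uint32_t x) {
  return prtyb_model(popcntb_model(x));
}
```

For the input 2, prtyb_model yields 0 because the only set bit is not
the least significant bit of its byte, while parity_model yields the
expected 1.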
PR target/108699
gcc/ChangeLog:
* config/rs6000/altivec.md (*p9v_parity<mode>2): Rename to ...
(rs6000_vprtyb<mode>2): ... this.
* config/rs6000/rs6000-builtins.def (VPRTYBD): Replace parityv2di2 with
rs6000_vprtybv2di2.
(VPRTYBW): Replace parityv4si2 with rs6000_vprtybv4si2.
(VPRTYBQ): Replace parityv1ti2 with rs6000_vprtybv1ti2.
* config/rs6000/vector.md (parity<mode>2 with VEC_IP): Expand with
popcountv16qi2 and the corresponding rs6000_vprtyb<mode>2.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/p9-vparity.c: Add scan-assembler-not for vpopcntb
to distinguish parity byte from parity.
* gcc.target/powerpc/pr108699.c: New test.
Kewen Lin [Tue, 4 Apr 2023 02:47:54 +0000 (21:47 -0500)]
rs6000: Fix vector_set_var_p9 by considering BE [PR108807]
As PR108807 exposes, the current handling in function
rs6000_expand_vector_set_var_p9 doesn't take care of big
endianness. Currently the function rotates the target
vector to move the element to be set to element 0, sets
element 0 to the given val, then rotates back. To get the
permutation control vectors for the rotations, it makes
use of lvsr and lvsl, but the element ordering is
different for BE and LE (element 0 is the most
significant one on BE but the least significant one on
LE). This patch adds consideration for BE and makes sure
the permutation control vectors for the rotations are as expected.
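The rotate, set, rotate-back scheme can be sketched on a plain array.
This is only an illustration of the idea: set_var_by_rotation is a
hypothetical name, std::rotate stands in for the vector permutes, and
the endian-specific choice of lvsl/lvsr control vectors that the patch
fixes is not modelled.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Sets element `idx` of `vec` to `val` using only "rotate" and
// "set element 0", mirroring the strategy described above.
template <typename T, std::size_t N>
std::array<T, N> set_var_by_rotation(std::array<T, N> vec, std::size_t idx,
                                     T val) {
  idx %= N;
  // Rotate left so that the element to be set lands in position 0.
  std::rotate(vec.begin(), vec.begin() + idx, vec.end());
  vec[0] = val;
  // Rotate left by the complementary amount to restore the order.
  std::rotate(vec.begin(), vec.begin() + (N - idx) % N, vec.end());
  return vec;
}
```

With {10, 20, 30, 40}, index 2 and value 99, the intermediate vector is
{99, 40, 10, 20} and the final result is {10, 20, 99, 40}.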
As tested, it helped to fix the below failures:
FAIL: gcc.target/powerpc/pr79251-run.p9.c execution test
FAIL: gcc.target/powerpc/pr89765-mc.c execution test
FAIL: gcc.target/powerpc/vsx-builtin-10d.c execution test
FAIL: gcc.target/powerpc/vsx-builtin-11d.c execution test
FAIL: gcc.target/powerpc/vsx-builtin-14d.c execution test
FAIL: gcc.target/powerpc/vsx-builtin-16d.c execution test
FAIL: gcc.target/powerpc/vsx-builtin-18d.c execution test
FAIL: gcc.target/powerpc/vsx-builtin-9d.c execution test
PR target/108807
gcc/ChangeLog:
* config/rs6000/rs6000.cc (rs6000_expand_vector_set_var_p9): Fix gen
function for permutation control vector by considering big endianness.
Harald Anlauf [Fri, 14 Apr 2023 18:45:19 +0000 (20:45 +0200)]
Fortran: fix compile-time simplification of SET_EXPONENT [PR109511]
gcc/fortran/ChangeLog:
PR fortran/109511
* simplify.cc (gfc_simplify_set_exponent): Fix implementation of
compile-time simplification of intrinsic SET_EXPONENT for argument
X < 1 and for I < 0.
gcc/testsuite/ChangeLog:
PR fortran/109511
* gfortran.dg/set_exponent_1.f90: New test.
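For reference, SET_EXPONENT(X, I) keeps the fractional part of X and
replaces its exponent with I. A minimal radix-2 model in C++ follows;
set_exponent_model is a hypothetical name and this is not how
gfc_simplify_set_exponent is written, it only shows the expected
values for arguments such as X < 1 or I < 0.

```cpp
#include <cmath>

// Radix-2 model of SET_EXPONENT(X, I): FRACTION(X) * 2**I.
inline double set_exponent_model(double x, int i) {
  if (x == 0.0)
    return x;  // zero has no fraction to rescale
  int old_exponent = 0;
  // frexp splits x into fraction * 2**old_exponent, |fraction| in [0.5, 1).
  double fraction = std::frexp(x, &old_exponent);
  return std::ldexp(fraction, i);  // fraction * 2**i
}
```

For example, 0.25 has fraction 0.5, so setting the exponent to 3 gives
4.0, and 10.0 has fraction 0.625, so setting the exponent to -1 gives
0.3125.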
Jan Hubicka [Fri, 14 Apr 2023 17:18:24 +0000 (19:18 +0200)]
Disable X86_TUNE_AVX256_MOVE_BY_PIECES and STORE_BY_PIECES for znver1-3
I have enabled SSE moves for znver1-3 since they are a performance win on these
machines too (we avoid using loops or string operations, which are more costly).
However, as discussed in the PR log, this triggers a bug in IRA and it was decided
that it is better not to backport the fix.
Patrick Palka [Wed, 12 Apr 2023 16:56:47 +0000 (12:56 -0400)]
libstdc++: Ensure headers used by fast_float are included
This makes floating_from_chars.cc explicitly include all headers
that are used by the original fast_float amalgamation according to r12-6647-gf5c8b82512f9d3, except:
1. <cctype> since fast_float doesn't seem to use anything from it
2. <cinttypes> since fast_float doesn't seem to use anything directly
from it (this header also pulls in <cstdint>)
3. <system_error> since std::errc is naturally already available
from <charconv>
This avoids potential fast_float build failures on platforms for which
some required headers (in particular <cstdint>) end up not getting
transitively included from elsewhere.
libstdc++-v3/ChangeLog:
* src/c++17/floating_from_chars.cc: Include <algorithm>,
<iterator>, <limits> and <cstdint>.
Michael Meissner [Tue, 11 Apr 2023 02:46:34 +0000 (22:46 -0400)]
Backport from master
2023-04-10 Michael Meissner <meissner@linux.ibm.com>
gcc/
PR target/109067
* config/rs6000/rs6000.cc (create_complex_muldiv): Delete.
(init_float128_ieee): Delete code to switch complex multiply and divide
for long double. Backport from master, 3/20/2023.
(complex_multiply_builtin_code): New helper function.
(complex_divide_builtin_code): Likewise.
(rs6000_mangle_decl_assembler_name): Add support for mangling the name
of complex 128-bit multiply and divide built-in functions.
gcc/testsuite/
PR target/109067
* gcc.target/powerpc/divic3-1.c: New test. Backport from master,
3/20/2023.
* gcc.target/powerpc/divic3-2.c: Likewise.
* gcc.target/powerpc/mulic3-1.c: Likewise.
* gcc.target/powerpc/mulic3-2.c: Likewise.
vect: Make partial trapping ops use predication [PR96373]
PR96373 points out that a predicated SVE loop currently converts
trapping unconditional ops into unpredicated vector ops. Doing
the operation on inactive lanes can then raise an exception.
As discussed in the PR trail, we aren't 100% consistent about
whether we preserve traps or not. But the direction of travel
is clearly to improve that rather than live with it. This patch
tries to do that for the SVE case.
Doing this regresses gcc.target/aarch64/sve/fabd_1.c. I've added
-fno-trapping-math for now and filed PR108571 to track it.
A similar problem applies to fsubr_1.c.
I think this is likely to regress Power 10, since conditional
operations are only available for masked loops. I think we'll
need to add -fno-trapping-math to any affected testcases,
but I don't have a Power 10 system to test on.
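A scalar model of the problem, using integer division as the trapping
operation purely for illustration (the PR is about floating-point
traps on SVE; masked_div and its parameters are hypothetical): the
predicated form never evaluates the operation on inactive lanes, so a
zero divisor in such a lane is harmless.

```cpp
#include <array>
#include <cstddef>

// Predicated division: lanes whose mask bit is false are not computed
// at all and receive `inactive_value` instead.
template <std::size_t N>
std::array<int, N> masked_div(const std::array<int, N>& a,
                              const std::array<int, N>& b,
                              const std::array<bool, N>& active,
                              int inactive_value) {
  std::array<int, N> result{};
  for (std::size_t lane = 0; lane < N; ++lane)
    result[lane] = active[lane] ? a[lane] / b[lane] : inactive_value;
  return result;
}
```

An unpredicated version would compute a[lane] / b[lane] for every
lane, including inactive ones where b[lane] may be zero, which is the
kind of spurious exception the patch avoids.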
gcc/
PR tree-optimization/96373
PR tree-optimization/108979
* tree-vect-stmts.cc (vectorizable_operation): Predicate trapping
operations on the loop mask. Reject partial vectors if this isn't
possible. Don't mask operations on invariants.
aarch64: Restore vectorisation of vld1 inputs [PR109072]
Before GCC 12, we would vectorize:
int32_t arr[] = { x, x, x, x };
at -O3. Vectorizing the store on its own is often a loss, particularly
for integers, so g:4963079769c99c4073adfd799885410ad484cbbe suppressed it.
This was necessary to fix regressions from enabling vectorisation at -O2.
However, the vectorisation is important if the code subsequently loads
from the array using vld1:
return vld1q_s32 (arr);
This approach of initialising an array and loading from it is the
recommended endian-agnostic way of constructing an ACLE vector.
As discussed in the PR notes, the general fix would be to fold the
store and load-back to a constructor (preferably before vectorisation).
But that's clearly not stage 4 material.
This patch instead delays folding vld1 until after inlining and
records which decls a vld1 loads from. It then treats vector
stores to those decls as free, on the optimistic assumption that
they will be removed later. The patch also brute-forces
vectorization of plain constructor+store sequences, since some
of the CPU costs make that (dubiously) expensive even when the
store is discounted.
Delaying folding showed that we were failing to update the vops.
The patch fixes that too.
Thanks to Tamar for discussion & help with testing.
gcc/
PR target/109072
* config/aarch64/aarch64-protos.h (aarch64_vector_load_decl): Declare.
* config/aarch64/aarch64.h (machine_function::vector_load_decls): New
variable.
* config/aarch64/aarch64-builtins.cc (aarch64_record_vector_load_arg):
New function.
(aarch64_general_gimple_fold_builtin): Delay folding of vld1 until
after inlining. Record which decls are loaded from. Fix handling
of vops for loads and stores.
* config/aarch64/aarch64.cc (aarch64_vector_load_decl): New function.
(aarch64_accesses_vector_load_decl_p): Likewise.
(aarch64_vector_costs::m_stores_to_vector_load_decl): New member
variable.
(aarch64_vector_costs::add_stmt_cost): If the function has a vld1
that loads from a decl, treat vector stores to those decls as
zero cost.
(aarch64_vector_costs::finish_cost): ...and in that case,
if the vector code does nothing more than a store, give the
prologue a zero cost as well.
gcc/testsuite/
PR target/109072
* gcc.target/aarch64/pr109072_1.c: New test.
* gcc.target/aarch64/pr109072_2.c: Likewise.
lra: Replace subregs in bare uses & clobbers [PR108681]
In this PR we had a write to one vector of a 4-vector tuple.
The vector had mode V1DI, and the target doesn't provide V1DI
moves, so this was converted into:
(clobber (subreg:V1DI (reg/v:V4x1DI 92 [ b ]) 24))
followed by a DImode move. (The clobber isn't really necessary
or helpful for a single word, but would be for wider moves.)
The subreg in the clobber survived until after RA:
(clobber (subreg:V1DI (reg/v:V4x1DI 34 v2 [orig:92 b ] [92]) 24))
IMO this isn't well-formed. If a subreg of a hard register simplifies
to a hard register, it should be replaced by the hard register. If the
subreg doesn't simplify, then target-independent code can't be sure
which parts of the register are affected and which aren't. A clobber
of such a subreg isn't useful and (again IMO) should just be removed.
Conversely, a use of such a subreg is effectively a use of the whole
inner register.
LRA has code to simplify subregs of hard registers, but it didn't
handle bare uses and clobbers. The patch extends it to do that.
One question was whether the final_p argument to alter_subregs
should be true or false. True is IMO dangerous, since it forces
replacements that might not be valid from a dataflow perspective,
and uses and clobbers only exist for dataflow. As said above,
I think the correct way of handling a failed simplification would
be to delete clobbers and replace uses of subregs with uses of
the inner register. But I didn't want to write untested code
to do that.
In the PR, the clobber caused an infinite loop in DCE, because
of a disagreement about what effect the clobber had. But for
the reasons above, I think that was GIGO rather than a bug in
DF or DCE.
gcc/
PR rtl-optimization/108681
* lra-spills.cc (lra_final_code_change): Extend subreg replacement
code to handle bare uses and clobbers.
gcc/testsuite/
PR rtl-optimization/108681
* gcc.target/aarch64/pr108681.c: New test.
vect: Fix single def-use cycle for ifn reductions [PR108608]
The patch that added support for fmin/fmax reductions didn't
handle single def-use cycles. In some ways, this seems like
going out of our way to make things slower, but that's a
discussion for another day.
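For context, an fmin reduction is a loop of roughly the following
shape. The function name and signature are made up for illustration;
whether the vectorizer treats a given loop as a single def-use cycle
depends on the target and types, so this shows only the source form.

```cpp
#include <cmath>
#include <cstddef>

// Reduces `values` to their minimum, starting from `init`. The call to
// fmin is the reduction operation: a function call, not a tree code.
inline double min_reduce(const double* values, std::size_t n, double init) {
  double result = init;
  for (std::size_t i = 0; i < n; ++i)
    result = std::fmin(result, values[i]);
  return result;
}
```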
gcc/
PR tree-optimization/108608
* tree-vect-loop.cc (vect_transform_reduction): Handle single
def-use cycles that involve function calls rather than tree codes.
gcc/testsuite/
PR tree-optimization/108608
* gcc.dg/vect/pr108608.c: New test.
* gcc.target/aarch64/sve/pr108608-1.c: Likewise.
convert_memory_address_addr_space_1 has two modes: one in which it
tries to create a self-contained RTL expression (which might fail)
and one in which it can emit new instructions where necessary.
When handling a CONST, the function recurses into the CONST's
operand and then constifies the result. But that's only valid if
the result is still a self-contained expression. If new instructions
have been emitted, the expression will refer to the (non-constant)
results of those instructions.
In the PR, this caused us to emit a nonsensical (const (reg ...))
REG_EQUAL note.
gcc/
PR tree-optimization/108603
* explow.cc (convert_memory_address_addr_space_1): Only wrap
the result of a recursive call in a CONST if no instructions
were emitted.
gcc/testsuite/
PR tree-optimization/108603
* gcc.target/aarch64/sve/pr108603.c: New test.
rtl-ssa: Fix splitting of clobber groups [PR108508]
Since rtl-ssa isn't a real/native SSA representation, it has
to honour the constraints of the underlying rtl representation.
Part of this involves maintaining an rpo list of definitions
for each rtl register, backed by a splay tree where necessary
for quick lookup/insertion.
However, clobbers of a register don't act as barriers to
other clobbers of a register. E.g. it's possible to move one
flag-clobbering instruction across an arbitrary number of other
flag-clobbering instructions. In order to allow passes to do
that without quadratic complexity, the splay tree groups all
consecutive clobbers into groups, with only the group being
entered into the splay tree. These groups in turn have an
internal splay tree of clobbers where necessary.
This means that, if we insert a new definition and use into
the middle of a sea of clobbers, we need to split the clobber
group into two groups. This was quite a difficult condition
to trigger during development, and the PR shows that the code
to handle it had (at least) two bugs.
First, the process involves searching the clobber tree for
the split point. This search can give either the previous
clobber (which will belong to the first of the split groups)
or the next clobber (which will belong to the second of the
split groups). The code for the former case handled the
split correctly but the code for the latter case didn't.
Second, I'd forgotten to add the second clobber group to the
main splay tree. :-(
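The split itself can be pictured with a much-simplified model in which
a clobber group is just a sorted vector of program points instead of a
splay tree. All names here are hypothetical and none of this is the
rtl-ssa code; it only shows which clobbers end up in which group.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Stand-in for a clobber group: the program points of consecutive
// clobbers, kept in increasing order.
using clobber_group = std::vector<int>;

// Splits `group` around a new definition at `point`, which is assumed
// not to coincide with any clobber already in the group.
inline std::pair<clobber_group, clobber_group>
split_clobber_group_model(const clobber_group& group, int point) {
  // First clobber after the new definition: it starts the second group.
  auto next = std::upper_bound(group.begin(), group.end(), point);
  clobber_group before(group.begin(), next);  // ends with the previous clobber
  clobber_group after(next, group.end());     // starts with the next clobber
  return {before, after};
}
```

Whether a search lands on the previous clobber (the last element of
the first group) or the next clobber (the first element of the second
group), the resulting pair must be the same, and both groups then need
to be recorded by whatever structure indexes the groups.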
gcc/
PR rtl-optimization/108508
* rtl-ssa/accesses.cc (function_info::split_clobber_group): When
the splay tree search gives the first clobber in the second group,
make sure that the root of the first clobber group is updated
correctly. Enter the new clobber group into the definition splay
tree.
gcc/testsuite/
PR rtl-optimization/108508
* gcc.target/aarch64/pr108508.c: New test.
vectorizable_condition checks whether a COND_EXPR condition is used
elsewhere with a loop mask. If so, it applies the loop mask to the
COND_EXPR too, to reduce the number of live masks and to increase the
chance of combining the AND with the comparison.
There is also code to do this for inverted conditions. E.g. if
we have a < b ? c : d and something else is conditional on !(a < b)
(such as a load in d), we use !(a < b) ? d : c and apply the loop
mask to !(a < b).
This inversion relied on the function's bitop1/bitop2 mechanism.
However, that mechanism is skipped if the condition is split out of
the COND_EXPR as a separate statement. This meant that we could end
up using the inverse of the intended condition.
There is a separate way of negating the condition when a mask
is being applied (which is also used for EXTRACT_LAST reductions).
This patch uses that instead.
As well as the testcase, this fixes aarch64/sve/vcond_{4,17}_run.c.
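A per-lane scalar model of the inversion described above (function
names are hypothetical and this is not the vectorizer code): selecting
with the inverted, masked condition and swapped operands matches the
original selection on active lanes, whereas forgetting the inversion
after swapping the operands picks the wrong arm.

```cpp
// a < b ? c : d, evaluated directly.
inline int select_plain(int a, int b, int c, int d) {
  return a < b ? c : d;
}

// Same selection expressed as !(a < b) ? d : c, with the loop mask
// applied to the inverted condition.
inline int select_inverted_masked(int a, int b, int c, int d,
                                  bool loop_mask) {
  bool inverted = loop_mask && !(a < b);
  return inverted ? d : c;
}

// Model of the failure mode: operands swapped, condition not inverted.
inline int select_missing_inversion(int a, int b, int c, int d,
                                    bool loop_mask) {
  bool not_inverted = loop_mask && (a < b);
  return not_inverted ? d : c;
}
```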
rtl-ssa: Extend m_num_defs to a full unsigned int [PR108086]
insn_info tried to save space by storing the number of
definitions in a 16-bit bitfield. The justification was:
// ... FIRST_PSEUDO_REGISTER + 1
// is the maximum number of accesses to hard registers and memory, and
// MAX_RECOG_OPERANDS is the maximum number of pseudos that can be
// defined by an instruction, so the number of definitions should fit
// easily in 16 bits.
But while that reasoning holds (I think) for real instructions,
it doesn't hold for artificial instructions. I don't think there's
any sensible higher limit we can use, so this patch goes for a full
unsigned int.
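The failure mode of a too-narrow count can be shown with a toy struct.
The types below are hypothetical and unrelated to the real insn_info
layout; the sketch assumes unsigned int has at least 32 bits.

```cpp
// A count squeezed into a 16-bit bitfield, as in the old layout.
struct narrow_counts {
  unsigned int num_defs : 16;  // can represent 0..65535 only
  unsigned int other : 16;
};

// The same count stored in a full unsigned int.
struct wide_counts {
  unsigned int num_defs;
};

// Stores n in the bitfield and reads it back; values above 65535 are
// silently reduced modulo 65536.
inline unsigned int store_narrow(unsigned int n) {
  narrow_counts c{};
  c.num_defs = n;
  return c.num_defs;
}

// Stores n in the full-width member and reads it back unchanged.
inline unsigned int store_wide(unsigned int n) {
  wide_counts c{};
  c.num_defs = n;
  return c.num_defs;
}
```

Storing 70000 definitions in the narrow form reads back as 4464,
whereas the wide form preserves the value.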
gcc/
PR rtl-optimization/108086
* rtl-ssa/insns.h (insn_info): Make m_num_defs a full unsigned int.
Adjust size-related commentary accordingly.
This is the 2nd attempt to fix PR90706. IRA calculates wrong AVR
costs for moving general hard regs of SFmode. This was the reason for
spilling a pseudo in the PR. In this patch we use smaller move cost
of hard reg in its natural and operand modes.
PR rtl-optimization/90706
gcc/ChangeLog:
* ira-costs.cc: Include print-rtl.h.
(record_reg_classes, scan_one_insn): Add code to print debug info.
(record_operand_costs): Find and use smaller cost for hard reg
move.
David Malcolm [Wed, 29 Mar 2023 18:16:49 +0000 (14:16 -0400)]
analyzer: fix ICE on certain longjmp calls [PR109094]
PR analyzer/109094 reports an ICE in the analyzer seen on qemu's
target/i386/tcg/translate.c
The issue turned out to be that when handling a longjmp, the code
to pop the frames was generating an svalue for the result_decl of any
popped frame that had a non-void return type (and discarding it) leading
to "uninit" poisoned_svalue_diagnostic instances being saved since the
result_decl is only set by the greturn stmt. Later, when checking the
feasibility of the path to these diagnostics, m_check_expr was evaluated
in the context of the frame of the longjmp, leading to an attempt to
evaluate the result_decl of each intervening frame whilst in the
context of the topmost frame, leading to an assertion failure in
frame_region::get_region_for_local here:
This patch updates the analyzer's longjmp implementation so that it
doesn't attempt to generate svalues for the result_decls when popping
frames, fixing the assertion failure (and presumably fixing "uninit"
false positives in a release build).
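The shape of code involved looks roughly like the following. This is
a made-up reduction, not the qemu source or the new testcase: the
intermediate frame has a non-void return type, but its return
statement is never reached because longjmp unwinds straight past it.

```cpp
#include <csetjmp>

static std::jmp_buf g_env;

// Never returns normally: control goes back to the setjmp call.
static int inner() {
  std::longjmp(g_env, 1);
}

// Non-void return type, but the return value is never produced when
// inner() jumps away, so there is no result to evaluate for this frame.
static int middle() {
  return inner() + 1;
}

// Returns true if control came back via longjmp.
static bool run() {
  if (setjmp(g_env) == 0) {
    middle();      // popped by the longjmp in inner()
    return false;  // not reached
  }
  return true;
}
```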
gcc/analyzer/ChangeLog:
PR analyzer/109094
* region-model.cc (region_model::on_longjmp): Pass false for
new "eval_return_svalue" param of pop_frame.
(region_model::pop_frame): Add new "eval_return_svalue" param and
use it to suppress the call to get_rvalue on the result when
needed by on_longjmp.
* region-model.h (region_model::pop_frame): Add new
"eval_return_svalue" param.
gcc/testsuite/ChangeLog:
PR analyzer/109094
* gcc.dg/analyzer/setjmp-pr109094.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
gcc/analyzer/ChangeLog:
PR analyzer/108968
* region-model.cc (region_model::get_rvalue_1): Handle VAR_DECLs
with a DECL_HARD_REGISTER by returning UNKNOWN.
gcc/testsuite/ChangeLog:
PR analyzer/108968
* gcc.dg/analyzer/uninit-pr108968-register.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Wed, 29 Mar 2023 18:16:49 +0000 (14:16 -0400)]
analyzer: fix further overzealous state purging [PR108733]
PR analyzer/108733 reports various false positives in qemu from
-Wanalyzer-use-of-uninitialized-value with __attribute__((cleanup))
at -O1 and above.
Root cause is that the state-purging code was failing to treat:
_25 = MEM[(void * *)&val];
as a usage of "val", leading to it erroneously purging the
initialization of "val" along an execution path that didn't otherwise
use "val", apart from the __attribute__((cleanup)).
Fixed thusly.
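A minimal sketch of the pattern, assuming the GNU cleanup attribute
(the names are hypothetical and this is not the qemu code): on the
early-exit path "val" is used only by the cleanup handler, which reads
it through a void ** just like the MEM[(void * *)&val] access above.

```cpp
#include <cstdlib>

static int g_cleanup_calls = 0;

// Cleanup handler: receives a pointer to the annotated variable.
static void free_ptr(void** p) {
  std::free(*p);  // this read is the only use of val on some paths
  ++g_cleanup_calls;
}

static bool use_buffer(bool early_exit) {
  // GNU extension: free_ptr(&val) runs whenever val goes out of scope.
  void* val __attribute__((cleanup(free_ptr))) = std::malloc(16);
  if (early_exit)
    return false;  // val is not otherwise used along this path
  return val != nullptr;
}
```

If the initialization of val were treated as dead on the early-exit
path, the read in free_ptr would look like a use of an uninitialized
value, which is the false positive reported.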
Integration testing on the patch shows this change in the number of
diagnostics:
-Wanalyzer-use-of-uninitialized-value
coreutils-9.1: 18 -> 16 (-2)
qemu-7.2.0: 87 -> 80 (-7)
where all that I investigated appear to have been false positives, hence
an improvement.
David Malcolm [Wed, 29 Mar 2023 18:16:48 +0000 (14:16 -0400)]
analyzer: fix overzealous state purging with on-stack structs [PR108704]
PR analyzer/108704 reports many false positives seen from
-Wanalyzer-use-of-uninitialized-value on qemu's softfloat.c on code like
the following:
struct st s;
s = foo ();
s = bar (s); // bogusly reports that s is uninitialized here
where e.g. "struct st" is "floatx80" in the qemu examples.
The root cause is overzealous purging of on-stack structs in the code I
added in r12-7718-gfaacafd2306ad7, where at:
s = bar (s);
state_purge_per_decl::process_point_backwards "sees" the assignment to 's'
and stops processing, effectively treating 's' as unneeded before this
stmt, not noticing the use of 's' in the argument.
Fixed thusly.
The patch greatly reduces the number of
-Wanalyzer-use-of-uninitialized-value warnings from my integration tests:
ImageMagick-7.1.0-57: 10 -> 6 (-4)
qemu-7.2: 858 -> 87 (-771)
haproxy-2.7.1: 1 -> 0 (-1)
All of the above that I've examined appear to be false positives.
gcc/analyzer/ChangeLog:
PR analyzer/108704
* state-purge.cc (state_purge_per_decl::process_point_backwards):
Don't stop processing the decl if it's fully overwritten by
this stmt if it's also used by this stmt.
gcc/testsuite/ChangeLog:
PR analyzer/108704
* gcc.dg/analyzer/uninit-7.c: New test.
* gcc.dg/analyzer/uninit-pr108704.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
The analyzer "sees" the comparison against NULL in foo_b, and splits the
analysis into the NULL and not-NULL cases; later, back in foo_a, at
switch (p->type)
it complains that p is NULL.
Previously we were only using __attribute__((nonnull)) as something to
complain about when it was violated; we weren't using it as a source of
knowledge.
This patch fixes things by making the analyzer respect
__attribute__((nonnull)) at the top-level of the analysis: any such
params are now assumed to be non-NULL, so that the analyzer assumes the
g_return_if_fail inside foo_b doesn't fail when called from foo_a.
gcc/analyzer/ChangeLog:
PR analyzer/106325
* region-model-manager.cc
(region_model_manager::get_or_create_null_ptr): New.
* region-model.cc (region_model::on_top_level_param): Add
"nonnull" param and make use of it.
(region_model::push_frame): When handling a top-level entrypoint
to the analysis, determine which params __attribute__((nonnull))
applies to, and pass to on_top_level_param.
* region-model.h (region_model_manager::get_or_create_null_ptr):
New decl.
(region_model::on_top_level_param): Add "nonnull" param.
gcc/analyzer/ChangeLog:
PR analyzer/105784
* region-model-manager.cc
(region_model_manager::maybe_fold_binop): For POINTER_PLUS_EXPR,
PLUS_EXPR and MINUS_EXPR, eliminate requirement that the final
type matches that of arg0 in favor of a cast.
gcc/testsuite/ChangeLog:
PR analyzer/105784
* gcc.dg/analyzer/torture/fold-ptr-arith-pr105784.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Wed, 29 Mar 2023 18:16:47 +0000 (14:16 -0400)]
analyzer: fix feasibility false +ve on jumps through function ptrs [PR107582]
PR analyzer/107582 reports a false +ve from
-Wanalyzer-use-of-uninitialized-value where
the analyzer's feasibility checker erroneously decides
that point (B) in the code below is reachable, with "x" being
uninitialized there:
pthread_cleanup_push(func, NULL);
while (ret != ETIMEDOUT)
ret = rand() % 1000;
/* (A): after the while loop */
if (ret != ETIMEDOUT)
x = &z;
pthread_cleanup_pop(1);
if (ret == ETIMEDOUT)
return 0;
/* (B): after not bailing out */
due to these contradictory conditions somehow both holding:
* (ret == ETIMEDOUT), at (A) (skipping the initialization of x), and
* (ret != ETIMEDOUT), at (B)
The root cause is that after the while loop, state merger puts ret in
the exploded graph in an UNKNOWN state, and saves the diagnostic at (B).
Later, as we explore the feasibility of reaching the enode for (B),
dynamic_call_info_t::update_model is called to push/pop the
frames for handling the call to "func" in pthread_cleanup_pop.
The "ret" at these nodes in the feasible_graph has a conjured_svalue for
"ret", and a constraint on it being either == *or* != ETIMEDOUT.
However dynamic_call_info_t::update_model blithely clobbers the
model with a copy from the exploded_graph, in which "ret" is UNKNOWN.
This patch fixes dynamic_call_info_t::update_model so that it
simulates pushing/popping a frame on the model we're working with,
preserving knowledge of the constraint on "ret", and enabling the
analyzer to "know" that the bail-out must happen.
gcc/analyzer/ChangeLog:
PR analyzer/107582
* engine.cc (dynamic_call_info_t::update_model): Update the model
by pushing or popping a frame, rather than by clobbering it with the
model from the exploded_node's state.
gcc/testsuite/ChangeLog:
PR analyzer/107582
* gcc.dg/analyzer/feasibility-4.c: New test.
* gcc.dg/analyzer/feasibility-pr107582-1.c: New test.
* gcc.dg/analyzer/feasibility-pr107582-2.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
gcc/analyzer/ChangeLog:
PR analyzer/107345
* region-model.cc (region_model::eval_condition_without_cm):
Ensure that constants are on the right-hand side before checking
for them.
gcc/testsuite/ChangeLog:
PR analyzer/107345
* gcc.dg/analyzer/pr107345.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
gcc/analyzer/ChangeLog:
PR analyzer/106573
* region-model.cc (region_model::on_call_pre): Use check_call_args
when ensuring that we call get_arg_svalue on all args. Remove
redundant call from handling for stdio builtins.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Wed, 29 Mar 2023 18:16:46 +0000 (14:16 -0400)]
analyzer: fix missing -Wanalyzer-use-of-uninitialized-value on special-cased functions [PR106573]
We were missing checks for uninitialized params on calls to functions
that the analyzer has hardcoded knowledge of - both for those that are
handled just by state machines, and for those that are handled in
region-model-impl-calls.cc (for those arguments for which the svalue
wasn't accessed in handling the call).
The standard's move-and-swap implementation generates smaller code at
all levels except -O0 and -Og, so it seems simplest to just do what the
standard says.
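The move-and-swap idiom, shown on a toy non-owning handle rather than
on weak_ptr itself (the class is hypothetical): the temporary takes
over the source's state first, so even a self-move leaves the object
holding its original value.

```cpp
#include <utility>

class handle {
 public:
  explicit handle(int* p = nullptr) noexcept : ptr_(p) {}

  handle(handle&& other) noexcept : ptr_(other.ptr_) { other.ptr_ = nullptr; }

  // Move assignment as move-and-swap: build a temporary from the
  // source, then exchange its contents with *this.
  handle& operator=(handle&& other) noexcept {
    handle(std::move(other)).swap(*this);
    return *this;
  }

  void swap(handle& other) noexcept { std::swap(ptr_, other.ptr_); }

  int* get() const noexcept { return ptr_; }

 private:
  int* ptr_;
};
```

After b = std::move(a), b holds a's old pointer and a is empty; after
h = std::move(h), h still holds its pointer.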
libstdc++-v3/ChangeLog:
PR libstdc++/108118
* include/bits/shared_ptr_base.h (weak_ptr::operator=):
Implement as move-and-swap exactly as specified in the standard.
* testsuite/20_util/weak_ptr/cons/self_move.cc: New test.
Björn Schäpers [Tue, 13 Dec 2022 21:02:47 +0000 (22:02 +0100)]
libstdc++: Deliver names of C functions in <stacktrace>
__cxa_demangle can only demangle C++ names. For all C functions,
extern "C" functions, and main, it fails with status -2; in that case
just use the given name as is. Without this, the name is left empty,
which doesn't look nice in the stacktrace.
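The fallback can be sketched outside the library as follows.
demangle_or_raw is a hypothetical helper, not the code in
stacktrace_entry::_S_demangle, and it assumes a toolchain that
provides <cxxabi.h>.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <string>

// Returns the demangled form of `name`, or `name` itself if it is not
// a mangled C++ name (for example "main" or a C function).
inline std::string demangle_or_raw(const char* name) {
  int status = 0;
  char* demangled = abi::__cxa_demangle(name, nullptr, nullptr, &status);
  if (status == 0 && demangled != nullptr) {
    std::string result(demangled);
    std::free(demangled);  // the buffer was allocated with malloc
    return result;
  }
  std::free(demangled);  // free(nullptr) is a no-op
  return name;
}
```

For example, "_Z3foov" becomes "foo()", while "main" is returned
unchanged.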
libstdc++-v3/ChangeLog:
* include/std/stacktrace (stacktrace_entry::_S_demangle): Use
raw __name if __cxa_demangle could not demangle it.
Jonathan Wakely [Fri, 2 Dec 2022 16:18:43 +0000 (16:18 +0000)]
libstdc++: Change class-key for duration and time_point to class
We define these with the 'struct' keyword, but the standard uses
'class'. This results in warnings if users try to refer to them using
elaborated type specifiers.
libstdc++-v3/ChangeLog:
* include/bits/chrono.h (duration, time_point): Change 'struct'
to 'class'.
Jonathan Wakely [Thu, 26 Jan 2023 10:55:28 +0000 (10:55 +0000)]
libstdc++: Add returns_nonnull to non-inline std::map detail [PR108554]
std::map uses a non-inline function to rebalance its tree and the
compiler can't see that it always returns a valid pointer (assuming
valid inputs, which is a precondition anyway). This can result in
-Wnull-dereference warnings for valid code, because the compiler thinks
there is a path where the function returns null.
Adding the returns_nonnull attribute tells the compiler that this can't
happen. While we're doing that, we might as well add a nonnull
attribute to the rebalancing functions too.
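How the two attributes are spelled on a declaration can be shown with
a small stand-in. The node type and root_of function are hypothetical
and much simpler than the _Rb_tree helpers; in the library the
definition lives in another translation unit, which is why the caller
needs the attributes to know the result is never null.

```cpp
struct node {
  node* parent;
  int value;
};

// Declaration as a caller would see it: all pointer arguments must be
// non-null, and the result is promised to be non-null.
[[gnu::returns_nonnull, gnu::nonnull]] node* root_of(node* n) noexcept;

// Caller: dereferences the result without a null check.
inline int root_value(node* n) {
  return root_of(n)->value;
}

// Definition, normally out of sight of the caller.
node* root_of(node* n) noexcept {
  while (n->parent != nullptr)
    n = n->parent;
  return n;
}
```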
libstdc++-v3/ChangeLog:
PR libstdc++/108554
* include/bits/stl_tree.h (_Rb_tree_insert_and_rebalance): Add
nonnull attribute.
(_Rb_tree_rebalance_for_erase): Add nonnull and returns_nonnull
attributes.
* testsuite/23_containers/map/modifiers/108554.cc: New test.
Jonathan Wakely [Thu, 22 Sep 2022 17:36:04 +0000 (18:36 +0100)]
libstdc++: Optimize std::bitset<N>::to_string
This makes to_string approximately twice as fast at any optimization
level. Instead of iterating through every bit, jump straight to the next
bit that is set, by using _Find_first and _Find_next.
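The approach can be sketched from the outside like this. The function
below is a hypothetical free function, not bitset::_M_copy_to_string,
and it relies on _Find_first and _Find_next, which are libstdc++
extensions and so not portable to other standard libraries.

```cpp
#include <bitset>
#include <cstddef>
#include <string>

// Builds the same text as bits.to_string() by visiting set bits only.
template <std::size_t N>
std::string to_string_by_set_bits(const std::bitset<N>& bits) {
  std::string result(N, '0');  // start from all zeros
  // Both extensions return N when there is no (further) set bit.
  for (std::size_t pos = bits._Find_first(); pos < N;
       pos = bits._Find_next(pos))
    result[N - 1 - pos] = '1';  // bit 0 is the rightmost character
  return result;
}
```

For sparse bitsets the loop body runs once per set bit rather than
once per bit position.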
libstdc++-v3/ChangeLog:
* include/std/bitset (bitset::_M_copy_to_string): Find set bits
instead of iterating over individual bits.
Jonathan Wakely [Thu, 15 Sep 2022 15:57:30 +0000 (16:57 +0100)]
libstdc++: Tweak TSan annotations for std::atomic<shared_ptr<T>>
Do not use the __tsan_mutex_not_static flag for annotation functions
where it's not a valid flag. Also use the try_lock and try_lock_failed
flags to more precisely annotate the CAS loop used to acquire a lock.
libstdc++-v3/ChangeLog:
* include/bits/shared_ptr_atomic.h (_GLIBCXX_TSAN_MUTEX_PRE_LOCK):
Replace with ...
(_GLIBCXX_TSAN_MUTEX_TRY_LOCK): ... this, add try_lock flag.
(_GLIBCXX_TSAN_MUTEX_TRY_LOCK_FAILED): New macro using
try_lock_failed flag.
(_GLIBCXX_TSAN_MUTEX_POST_LOCK): Rename to ...
(_GLIBCXX_TSAN_MUTEX_LOCKED): ... this.
(_GLIBCXX_TSAN_MUTEX_PRE_UNLOCK): Remove invalid flag.
(_GLIBCXX_TSAN_MUTEX_POST_UNLOCK): Remove invalid flag.
(_Sp_atomic::_Atomic_count::lock): Use new macros.
Jonathan Wakely [Wed, 14 Sep 2022 18:11:22 +0000 (19:11 +0100)]
libstdc++: Add TSan annotations to std::atomic<shared_ptr<T>>
This adds annotations to std::atomic<shared_ptr<T>> to enable TSan to
understand the custom locking. Without this, TSan reports data races for
accesses to the _M_ptr member, even though those are correctly
synchronized using atomic operations on the tagged pointer.
Jonathan Wakely [Tue, 17 May 2022 14:14:39 +0000 (15:14 +0100)]
libstdc++: Add attributes to functions in <memory_resource>
Add attributes to the accessors for the global memory resource objects,
to allow the compiler to eliminate redundant calls to them. For example,
multiple calls to std::pmr::new_delete_resource() will always return the
same object, and so the compiler can replace them with a single call.
Ideally we would like adjacent calls to std::pmr::get_default_resource()
to be combined into a single call by the CSE pass. The 'pure' attribute
would permit that. However, the standard requires that calls to
std::pmr::set_default_resource() synchronize with subsequent calls to
std::pmr::get_default_resource(). With 'pure' the DCE pass might
eliminate seemingly redundant calls to std::pmr::get_default_resource().
That might be unsafe, because the caller might be relying on the
associated synchronization. We could use a hypothetical attribute that
allows CSE but not DCE, but we don't have one. So it can't be 'pure'.
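As a rough illustration of the kind of attribute discussed in the <memory_resource> commit above, the sketch below marks an accessor that always returns the same object with the GNU 'const' attribute. The names resource, global_instance and global_resource are hypothetical stand-ins, not the std::pmr functions, and the exact attributes libstdc++ uses are not shown here.

```cpp
// Hypothetical stand-in for a global resource object.
struct resource { int id; };

namespace {
resource global_instance{42};
}

// gnu::const promises that the result depends only on the arguments (there
// are none) and that the function reads no mutable global state, so the
// compiler may merge repeated calls into one.
[[gnu::const]]
resource* global_resource() noexcept
{
    return &global_instance;
}

// With the attribute, an optimizing compiler is allowed to evaluate
// global_resource() once here instead of twice.
bool same_object()
{
    return global_resource() == global_resource();
}
```

An accessor whose result can change over time, or whose call carries synchronization that callers rely on, should not be annotated this way. That is the concern the commit message raises for std::pmr::get_default_resource().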
Jonathan Wakely [Wed, 14 Sep 2022 13:03:19 +0000 (14:03 +0100)]
libstdc++: Add assertion to std::promise::set_exception (LWG 2276)
Without this assertion, the shared state is made ready, but contains
neither a value nor an exception. Add an assertion to prevent users from
accessing a value that was never initialized in the shared state.
libstdc++-v3/ChangeLog:
* include/std/future
(_State_baseV2::__setter(exception_ptr&, promise&)): Add
assertion for LWG 2276 precondition.
* testsuite/30_threads/promise/members/set_exception_neg.cc:
New test.
Jonathan Wakely [Wed, 7 Sep 2022 19:17:04 +0000 (20:17 +0100)]
libstdc++: Find make_error_code and make_error_condition via ADL only
The new proposed resolution for LWG 3629 says that std::error_code and
std::error_condition should only use ADL to find their customization
points. This means we need to use a poison pill to prevent lookup from
finding overloads in the enclosing namespaces.
We can also remove the forward declarations of std::make_error_code and
std::make_error_condition, because they aren't needed now. ADL can find
them anyway (when std is an associated namespace), and unqualified name
lookup will not (and should not) find them.
libstdc++-v3/ChangeLog:
* include/std/system_error (__adl_only::make_error_code): Add
deleted function.
(__adl_only::make_error_condition): Likewise.
(error_code::error_code(ErrorCodeEnum)): Add using-declaration
for deleted function.
(error_condition::error_condition(ErrorConditionEnum)):
Likewise.
* testsuite/19_diagnostics/error_code/cons/lwg3629.cc: New test.
* testsuite/19_diagnostics/error_condition/cons/lwg3629.cc: New test.
Jonathan Wakely [Tue, 17 May 2022 13:50:32 +0000 (14:50 +0100)]
libstdc++: Add attributes to <system_error> and related
Add the const attribute to std::future_category() and
std::iostream_category(), to match the existing attributes on
std::generic_category() and std::system_category().
Also add [[nodiscard]] to those functions and to the comparison
operators for std::error_code and std::error_condition, and to
std::make_error_code and std::make_error_condition overloads.
Jonathan Wakely [Fri, 25 Nov 2022 11:40:37 +0000 (11:40 +0000)]
libstdc++: Fix orphaned/nested output of configure checks
This moves two AC_MSG_RESULT lines for <uchar.h> features so that they
are only printed when the corresponding AC_MSG_CHECKING actually
happened. This fixes configure output like:
checking for uchar.h... no
no
checking for int64_t... yes
Also move the AC_MSG_CHECKING for libbacktrace support so it doesn't
come after AC_CHECK_HEADERS output. This fixes:
checking whether to build libbacktrace support... checking for sys/mman.h... (cached) yes
yes
libstdc++-v3/ChangeLog:
* acinclude.m4 (GLIBCXX_CHECK_UCHAR_H): Don't use AC_MSG_RESULT
unless the AC_MSG_CHECKING happened.
* configure: Regenerate.
Jonathan Wakely [Tue, 28 Mar 2023 10:12:58 +0000 (11:12 +0100)]
libstdc++: More fixes for null pointers used with std::char_traits
The std::char_traits member functions require that [p,p+n) is a valid
range, which is true for p==nullptr iff n==0. But we must not call
memcpy, memset, etc. in that case, as they require non-null pointers even
when n==0.
The std::char_traits<char> and std::char_traits<wchar_t> explicit
specializations are already correct, but the primary template has some
bugs.
libstdc++-v3/ChangeLog:
* include/bits/char_traits.h (char_traits::copy): Return without
using memcpy if n==0.
(char_traits::assign): Likewise for memset.
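The guard described in the std::char_traits commit above can be sketched as a free function. copy_sketch is a hypothetical name and this is not the code in include/bits/char_traits.h; it only shows the early return that keeps a null pointer away from memcpy.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper: copy n characters from src to dst.
// [src, src + n) and [dst, dst + n) must be valid ranges, which permits
// null pointers only when n == 0.
template <class CharT>
CharT* copy_sketch(CharT* dst, const CharT* src, std::size_t n)
{
    static_assert(std::is_trivially_copyable<CharT>::value,
                  "memcpy requires a trivially copyable character type");
    if (n == 0)
        return dst;  // memcpy needs non-null pointers even for zero bytes
    std::memcpy(dst, src, n * sizeof(CharT));
    return dst;
}
```

The same early return applies to a memset-based assign.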
Jonathan Wakely [Tue, 28 Mar 2023 09:50:40 +0000 (10:50 +0100)]
libstdc++: Tell GCC what basic_string::_M_is_local() means [PR109299]
This avoids a bogus warning about overflowing a buffer, because GCC
can't tell that we don't copy into the buffer unless it fits. By adding
a __builtin_unreachable() hint we inform the compiler about the
invariant that the buffer is only used when it's big enough.
This can also improve codegen, by eliminating dead code that GCC
couldn't tell was unreachable.
libstdc++-v3/ChangeLog:
PR libstdc++/109299
* include/bits/basic_string.h (basic_string::_M_is_local()): Add
hint for compiler that local strings fit in the local buffer.
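The hint described in the PR109299 commit above can be illustrated with a simplified small-buffer class. small_string, local_capacity and SKETCH_UNREACHABLE are hypothetical names, and the layout is an assumption for illustration rather than the real basic_string.

```cpp
#include <cstddef>

// __builtin_unreachable is a GCC/Clang builtin; fall back to a no-op elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_UNREACHABLE() __builtin_unreachable()
#else
#  define SKETCH_UNREACHABLE() ((void)0)
#endif

class small_string {
public:
    static constexpr std::size_t local_capacity = 15;

    small_string() : ptr_(local_), len_(0) { local_[0] = '\0'; }

    // Copying would leave ptr_ pointing into another object's buffer.
    small_string(const small_string&) = delete;
    small_string& operator=(const small_string&) = delete;

    bool is_local() const
    {
        if (ptr_ == local_) {
            // Class invariant: a string stored in the local buffer never
            // exceeds its capacity. Stating it lets the optimizer drop the
            // impossible path and the warnings that path would trigger.
            if (len_ > local_capacity)
                SKETCH_UNREACHABLE();
            return true;
        }
        return false;
    }

    std::size_t size() const { return len_; }

private:
    char local_[local_capacity + 1];
    char* ptr_;
    std::size_t len_;
};
```

The hint is only safe if the invariant truly holds everywhere. Reaching __builtin_unreachable() at run time is undefined behavior.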
Jonathan Wakely [Mon, 16 Jan 2023 10:15:41 +0000 (10:15 +0000)]
libstdc++: Fix copyright notice to use usual form [PR108413]
libstdc++-v3/ChangeLog:
PR libstdc++/108413
* include/c_compatibility/stdatomic.h: Change copyright line to
be consistent with other headers contributed under DCO terms.
* include/std/expected: Add full stop to copyright line.
Xi Ruoyao [Mon, 27 Mar 2023 17:48:02 +0000 (01:48 +0800)]
fixincludes: Declare memmem if it's not declared in system headers [PR109293]
memmem is not POSIX so the system may lack it. Then libiberty will
provide an implementation, but it's a "supplemental function" and not
declared in libiberty.h. We need to declare the prototype to use it
then.
See libiberty doc at
https://gcc.gnu.org/onlinedocs/libiberty/Supplemental-Functions.html.
Tested by bootstrapping GCC in the following container environments on
x86_64-linux-gnu:
1. "Vanilla" system with memmem in Glibc.
2. memmem removed from string.h.
3. memmem removed from both string.h and libc.so.
For 3, also verified that memmem from libiberty is linked into fixincl
executable.
Note that the backport does not contain a complete regeneration of
configure and config.h.in (attempting such regeneration resulted in all
the USED_FOR_TARGET conditionals disappearing; this already happened in
trunk at r13-2200).
Eric Botcazou [Tue, 28 Mar 2023 08:13:24 +0000 (10:13 +0200)]
Fix PR target/109140
This is a regression present on the mainline and 12 branch at -O2, but the
issue is related to vectorization so was present at -O3 in earlier versions.
The vcondu expander that was added for VIS 3 more than a decade ago does not
fully work, because it does not filter out the unsigned condition codes (the
instruction is an UNSPEC that accepts only signed condition codes).
While I was at it, I also added the missing vcond and vcondu expanders for
the new comparison instructions that were added in VIS 4.
gcc/
PR target/109140
* config/sparc/sparc.cc (sparc_expand_vcond): Call signed_condition
on operand #3 to get the final condition code. Use std::swap.
* config/sparc/sparc.md (vcondv8qiv8qi): New VIS 4 expander.
(fucmp<gcond:code>8<P:mode>_vis): Move around.
(fpcmpu<gcond:code><GCM:gcm_name><P:mode>_vis): Likewise.
(vcondu<GCM:mode><GCM:mode>): New VIS 4 expander.
Harald Anlauf [Thu, 2 Mar 2023 21:37:14 +0000 (22:37 +0100)]
Fortran: fix CLASS attribute handling [PR106856]
gcc/fortran/ChangeLog:
PR fortran/106856
* class.cc (gfc_build_class_symbol): Handle update of attributes of
existing class container.
(gfc_find_derived_vtab): Fix several memory leaks.
(find_intrinsic_vtab): Ditto.
* decl.cc (attr_decl1): Manage update of symbol attributes from
CLASS attributes.
* primary.cc (gfc_variable_attr): OPTIONAL shall not be taken or
updated from the class container.
* symbol.cc (free_old_symbol): Adjust management of symbol versions
to not prematurely free array specs while working on the declaration
of CLASS variables.
gcc/testsuite/ChangeLog:
PR fortran/106856
* gfortran.dg/interface_41.f90: Remove dg-pattern from valid testcase.
* gfortran.dg/class_74.f90: New test.
* gfortran.dg/class_75.f90: New test.
Jerry DeLisle [Mon, 27 Mar 2023 01:44:35 +0000 (18:44 -0700)]
Fortran: Modify checks to avoid referencing NULL pointer.
Backport from mainline.
gcc/fortran/ChangeLog:
PR fortran/102331
* decl.cc (attr_decl1): Guard against NULL pointer.
* parse.cc (match_deferred_characteristics): Include BT_CLASS in check
for derived being undefined.