Richard Biener [Thu, 10 Apr 2025 11:30:42 +0000 (13:30 +0200)]
middle-end/119706 - allow POLY_INT_CST as is_gimple_mem_ref_addr
We currently only INTEGER_CST, but not POLY_INT_CST, which leads
to the situation that when the POLY_INT_CST is only indrectly
present via a SSA def the IL is valid but when propagated it's not.
That's unsustainable.
PR middle-end/119706
* gimple-expr.cc (is_gimple_mem_ref_addr): Also allow
POLY_INT_CST.
Richard Biener [Thu, 6 Mar 2025 08:08:07 +0000 (09:08 +0100)]
middle-end/119119 - re-gimplification of empty CTOR assignments
The following testcase runs into a re-gimplification issue during
inlining when processing
MEM[(struct e *)this_2(D)].a = {};
where re-gimplification does not handle assignments in the same
way than the gimplifier but instead relies on rhs_predicate_for
and gimplifying the RHS standalone. This fails to handle
special-casing of CTORs. The is_gimple_mem_rhs_or_call predicate
already handles clobbers but not empty CTORs so we end up in
the fallback code trying to force the CTOR into a separate stmt
using a temporary - but as we have a non-copyable type here that ICEs.
The following generalizes empty CTORs in is_gimple_mem_rhs_or_call
since those need no additional re-gimplification.
PR middle-end/119119
* gimplify.cc (is_gimple_mem_rhs_or_call): All empty CTORs
are OK when not a register type.
We are detecting a cycle as double reduction where the inner loop
cycle has extra out-of-loop uses. This clashes at least with
assumptions from the SLP discovery code which says the cycle
isn't reachable from another SLP instance. It also was not intended
to support this case, in fact with GCC 14 we seem to generate wrong
code here.
PR tree-optimization/119057
* tree-vect-loop.cc (check_reduction_path): Add argument
specifying whether we're analyzing the inner loop of a
double reduction. Do not allow extra uses outside of the
double reduction cycle in this case.
(vect_is_simple_reduction): Adjust.
Richard Biener [Tue, 28 Jan 2025 11:28:14 +0000 (12:28 +0100)]
tree-optimization/117424 - invalid LIM of trapping ref
The following addresses a bug in tree_could_trap_p leading to
hoisting of a possibly trapping, because of out-of-bound, access.
We only ensured the first accessed byte is within a decl there,
the patch makes sure the whole base of the reference is within it.
This is pessimistic if a handled component would then subset to
a sub-object within the decl but upcasting of a decl to larger
types should be uncommon, questionable, and wrong without
-fno-strict-aliasing.
The testcase is a bit fragile, but I could not devise a (portable)
way to ensure an out-of-bound access to a decl would fault.
PR tree-optimization/117424
* tree-eh.cc (tree_could_trap_p): Verify the base is
fully contained within a decl.
Richard Biener [Mon, 3 Feb 2025 14:12:52 +0000 (15:12 +0100)]
tree-optimization/117113 - ICE with unroll-and-jam
When there's an inner loop without virtual header PHI but the outer
loop has one the fusion process cannot handle the need to create
an inner loop virtual header PHI. Punt in this case.
PR tree-optimization/117113
* gimple-loop-jam.cc (unroll_jam_possible_p): Detect when
we cannot handle virtual SSA update.
Richard Biener [Thu, 6 Mar 2025 12:48:16 +0000 (13:48 +0100)]
lto/114501 - missed free-lang-data for CONSTRUCTOR index
The following makes sure to also walk CONSTRUCTOR element indexes
which can be FIELD_DECLs, referencing otherwise unused types we
need to clean. walk_tree only walks CONSTRUCTOR element data.
PR lto/114501
* ipa-free-lang-data.cc (find_decls_types_r): Explicitly
handle CONSTRUCTORs as walk_tree handling of those is
incomplete.
Richard Biener [Mon, 3 Feb 2025 13:27:01 +0000 (14:27 +0100)]
lto/113207 - fix free_lang_data_in_type
When we process function types we strip volatile and const qualifiers
after building a simplified type variant (which preserves those).
The qualified type handling of both isn't really compatible, so avoid
bad interaction by swapping this, first dropping const/volatile
qualifiers and then building the simplified type thereof.
PR lto/113207
* ipa-free-lang-data.cc (free_lang_data_in_type): First drop
const/volatile qualifiers from function argument types,
then build a simplified type.
Richard Biener [Thu, 23 Jan 2025 12:10:17 +0000 (13:10 +0100)]
tree-optimization/112859 - bogus loop distribution
When we get a zero distance vector we still have to check for the
situation of a common inner loop with zero distance. But we can
still allow a zero distance for the loop we distribute
(gcc.dg/tree-ssa/ldist-33.c is such a case). This is because
zero distances in non-outermost loops are a misrepresentation
of dependence by dependence analysis.
Note that test coverage of loop distribution of loop nests is
very low.
PR tree-optimization/112859
PR tree-optimization/115347
* tree-loop-distribution.cc
(loop_distribution::pg_add_dependence_edges): For a zero
distance vector still make sure to not have an inner
loop with zero distance.
* gcc.dg/torture/pr112859.c: New testcase.
* gcc.dg/torture/pr115347.c: Likewise.
* gcc.dg/tree-ssa/ldist-36.c: Adjust.
Richard Biener [Fri, 28 Feb 2025 10:44:26 +0000 (11:44 +0100)]
ipa/111245 - bogus modref analysis for store in call that might throw
We currently record a kill for
*x_4(D) = always_throws ();
because we consider the store always executing since the appropriate
check for whether the stmt could throw is guarded by
!cfun->can_throw_non_call_exceptions.
PR ipa/111245
* ipa-modref.cc (modref_access_analysis::analyze_store): Do
not guard the check of whether the stmt could throw by
cfun->can_throw_non_call_exceptions.
Richard Biener [Fri, 17 May 2024 09:02:29 +0000 (11:02 +0200)]
middle-end/115110 - Fix view_converted_memref_p
view_converted_memref_p was checking the reference type against the
pointer type of the offset operand rather than its pointed-to type
which leads to all refs being subject to view-convert treatment
in get_alias_set causing numerous testsuite fails but with its
new uses from r15-512-g9b7cad5884f21c is also a wrong-code issue.
Richard Biener [Wed, 31 Jul 2024 08:07:45 +0000 (10:07 +0200)]
middle-end/101478 - ICE with degenerate address during gimplification
When we gimplify &MEM[0B + 4] we are re-folding the address in case
types are not canonical which ends up with a constant address that
recompute_tree_invariant_for_addr_expr ICEs on. Properly guard
that call.
PR middle-end/101478
* gimplify.cc (gimplify_addr_expr): Check we still have an
ADDR_EXPR before calling recompute_tree_invariant_for_addr_expr.
Richard Biener [Fri, 28 Feb 2025 13:09:29 +0000 (14:09 +0100)]
lto/91299 - weak definition inlined with LTO
The following fixes a thinko in the handling of interposed weak
definitions which confused the interposition check in
get_availability by setting DECL_EXTERNAL too early.
PR lto/91299
gcc/lto/
* lto-symtab.cc (lto_symtab_merge_symbols): Set DECL_EXTERNAL
only after calling get_availability.
gcc/testsuite/
* gcc.dg/lto/pr91299_0.c: New testcase.
* gcc.dg/lto/pr91299_1.c: Likewise.
Richard Biener [Fri, 28 Feb 2025 09:36:11 +0000 (10:36 +0100)]
tree-optimization/87984 - hard register assignments not preserved
The following disables redundant store elimination to hard register
variables which isn't valid.
PR tree-optimization/87984
* tree-ssa-dom.cc (dom_opt_dom_walker::optimize_stmt): Do
not perform redundant store elimination to hard register
variables.
* tree-ssa-sccvn.cc (eliminate_dom_walker::eliminate_stmt):
Likewise.
Richard Biener [Thu, 24 Aug 2023 09:10:43 +0000 (11:10 +0200)]
tree-optimization/111125 - avoid BB vectorization in novector loops
When a loop is marked with
#pragma GCC novector
the following makes sure to also skip BB vectorization for contained
blocks. That avoids gcc.dg/vect/bb-slp-29.c failing on aarch64
because of extra BB vectorization therein. I'm not specifically
dealing with sub-loops of novector loops, the desired semantics
isn't documented.
PR tree-optimization/111125
* tree-vect-slp.cc (vect_slp_function): Split at novector
loop entry, do not push blocks in novector loops.
Patrick Palka [Thu, 9 Jan 2025 15:49:45 +0000 (10:49 -0500)]
c++: template-id dependence wrt local static arg [PR117792]
Here we end up ICEing at instantiation time for the call to
f<local_static> ultimately because we wrongly consider the call to be
non-dependent, and so we specialize f ahead of time and then get
confused when fully substituting this specialization.
The call is dependent due to [temp.dep.temp]/3 and we miss that because
function template-id arguments aren't coerced until overload resolution,
and so the local static template argument lacks an implicit cast to
reference type that value_dependent_expression_p looks for before
considering dependence of the address. Other kinds of template-ids aren't
affected since they're coerced ahead of time.
So when considering dependence of a function template-id, we need to
conservatively consider dependence of the address of each argument (if
applicable).
PR c++/117792
gcc/cp/ChangeLog:
* pt.cc (type_dependent_expression_p): Consider the dependence
of the address of each template argument of a function
template-id.
Patrick Palka [Thu, 15 Aug 2024 14:23:54 +0000 (10:23 -0400)]
c++: c->B::m access resolved through current inst [PR116320]
Here when checking the access of (the injected-class-name) B in c->B::m
at parse time, we notice its context B (now the type) is a base of the
object type C<T>, so we proceed to use C<T> as the effective qualifying
type. But this C<T> is the dependent specialization not the primary
template type, so it has empty TYPE_BINFO, which leads to a segfault later
from perform_or_defer_access_check.
The reason the DERIVED_FROM_P (B, C<T>) test guarding this code path works
despite C<T> having empty TYPE_BINFO is because of its currently_open_class
logic (added in r9-713-gd9338471b91bbe) which replaces a dependent
specialization with the primary template type if we're inside it. So the
safest fix seems to be to call currently_open_class in the caller as well.
PR c++/116320
gcc/cp/ChangeLog:
* semantics.cc (check_accessibility_of_qualified_id): Try
currently_open_class when using the object type as the
effective qualifying type.
Patrick Palka [Thu, 11 Apr 2024 14:16:41 +0000 (10:16 -0400)]
c++: build_extra_args recapturing local specs [PR114303]
r13-6452-g341e6cd8d603a3 made build_extra_args walk evaluated contexts
first so that we prefer processing a local specialization in an evaluated
context even if its first use is in an unevaluated context. But this
means we need to avoid walking a tree that already has extra args/specs
saved because the list of saved specs appears to be an evaluated
context which we'll now walk first. It seems then that we should be
calculating the saved specs from scratch each time, rather than
potentially walking the saved specs list from an earlier partial
instantiation when calling build_extra_args a second time around.
PR c++/114303
gcc/cp/ChangeLog:
* constraint.cc (tsubst_requires_expr): Clear
REQUIRES_EXPR_EXTRA_ARGS before calling build_extra_args.
* pt.cc (tree_extra_args): Define.
(extract_locals_r): Assert *_EXTRA_ARGS is empty.
(tsubst_stmt) <case IF_STMT>: Clear IF_SCOPE on the new
IF_STMT. Call build_extra_args on the new IF_STMT instead
of t which might already have IF_STMT_EXTRA_ARGS.
Patrick Palka [Thu, 12 Sep 2024 16:45:03 +0000 (12:45 -0400)]
c++: decltype(auto) deduction of statement-expression [PR116418]
r8-7538 for PR84968 made strip_typedefs_expr diagnose STATEMENT_LIST
so that we reject statement-expressions in noexcept-specifiers to
match our behavior in template arguments (which the parser diagnoses
directly).
Later r11-7452 made decltype(auto) deduction canonicalize the expression
(as an implementation detail) which in turn calls strip_typedefs_expr,
and so ever since we inadvertently reject decltype(auto) deduction of a
statement-expression.
This patch just removes the diagnostic in strip_typedefs_expr and instead
treats statement-expressions similar to lambda-expressions. The function
doesn't seem like the right place for such a diagnostic and so it seems
easier to just accept rather than try to reject them in a suitable place.
PR c++/116418
gcc/cp/ChangeLog:
* tree.cc (strip_typedefs_expr) <case STATEMENT_LIST>: Replace
this error path with ...
<case STMT_EXPR>: ... this, returning the original tree.
gcc/testsuite/ChangeLog:
* g++.dg/eh/pr84968.C: No longer expect an ahead of time diagnostic
for the statement-expresssion. Instantiate the template and expect
an incomplete type error instead.
* g++.dg/ext/stmtexpr26.C: New test.
Richard Earnshaw [Thu, 20 Mar 2025 14:42:59 +0000 (14:42 +0000)]
opcodes: fix wrong code in expand_binop_directly [PR117811]
If expand_binop_directly fails to add a REG_EQUAL note it tries to
unwind and restart. But it can unwind too far if expand_binop changed
some of the operands before calling it. We don't need to unwind that
far anyway since we should end up taking exactly the same route next
time, just without a target rtx.
To fix this we remove LAST from the argument list and let the callers
(all in expand_binop) do their own unwinding if the call fails.
Instead we unwind just as far as the entry to expand_binop_directly
and recurse within this function instead of all the way back up.
gcc/ChangeLog:
PR middle-end/117811
* optabs.cc (expand_binop_directly): Remove LAST as an argument,
instead record the last insn on entry. Only delete insns if
we need to restart and restart by calling ourself, not expand_binop.
(expand_binop): Update callers to expand_binop_directly. If it
fails to expand the operation, delete back to LAST.
gcc/testsuite:
PR middle-end/117811
* gcc.dg/torture/pr117811.c: New test.
Jakub Jelinek [Thu, 17 Apr 2025 08:57:18 +0000 (10:57 +0200)]
s390: Use match_scratch instead of scratch in define_split [PR119834]
The following testcase ICEs since r15-1579 (addition of late combiner),
because *clrmem_short can't be split.
The problem is that the define_insn uses
(use (match_operand 1 "nonmemory_operand" "n,a,a,a"))
(use (match_operand 2 "immediate_operand" "X,R,X,X"))
(clobber (match_scratch:P 3 "=X,X,X,&a"))
and define_split assumed that if operands[1] is const_int_operand,
match_scratch will be always scratch, and it will be reg only if
it was the last alternative where operands[1] is a reg.
The pattern doesn't guarantee it though, of course RA will not try to
uselessly assign a reg there if it is not needed, but during RA
on the testcase below we match the last alternative, but then comes
late combiner and propagates const_int 3 into operands[1]. And that
matches fine, match_scratch matches either scratch or reg and the constraint
in that case is X for the first variant, so still just fine. But we won't
split that because the splitters only expect scratch.
The following patch fixes it by using match_scratch instead of scratch,
so that it accepts either.
2025-04-17 Jakub Jelinek <jakub@redhat.com>
PR target/119834
* config/s390/s390.md (define_split after *cpymem_short): Use
(clobber (match_scratch N)) instead of (clobber (scratch)). Use
(match_dup 4) and operands[4] instead of (match_dup 3) and operands[3]
in the last of those.
(define_split after *clrmem_short): Use (clobber (match_scratch N))
instead of (clobber (scratch)).
(define_split after *cmpmem_short): Likewise.
Jakub Jelinek [Wed, 28 Feb 2024 22:20:13 +0000 (23:20 +0100)]
c++: Fix explicit instantiation of const variable templates after earlier implicit instantation [PR113976]
Already previously instantiated const variable templates had
cp_apply_type_quals_to_decl called when they were instantiated,
but if they need runtime initialization, their TREE_READONLY flag
has been subsequently cleared.
Explicit variable template instantiation calls grokdeclarator which
calls cp_apply_type_quals_to_decl on them again, setting TREE_READONLY
flag again, but nothing clears it afterwards, so we emit such
instantiations into rodata sections and segfault when the dynamic
initialization attempts to initialize them.
The following patch fixes that by not calling cp_apply_type_quals_to_decl
on already instantiated variable declarations.
2024-02-28 Jakub Jelinek <jakub@redhat.com>
Patrick Palka <ppalka@redhat.com>
Jakub Jelinek [Mon, 10 Feb 2025 09:40:22 +0000 (10:40 +0100)]
i386: Change RTL representation of bt[lq] [PR118623]
The following testcase is miscompiled because of RTL represententation
of bt{l,q} insn followed by e.g. j{c,nc} being misleading to what it
actually does.
Let's look e.g. at
(define_insn_and_split "*jcc_bt<mode>"
[(set (pc)
(if_then_else (match_operator 0 "bt_comparison_operator"
[(zero_extract:SWI48
(match_operand:SWI48 1 "nonimmediate_operand")
(const_int 1)
(match_operand:QI 2 "nonmemory_operand"))
(const_int 0)])
(label_ref (match_operand 3))
(pc)))
(clobber (reg:CC FLAGS_REG))]
"(TARGET_USE_BT || optimize_function_for_size_p (cfun))
&& (CONST_INT_P (operands[2])
? (INTVAL (operands[2]) < GET_MODE_BITSIZE (<MODE>mode)
&& INTVAL (operands[2])
>= (optimize_function_for_size_p (cfun) ? 8 : 32))
: !memory_operand (operands[1], <MODE>mode))
&& ix86_pre_reload_split ()"
"#"
"&& 1"
[(set (reg:CCC FLAGS_REG)
(compare:CCC
(zero_extract:SWI48
(match_dup 1)
(const_int 1)
(match_dup 2))
(const_int 0)))
(set (pc)
(if_then_else (match_op_dup 0 [(reg:CCC FLAGS_REG) (const_int 0)])
(label_ref (match_dup 3))
(pc)))]
{
operands[0] = shallow_copy_rtx (operands[0]);
PUT_CODE (operands[0], reverse_condition (GET_CODE (operands[0])));
})
The define_insn part in RTL describes exactly what it does,
jumps to op3 if bit op2 in op1 is set (for op0 NE) or not set (for op0 EQ).
The problem is with what it splits into.
put_condition_code %C1 for CCCmode comparisons emits c for EQ and LTU,
nc for NE and GEU and ICEs otherwise.
CCCmode is used mainly for carry out of add/adc, borrow out of sub/sbb,
in those cases e.g. for add we have
(set (reg:CCC flags) (compare:CCC (plus:M x y) x))
and use (ltu (reg:CCC flags) (const_int 0)) for carry set and
(geu (reg:CCC flags) (const_int 0)) for carry not set. These cases
model in RTL what is actually happening, compare in infinite precision
x from the result of finite precision addition in M mode and if it is
less than unsigned (i.e. overflow happened), carry is set.
Another use of CCCmode is in UNSPEC_* patterns, those are used with
(eq (reg:CCC flags) (const_int 0)) for carry set and ne for unset,
given the UNSPEC no big deal, the middle-end doesn't know what means
set or unset.
But for the bt{l,q}; j{c,nc} case the above splits it into
(set (reg:CCC flags) (compare:CCC (zero_extract) (const_int 0)))
for bt and
(set (pc) (if_then_else (eq (reg:CCC flags) (const_int 0)) (label_ref) (pc)))
for the bit set case (so that the jump expands to jc) and ne for
the bit not set case (so that the jump expands to jnc).
Similarly for the different splitters for cmov and set{c,nc} etc.
The problem is that when the middle-end reads this RTL, it feels
the exact opposite to it. If zero_extract is 1, flags is set
to comparison of 1 and 0 and that would mean using ne ne in the
if_then_else, and vice versa.
So, in order to better describe in RTL what is actually happening,
one possibility would be to swap the behavior of put_condition_code
and use NE + LTU -> c and EQ + GEU -> nc rather than the current
EQ + LTU -> c and NE + GEU -> nc; and adjust everything. The
following patch uses a more limited approach, instead of representing
bt{l,q}; j{c,nc} case as written above it uses
(set (reg:CCC flags) (compare:CCC (const_int 0) (zero_extract)))
and
(set (pc) (if_then_else (ltu (reg:CCC flags) (const_int 0)) (label_ref) (pc)))
which uses the existing put_condition_code but describes what the
insns actually do in RTL clearly. If zero_extract is 1,
then flags are LTU, 0U < 1U, if zero_extract is 0, then flags are GEU,
0U >= 0U. The patch adjusts the *bt<mode> define_insn and all the
splitters to it and its comparisons/conditional moves/setXX.
2025-02-10 Jakub Jelinek <jakub@redhat.com>
PR target/118623
* config/i386/i386.md (*bt<mode>): Represent bt as
compare:CCC of const0_rtx and zero_extract rather than
zero_extract and const0_rtx.
(*jcc_bt<mode>): Likewise. Use LTU and GEU as flags test
instead of EQ and NE.
(*jcc_bt<mode>_1): Likewise.
(*jcc_bt<mode>_mask): Likewise.
(Help combine recognize bt followed by cmov splitter): Likewise.
(*bt<mode>_setcqi): Likewise.
(*bt<mode>_setncqi): Likewise.
(*bt<mode>_setnc<mode>): Likewise.
Jakub Jelinek [Thu, 5 Dec 2024 12:01:21 +0000 (13:01 +0100)]
doloop: Fix up doloop df use [PR116799]
The following testcases are miscompiled on s390x-linux, because the
doloop_optimize
/* Ensure that the new sequence doesn't clobber a register that
is live at the end of the block. */
{
bitmap modified = BITMAP_ALLOC (NULL);
for (rtx_insn *i = doloop_seq; i != NULL; i = NEXT_INSN (i))
note_stores (i, record_reg_sets, modified);
basic_block loop_end = desc->out_edge->src;
bool fail = bitmap_intersect_p (df_get_live_out (loop_end), modified);
check doesn't work as intended.
The problem is that it uses df, but the df analysis was only done using
iv_analysis_loop_init (loop);
->
df_analyze_loop (loop);
which computes df inside on the bbs of the loop.
While loop_end bb is inside of the loop, df_get_live_out computed that
way includes registers set in the loop and used at the start of the next
iteration, but doesn't include registers set in the loop (or before the
loop) and used after the loop.
The following patch fixes that by doing whole function df_analyze first,
changes the loop iteration mode from 0 to LI_ONLY_INNERMOST (on many
targets which use can_use_doloop_if_innermost target hook a so are known
to only handle innermost loops) or LI_FROM_INNERMOST (I think only bfin
actually allows non-innermost loops) and checking not just
df_get_live_out (loop_end) (that is needed for something used by the
next iteration), but also df_get_live_in (desc->out_edge->dest),
i.e. what will be used after the loop. df of such a bb shouldn't
be affected by the df_analyze_loop and so should be from df_analyze
of the whole function.
2024-12-05 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/113994
PR rtl-optimization/116799
* loop-doloop.cc: Include targhooks.h.
(doloop_optimize): Also punt on intersection of modified
with df_get_live_in (desc->out_edge->dest).
(doloop_optimize_loops): Call df_analyze. Use
LI_ONLY_INNERMOST or LI_FROM_INNERMOST instead of 0 as
second loops_list argument.
* gcc.c-torture/execute/pr116799.c: New test.
* g++.dg/torture/pr113994.C: New test.
Jakub Jelinek [Thu, 12 Sep 2024 16:22:21 +0000 (18:22 +0200)]
c++: Disable deprecated/unavailable diagnostics when creating thunks for methods with such attributes [PR116636]
On the following testcase, we emit false positive warnings/errors about using
the deprecated or unavailable methods when creating thunks for them, even
when nothing (in the testcase so far) actually used those.
The following patch temporarily disables that diagnostics when creating
the thunks.
2024-09-12 Jakub Jelinek <jakub@redhat.com>
PR c++/116636
* method.cc: Include decl.h.
(use_thunk): Temporarily change deprecated_state to
UNAVAILABLE_DEPRECATED_SUPPRESS.
Jakub Jelinek [Tue, 6 Feb 2024 12:00:04 +0000 (13:00 +0100)]
asan: Don't fold some strlens with -fsanitize=address [PR110676]
The UB on the following testcase isn't diagnosed by -fsanitize=address,
because we see that the array has a single element and optimize the
strlen to 0. I think it is fine to assume e.g. for range purposes the
lower bound for the strlen as long as we don't try to optimize
strlen (str)
where we know that it returns [26, 42] to
26 + strlen (str + 26), but for the upper bound we really want to punt
on optimizing that for -fsanitize=address to read all the bytes of the
string and diagnose if we run to object end etc.
2024-02-06 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/110676
* gimple-fold.cc (gimple_fold_builtin_strlen): For -fsanitize=address
reset maxlen to sizetype maximum.
Jakub Jelinek [Tue, 13 May 2025 12:20:22 +0000 (14:20 +0200)]
libfortran: Fix up _gfortran_{,m,s}findloc2_s{1,4} [PR120196]
As mentioned in the PR, _gfortran_{,m,s}findloc2_s{1,4} iterate too many
times in the back case if nothing is found.
For !back, the loops are for (i = 1; i <= extent; i++) so i is in the
body [1, extent] if nothing is found, but for back it is
for (i = extent; i >= 0; i--) so i is in the body [0, extent] and compares
one element before the start of the array.
Note, findloc1_s{1,4} uses
for (n = len; n > 0; n--, src -= delta * len_array)
for the back loop and
for (n = 1; n <= len; n++, src += delta * len_array)
for !back. This patch fixes that.
The testcase fails under valgrind without the libgfortran changes and
succeeds with those.
2025-05-13 Jakub Jelinek <jakub@redhat.com>
PR libfortran/120196
* m4/ifindloc2.m4 (header1, header2): For back use i > 0 rather than
i >= 0 as for condition.
* generated/findloc2_s1.c: Regenerate.
* generated/findloc2_s4.c: Regenerate.
Jakub Jelinek [Tue, 13 May 2025 12:19:25 +0000 (14:19 +0200)]
libfortran: Fix up _gfortran_s{max,min}loc1_{4,8,16}_s{1,4} [PR120191]
There is a bug in _gfortran_s{max,min}loc1_{4,8,16}_s{1,4} which the
following testcase shows.
The functions return but then crash in the caller.
Seems that is because buffer overflows, I believe those functions for
if (mask == NULL || *mask) condition being false are supposed to fill in
the result array with all zeros (or allocate it and fill it with zeros).
My understanding is the result array in that case is integer(kind={4,8,16})
and should have the extents the character input array has.
The problem is that it uses * string_len in the extent multiplication:
extent[n] = GFC_DESCRIPTOR_EXTENT(array,n) * string_len;
and
extent[n] =
GFC_DESCRIPTOR_EXTENT(array,n + 1) * string_len;
which is I guess fine and desirable for the extents of the character array,
but not for the extents of the destination array. Yet the code uses
that extent array for that purpose (and no other purposes).
Here it uses it to set the dimensions for the case where it needs to
allocate (as well as size):
for (n = 0; n < rank; n++)
{
if (n == 0)
str = 1;
else
str = GFC_DESCRIPTOR_STRIDE(retarray,n-1) * extent[n-1];
GFC_DIMENSION_SET(retarray->dim[n], 0, extent[n] - 1, str);
}
Here it uses it for bounds checking of the destination:
if (unlikely (compile_options.bounds_check))
{
for (n=0; n < rank; n++)
{
index_type ret_extent;
ret_extent = GFC_DESCRIPTOR_EXTENT(retarray,n);
if (extent[n] != ret_extent)
runtime_error ("Incorrect extent in return value of"
" MAXLOC intrinsic in dimension %ld:"
" is %ld, should be %ld", (long int) n + 1,
(long int) ret_extent, (long int) extent[n]);
}
}
and here to find out how many retarray elements to actually fill in each
dimension:
while(1)
{
*dest = 0;
count[0]++;
dest += dstride[0];
n = 0;
while (count[n] == extent[n])
{
/* When we get to the end of a dimension, reset it and increment
the next dimension. */
count[n] = 0;
/* We could precalculate these products, but this is a less
frequently used path so probably not worth it. */
dest -= dstride[n] * extent[n];
Seems maxloc1s.m4 and minloc1s.m4 are the only users of ifunction-s.m4,
so we can change SCALAR_ARRAY_FUNCTION in there without breaking anything
else.
Jakub Jelinek [Tue, 13 May 2025 12:18:10 +0000 (14:18 +0200)]
libfortran: Fix up _gfortran_s{max,min}loc2_{4,8,16}_s{1,4} [PR120191]
I've tried to write a testcase for the BT_CHARACTER maxloc/minloc with named
or unnamed arguments and indeed the just posted patch fixed the arguments
in there in multiple cases to match what the library expects.
But the testcase still fails, due to library problems.
One dealt with in this patch are _gfortran_s{max,min}loc2_{4,8,16}_s{1,4}
functions. Those are trivial wrappers around
_gfortrani_{max,min}loc2_{4,8,16}_s{1,4} which should call those functions
if the scalar mask is true and just return 0 otherwise.
The two bugs I see there is that the back, len arguments are swapped,
which means that it always acts as back=.true. and for len will use
character length of 1 or 0 instead of the desired one.
The _gfortrani_{max,min}loc2_{4,8,16}_s{1,4} functions have prototypes like
GFC_INTEGER_4
maxloc2_4_s1 (gfc_array_s1 * const restrict array, GFC_LOGICAL_4 back, gfc_charlen_type len)
so back comes before len, ditto for the
GFC_INTEGER_4
smaxloc2_4_s1 (gfc_array_s1 * const restrict array,
GFC_LOGICAL_4 *mask, GFC_LOGICAL_4 back, gfc_charlen_type len)
The other problem is that it was just testing if (mask). In my limited
Fortran understanding that means that the optional argument mask was
supplied but nothing about its actual value. Other scalar mask generated
routines use if (mask == NULL || *mask) as the condition when to call the
non-masked function, i.e. when mask is not supplied (then it should act like
.true. mask) or when it is supplied and evaluates to .true.).
2025-05-13 Jakub Jelinek <jakub@redhat.com>
PR fortran/120191
* m4/maxloc2s.m4: For smaxloc2 call maxloc2 if mask is NULL or *mask.
Swap back and len arguments.
* m4/minloc2s.m4: Likewise.
* generated/maxloc2_4_s1.c: Regenerate.
* generated/maxloc2_4_s4.c: Regenerate.
* generated/maxloc2_8_s1.c: Regenerate.
* generated/maxloc2_8_s4.c: Regenerate.
* generated/maxloc2_16_s1.c: Regenerate.
* generated/maxloc2_16_s4.c: Regenerate.
* generated/minloc2_4_s1.c: Regenerate.
* generated/minloc2_4_s4.c: Regenerate.
* generated/minloc2_8_s1.c: Regenerate.
* generated/minloc2_8_s4.c: Regenerate.
* generated/minloc2_16_s1.c: Regenerate.
* generated/minloc2_16_s4.c: Regenerate.
Jakub Jelinek [Tue, 13 May 2025 12:14:55 +0000 (14:14 +0200)]
fortran: Fix up minloc/maxloc lowering [PR120191]
We need to drop the kind argument from what is passed to the
library, but need to do it not only when one uses the argument name
for it (so kind=4 etc.) but also when one passes all the arguments
to the intrinsics.
The following patch uses what gfc_conv_intrinsic_findloc uses,
which looks more efficient and cleaner, we already set automatic
vars to point to the kind and back actual arguments, so we can just
free/clear expr on the former and set name to "%VAL" on the latter.
And similarly clears dim argument for the BT_CHARACTER case when using
maxloc2/minloc2, again regardless of whether it was named or not.
2025-05-13 Jakub Jelinek <jakub@redhat.com>
Daniil Kochergin <daniil2472s@gmail.com>
Tobias Burnus <tburnus@baylibre.com>
PR fortran/120191
* trans-intrinsic.cc (strip_kind_from_actual): Remove.
(gfc_conv_intrinsic_minmaxloc): Don't call strip_kind_from_actual.
Free and clear kind_arg->expr if non-NULL. Set back_arg->name to
"%VAL" instead of a loop looking for last argument. Remove actual
variable, use array_arg instead. Free and clear dim_arg->expr if
non-NULL for BT_CHARACTER cases instead of using a loop.
Jakub Jelinek [Tue, 22 Apr 2025 19:27:28 +0000 (21:27 +0200)]
rs6000: Ignore OPTION_MASK_SAVE_TOC_INDIRECT differences in inlining decisions [PR119327]
The following testcase FAILs because the always_inline function can't
be inlined.
The rs6000 backend has similarly to other targets a hook which rejects
inlining which would bring in new ISAs which aren't there in the caller.
And this hook rejects this because of OPTION_MASK_SAVE_TOC_INDIRECT
differences.
This flag is set if explicitly requested or by default depending on
whether the current function looks hot (or at least not cold):
if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0
&& flag_shrink_wrap_separate
&& optimize_function_for_speed_p (cfun))
rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;
The target nodes that are being compared here are actually the default
target node (which was created when cfun was NULL) vs. one that was
created for the always_inline function when it wasn't NULL, so one
doesn't have it, the other does.
In any case, this flag feels like a tuning decision rather than hard
ISA requirement and I see no problems why we couldn't inline
even explicit -msave-toc-indirect function into -mno-save-toc-indirect
or vice versa.
We already ignore OPTION_MASK_P{8,10}_FUSION which are also more
like tuning flags.
2025-04-22 Jakub Jelinek <jakub@redhat.com>
PR target/119327
* config/rs6000/rs6000.cc (rs6000_can_inline_p): Ignore also
OPTION_MASK_SAVE_TOC_INDIRECT differences.
Jakub Jelinek [Wed, 16 Apr 2025 15:22:49 +0000 (17:22 +0200)]
libatomic: Fix up libat_{,un}lock_n for mingw [PR119796]
Here is just a port of the previously posted patch to mingw which
clearly has the same problems.
2025-04-16 Jakub Jelinek <jakub@redhat.com>
PR libgcc/101075
PR libgcc/119796
* config/mingw/lock.c (libat_lock_n, libat_unlock_n): Start with
computing how many locks will be needed and take into account
((uintptr_t)ptr % WATCH_SIZE). If some locks from the end of the
locks array and others from the start of it will be needed, first
lock the ones from the start followed by ones from the end.
Jakub Jelinek [Wed, 16 Apr 2025 15:21:39 +0000 (17:21 +0200)]
libatomic: Fix up libat_{,un}lock_n [PR119796]
As mentioned in the PR (and I think in PR101075 too), we can run into
deadlock with libat_lock_n calls with larger n.
As mentioned in PR66842, we use multiple locks (normally 64 mutexes
for each 64 byte cache line in 4KiB page) and currently can lock more
than one lock, in particular for n [0, 64] a single lock, for n [65, 128]
2 locks, for n [129, 192] 3 locks etc.
There are two problems with this:
1) we can deadlock if there is some wrap-around, because the locks are
acquired always in the order from addr_hash (ptr) up to
locks[NLOCKS-1].mutex and then if needed from locks[0].mutex onwards;
so if e.g. 2 threads perform libat_lock_n with n = 2048+64, in one
case at pointer starting at page boundary and in another case at
page boundary + 2048 bytes, the first thread can lock the first
32 mutexes, the second thread can lock the last 32 mutexes and
then first thread wait for the lock 32 held by second thread and
second thread wait for the lock 0 held by the first thread;
fixed below by always locking the locks in order of increasing
index, if there is a wrap-around, by locking in 2 loops, first
locking some locks at the start of the array and second at the
end of it
2) the number of locks seems to be determined solely depending on the
n value, I think that is wrong, we don't know the structure alignment
on the libatomic side, it could very well be 1 byte aligned struct,
and so how many cachelines are actually (partly or fully) occupied
by the atomic access depends not just on the size, but also on
ptr % WATCH_SIZE, e.g. 2 byte structure at address page_boundary+63
should IMHO lock 2 locks because it occupies the first and second
cacheline
Note, before this patch it locked exactly one lock for n = 0, while
with this patch it could lock either no locks at all (if it is at cacheline
boundary) or 1 (otherwise).
Dunno of libatomic APIs can be called for zero sizes and whether
we actually care that much how many mutexes are locked in that case,
because one can't actually read/write anything into zero sized memory.
If you think it is important, I could add else if (nlocks == 0) nlocks = 1;
in both spots.
2025-04-16 Jakub Jelinek <jakub@redhat.com>
PR libgcc/101075
PR libgcc/119796
* config/posix/lock.c (libat_lock_n, libat_unlock_n): Start with
computing how many locks will be needed and take into account
((uintptr_t)ptr % WATCH_SIZE). If some locks from the end of the
locks array and others from the start of it will be needed, first
lock the ones from the start followed by ones from the end.
Jakub Jelinek [Mon, 14 Apr 2025 17:34:22 +0000 (19:34 +0200)]
expmed: Always use QImode for init_expmed set_zero_cost [PR119785]
This is a regression on some targets introduced I believe by r6-2055
which added mode argument to set_src_cost.
The problem here is that in the first iteration, mode is always QImode
and we get as -Os zero cost set_src_cost (const0_rtx, QImode, false).
But then we use the mode variable for iterating over int, partial int
and vector int modes, so for the second iteration we call set_src_cost
with mode which is at that time (machine_mode) (MAX_MODE_VECTOR_INT + 1).
In the x86 case that happens to be V2HFmode and we don't crash (and
compute the same 0 cost as we would for QImode).
But e.g. in the SPARC case (machine_mode) (MAX_MODE_VECTOR_INT + 1) is
MAX_MACHINE_MODE and that does all kinds of weird things especially
when doing ubsan bootstrap.
Fixed by always using QImode.
2025-04-14 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/119785
* expmed.cc (init_expmed): Always pass QImode rather than mode to
set_src_cost passed to set_zero_cost.
Jakub Jelinek [Mon, 14 Apr 2025 08:18:13 +0000 (10:18 +0200)]
driver: On linux hosts disable ASLR during -freport-bug [PR119727]
Andi had a useful comment that even with the PR119727 workaround to
ignore differences in libbacktrace printed addresses, it is still better
to turn off ASLR when easily possible, e.g. in case some address leaks
in somewhere in the ICE message elsewhere, or to verify the ICE doesn't
depend on a particular library/binary load addresses.
The following patch adds a configure check and uses personality syscall
to turn off randomization for further -freport-bug subprocesses.
2025-04-14 Jakub Jelinek <jakub@redhat.com>
PR driver/119727
* configure.ac (HOST_HAS_PERSONALITY_ADDR_NO_RANDOMIZE): New check.
* gcc.cc: Include sys/personality.h if
HOST_HAS_PERSONALITY_ADDR_NO_RANDOMIZE is defined.
(try_generate_repro): Call
personality (personality (0xffffffffU) | ADDR_NO_RANDOMIZE)
if HOST_HAS_PERSONALITY_ADDR_NO_RANDOMIZE is defined.
* config.in: Regenerate.
* configure: Regenerate.
Jakub Jelinek [Sat, 12 Apr 2025 11:15:13 +0000 (13:15 +0200)]
driver: Fix up -freport-bug for ASLR [PR119727]
With --enable-host-pie -freport-bug almost never prepares preprocessed
source and instead emits
The bug is not reproducible, so it is likely a hardware or OS problem.
message even for bogus which are 100% reproducible.
The way -freport-bug works is that it reruns it 3 times, capturing stdout
and stderr from each and then tries to compare the outputs in between
different runs.
The libbacktrace emitted hexadecimal addresses at the start of the lines
can differ between runs due to ASLR, either of the PIE executable, or
even if not PIE if there is some frame with e.g. libc function (say
crash in strlen/memcpy etc.).
The following patch fixes it by ignoring such differences at the start of
the lines.
2025-04-12 Jakub Jelinek <jakub@redhat.com>
PR driver/119727
* gcc.cc (files_equal_p): Rewritten using fopen/fgets/fclose instead
of open/fstat/read/close. At the start of lines, ignore lowercase
hexadecimal addresses followed by space.
Jakub Jelinek [Wed, 9 Apr 2025 20:01:30 +0000 (22:01 +0200)]
libquadmath: Fix up THREEp96 constant in expq
Here is a cherry-pick from glibc [BZ #32411] fix.
As mentioned by the reporter in a pull request against gcc-mirror,
the THREEp96 constant in e_expl.c is incorrect, it is actually 0x3.p+94f128
rather than 0x3.p+96f128.
The algorithm uses that to compute the t2 integer (tval2), by whose
delta it adjusts the x+xl pair and then in the result uses the precomputed
exp value for that entry.
Using 0x3.p+94f128 rather than 0x3.p+96f128 results in tval2 sometimes
being one smaller, sometimes one larger than the desired value, thus can mean
the x+xl pair after adjustment will be larger in absolute value than it
should be.
DesWursters created a test program for this
https://github.com/DesWurstes/comparefloats
and his results were
total: 1135000000 not_equal: 4322 earlier_score: 674 later_score: 3648
I've modified this so with
https://sourceware.org/bugzilla/show_bug.cgi?id=32411#c3
so that it actually tests pseudo-random _Float128 values with range
(-16384.,16384) with strong bias on values larger than 0.0002 in absolute
value (so that tval1/tval2 aren't zero most of the time) and that gave
total: 10000000000 not_equal: 29861 earlier_score: 4606 later_score: 25255
So, in both cases, in most cases the change doesn't result in any differences,
and in those rare cases where does, about 85% have smaller ulp than without
the patch.
Additionally I've tried
https://sourceware.org/bugzilla/show_bug.cgi?id=32411#c4
and in 2 billion iterations it didn't find any case where x+xl after the
adjustments without this change would be smaller in absolute value compared
to x+xl after the adjustments with this change.
Jakub Jelinek [Fri, 4 Apr 2025 18:57:09 +0000 (20:57 +0200)]
lto: lto-opts fixes [PR119625]
I can reproduce a really weird error in our distro i686 trunk gcc
(but haven't managed to reproduce it with vanilla trunk yet).
echo 'void foo (void) {}' > a.c; gcc -O2 -flto=auto -m32 -march=i686 -ffat-lto-objects -fhardened -o a.o -c a.c; gcc -O2 -flto=auto -m32 -march=i686 -r -o a.lo a.o
lto1: fatal error: open failed: No such file or directory
compilation terminated.
lto-wrapper: fatal error: gcc returned 1 exit status
The error is because
cat ./a.lo.lto.o-args.0
""
a.o
My suspicion is that this "" in there is caused by weird .gnu.lto_.opts
section content during
gcc -O2 -flto=auto -m32 -march=i686 -ffat-lto-objects -fhardened -S -o a.s -c a.c
compilation (and I can reproduce that one with vanilla trunk).
The above results in
.section .gnu.lto_.opts,"e",@progbits
.string "'-fno-openmp' '-fno-openacc' '-fPIC' '' '-m32' '-march=i686' '-O2' '-flto=auto' '-ffat-lto-objects'"
There are two weird things, one (IMHO the cause of the "" later on) is
the '' part, I think it comes from lto_write_options doing
append_to_collect_gcc_options (&temporary_obstack, &first_p, "");
IMHO it shouldn't call append_to_collect_gcc_options at all for that case.
The -fhardened option causes global_options.x_flag_cf_protection
to be set to CF_FULL and later on the backend option processing
sets it to CF_FULL | CF_SET (i.e. 7, a value not handled in
lto_write_options).
The following patch fixes it by not emitting anything there if
flag_cf_protection is one of the unhandled values.
Perhaps it could incrementally use
switch (global_options.x_flag_cf_protection & ~CF_SET)
instead, dunno.
And the other problem is that the -fPIC in there is really weird.
Our distro compiler or vanilla configured trunk certainly doesn't
default to -fPIC and -fhardened uses -fPIE when
-fPIC/-fpic/-fno-pie/-fno-pic is not specified, so I was expecting
-fPIE in there.
The thing is that the -fpie option causes setting of both
global_options.x_flag_pi{c,e} to 1, -fPIE both to 2:
/* If -fPIE or -fpie is used, turn on PIC. */
if (opts->x_flag_pie)
opts->x_flag_pic = opts->x_flag_pie;
else if (opts->x_flag_pic == -1)
opts->x_flag_pic = 0;
if (opts->x_flag_pic && !opts->x_flag_pie)
opts->x_flag_shlib = 1;
so checking first for flag_pic == 2 and then flag_pic == 1
and only afterwards for flag_pie means we never print
-fPIE/-fpie.
Or do you want something further (like
switch (global_options.x_flag_cf_protection & ~CF_SET)
)?
2025-04-04 Jakub Jelinek <jakub@redhat.com>
PR lto/119625
* lto-opts.cc (lto_write_options): If neither flag_pic nor
flag_pie are set, check first for flag_pie and only later
for flag_pic rather than the other way around, use a temporary
variable. If flag_cf_protection is not set, don't append anything
if flag_cf_protection is none of CF_{NONE,FULL,BRANCH,RETURN} and
use a temporary variable.
Jakub Jelinek [Wed, 2 Apr 2025 17:28:20 +0000 (19:28 +0200)]
c: Fix ICEs with -fsanitize=pointer-{subtract,compare} [PR119582]
The following testcase ICEs because c_fully_fold isn't performed on the
arguments of __sanitizer_ptr_{sub,cmp} builtins and so e.g.
C_MAYBE_CONST_EXPR can leak into the gimplifier where it ICEs.
2025-04-02 Jakub Jelinek <jakub@redhat.com>
PR c/119582
* c-typeck.cc (pointer_diff, build_binary_op): Call c_fully_fold on
__sanitizer_ptr_sub or __sanitizer_ptr_cmp arguments.
Jakub Jelinek [Tue, 1 Apr 2025 14:40:55 +0000 (16:40 +0200)]
combine: Use reg_used_between_p rather than modified_between_p in two spots [PR119291]
The following testcase is miscompiled on x86_64-linux at -O2 by the combiner.
We have from earlier combinations
(insn 22 21 23 4 (set (reg:SI 104 [ _7 ])
(const_int 0 [0])) "pr119291.c":25:15 96 {*movsi_internal}
(nil))
(insn 23 22 24 4 (set (reg/v:SI 117 [ e ])
(reg/v:SI 116 [ e ])) 96 {*movsi_internal}
(expr_list:REG_DEAD (reg/v:SI 116 [ e ])
(nil)))
(note 24 23 25 4 NOTE_INSN_DELETED)
(insn 25 24 26 4 (parallel [
(set (reg:CCZ 17 flags)
(compare:CCZ (neg:SI (reg:SI 104 [ _7 ]))
(const_int 0 [0])))
(set (reg/v:SI 116 [ e ])
(neg:SI (reg:SI 104 [ _7 ])))
]) "pr119291.c":26:13 977 {*negsi_2}
(expr_list:REG_DEAD (reg:SI 104 [ _7 ])
(nil)))
(note 26 25 27 4 NOTE_INSN_DELETED)
(insn 27 26 28 4 (set (reg:DI 128 [ _9 ])
(ne:DI (reg:CCZ 17 flags)
(const_int 0 [0]))) "pr119291.c":26:13 1447 {*setcc_di_1}
(expr_list:REG_DEAD (reg:CCZ 17 flags)
(nil)))
and try_combine is called on i3 25 and i2 22 (second time)
and reach the hunk being patched with simplified i3
(insn 25 24 26 4 (parallel [
(set (pc)
(pc))
(set (reg/v:SI 116 [ e ])
(const_int 0 [0]))
]) "pr119291.c":28:13 977 {*negsi_2}
(expr_list:REG_DEAD (reg:SI 104 [ _7 ])
(nil)))
and
(insn 22 21 23 4 (set (reg:SI 104 [ _7 ])
(const_int 0 [0])) "pr119291.c":27:15 96 {*movsi_internal}
(nil))
Now, the try_combine code there attempts to split two independent
sets in newpat by moving one of them to i2.
And among other tests it checks
!modified_between_p (SET_DEST (set1), i2, i3)
which is certainly needed, if there would be say
(set (reg/v:SI 116 [ e ]) (const_int 42 [0x2a]))
in between i2 and i3, we couldn't do that, as that set would overwrite
the value set by set1 we want to move to the i2 position.
But in this case pseudo 116 isn't set in between i2 and i3, but used
(and additionally there is a REG_DEAD note for it).
This is equally bad for the move, because while the i3 insn
and later will see the pseudo value that we set, the insn in between
which uses the value will see a different value from the one that
it should see.
As we don't check for that, in the end try_combine succeeds and
changes the IL to:
(insn 22 21 23 4 (set (reg/v:SI 116 [ e ])
(const_int 0 [0])) "pr119291.c":27:15 96 {*movsi_internal}
(nil))
(insn 23 22 24 4 (set (reg/v:SI 117 [ e ])
(reg/v:SI 116 [ e ])) 96 {*movsi_internal}
(expr_list:REG_DEAD (reg/v:SI 116 [ e ])
(nil)))
(note 24 23 25 4 NOTE_INSN_DELETED)
(insn 25 24 26 4 (set (pc)
(pc)) "pr119291.c":28:13 2147483647 {NOOP_MOVE}
(nil))
(note 26 25 27 4 NOTE_INSN_DELETED)
(insn 27 26 28 4 (set (reg:DI 128 [ _9 ])
(const_int 0 [0])) "pr119291.c":28:13 95 {*movdi_internal}
(nil))
(note, the i3 got turned into a nop and try_combine also modified insn 27).
The following patch replaces the modified_between_p
tests with reg_used_between_p, my understanding is that
modified_between_p is a subset of reg_used_between_p, so one
doesn't need both.
Looking at this some more today, I think we should special case
set_noop_p because that can be put into i2 (except for the JUMP_P
violations), currently both modified_between_p (pc_rtx, i2, i3)
and reg_used_between_p (pc_rtx, i2, i3) returns false.
I'll post a patch incrementally for that (but that feels like
new optimization, so probably not something that should be backported).
On Tue, Apr 01, 2025 at 11:27:25AM +0200, Richard Biener wrote:
> Can we constrain SET_DEST (set1/set0) to a REG_P in combine? Why
> does the comment talk about memory?
I was worried about making too risky changes this late in stage4
(and especially also for backports). Most of this code is 1992-ish.
I think many of the functions are just misnamed, the reg_ in there doesn't
match what those functions do (bet they initially supported just REGs
and later on support for other kinds of expressions was added, but haven't
done git archeology to prove that).
What we know for sure is:
&& GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != ZERO_EXTRACT
&& GET_CODE (SET_DEST (XVECEXP (newpat, 0, 0))) != STRICT_LOW_PART
&& GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != ZERO_EXTRACT
&& GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != STRICT_LOW_PART
that is checked earlier in the condition.
Then it calls
&& ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 1)),
XVECEXP (newpat, 0, 0))
&& ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 0)),
XVECEXP (newpat, 0, 1))
While it has reg_* in it, that function mostly calls reg_overlap_mentioned_p
which is also misnamed, that function handles just fine all of
REG, MEM, SUBREG of REG, (SUBREG of MEM not, see below), ZERO_EXTRACT,
STRICT_LOW_PART, PC and even some further cases.
So, IMHO SET_DEST (set0) or SET_DEST (set0) can be certainly a REG, SUBREG
of REG, PC (at least the REG and PC cases are triggered on the testcase)
and quite possibly also MEM (SUBREG of MEM not, see below).
Now, the code uses !modified_between_p (SET_SRC (set{1,0}), i2, i3) where that
function for constants just returns false, for PC returns true, for REG
returns reg_set_between_p, for MEM recurses on the address, for
MEM_READONLY_P otherwise returns false, otherwise checks using alias.cc code
whether the memory could have been modified in between, for all other
rtxes recurses on the subrtxes. This part didn't change in my patch.
I've only changed those
- && !modified_between_p (SET_DEST (set{1,0}), i2, i3)
+ && !reg_used_between_p (SET_DEST (set{1,0}), i2, i3)
where the former has been described above and clearly handles all of
REG, SUBREG of REG, PC, MEM and SUBREG of MEM among other things.
The replacement reg_used_between_p calls reg_overlap_mentioned_p on each
instruction in between i2 and i3. So, there is clearly a difference
in behavior if SET_DEST (set{1,0}) is pc_rtx, in that case modified_between_p
returns unconditionally true even if there are no instructions in between,
but reg_used_between_p if there are no non-debug insns in between returns
false. Sorry for missing that, guess I should check for that (with the
exception of the noop moves which are often (set (pc) (pc)) and handled
by the incremental patch). In fact not just that, reg_used_between_p
will only return true for PC if it is mentioned anywhere in the insns
in between.
Anyway, except for that, for REG it calls refers_to_regno_p
and so should find any occurrences of any of the REG or parts of it for hard
registers, for MEM returns true if it sees any MEMs in insns in between
(conservatively), for SUBREGs apparently it relies on it being SUBREG of REG
(so doesn't handle SUBREG of MEM) and handles SUBREG of REG like the
SUBREG_REG, PC I've already described.
Now, because reg_overlap_mentioned_p doesn't handle SUBREG of MEM, I think
already the initial
&& ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 1)),
XVECEXP (newpat, 0, 0))
&& ! reg_referenced_p (SET_DEST (XVECEXP (newpat, 0, 0)),
XVECEXP (newpat, 0, 1))
calls would have failed --enable-checking=rtl or would have misbehaved, so
I think there is no need to check for it further.
To your question why I don't use reg_referenced_p, that is because
reg_referenced_p is something to call on one insn pattern, while
reg_used_between_p is pretty much that on all insns in between two
instructions (excluding the boundaries).
So, I think it would be safer to add && SET_DEST (set{1,0} != pc_rtx
checks to preserve former behavior, like in the following version.
2025-04-01 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/119291
* combine.cc (try_combine): For splitting of PARALLEL with
2 independent SETs into i2 and i3 sets check reg_used_between_p
of the SET_DESTs rather than just modified_between_p.
Jakub Jelinek [Sat, 22 Mar 2025 07:39:38 +0000 (08:39 +0100)]
Fix up some further cases of missing or extraneous spaces in diagnostics
Given the recent PR119406 I've tried to grep for concatenated string
literals without space at the end of one line and at the start of next line,
unless it was obviously intentional.
Furthermore, I've then looked through gcc.pot looking for 2 adjacent spaces
and looking back if that wasn't the case of "something "
" with spaces at both sides".
Here is the result from that.
I think just the c.opt change needs an explanation, the "" in the
description is simply eaten up somewhere during the option processing and
gcc -v --help before this patch was displaying
-Wdeprecated-literal-operator Warn about deprecated space between and suffix in a user-defined literal operator.
2025-03-22 Jakub Jelinek <jakub@redhat.com>
gcc/
* gimplify.cc (warn_switch_unreachable_and_auto_init_r): Add missing
space in the middle of diagnostics.
Jakub Jelinek [Tue, 11 Mar 2025 10:01:55 +0000 (11:01 +0100)]
tree: Improve skip_simple_arithmetic [PR119183]
The following testcase takes very long time to compile, because
skip_simple_arithmetic decides to first call tree_invariant_p on
the second argument (and indirectly recurse there). I think before
canonicalization of operands for commutative binary expressions
(and for non-commutative ones always) it is pretty common that the
first operand is a constant, something which tree_invariant_p handles
immediately, so the following patch special cases that; I've added
there a tree_invariant_p call too after the checks, while it is not
really needed currently, tree_invariant_p has the same checks, I wanted
to be prepared in case tree_invariant_p changes. But if you think
I should avoid it, I can drop it too.
This is just a partial fix, I think one can certainly construct a testcase
which will still have horrible compile time complexity (but I've tried and
haven't managed to do so), so perhaps we should just limit the recursion
depth through skip_simple_arithmetic/tree_invariant_p with some defaulted
argument.
2025-03-11 Jakub Jelinek <jakub@redhat.com>
PR c/119183
* tree.cc (skip_simple_arithmetic): If first operand of binary
expr is TREE_CONSTANT or TREE_READONLY with no side-effects, call
tree_invariant_p on that operand first instead of on the second.
Jakub Jelinek [Thu, 6 Mar 2025 17:26:37 +0000 (18:26 +0100)]
c++: Update TYPE_FIELDS of variant types if cp_parser_late_parsing_default_args etc. modify it [PR98533]
The following testcases ICE during type verification, because TYPE_FIELDS
of e.g. S RECORD_TYPE in pr119123.C is different from TYPE_FIELDS of const S.
Various decls are added to S's TYPE_FIELDS first, then finish_struct
indirectly calls fixup_type_variants to sync the variant copies.
But later on cp_parser_class_specifier calls
cp_parser_late_parsing_default_args and that apparently adds a lambda
type (from default argument) to TYPE_FIELDS of S.
Dunno if that is right or not, assuming it is right, the following
patch fixes it by updating TYPE_FIELDS of variant types if there were
any changes in the various functions cp_parser_class_specifier defers and
calls on the outermost enclosing class.
There was quite a lot of code repetition already before, so the patch
uses a lambda to avoid the repetitions.
To my surprise, in some of the contract testcases (
g++.dg/contracts/contracts-friend1.C
g++.dg/contracts/contracts-nested-class1.C
g++.dg/contracts/contracts-nested-class2.C
g++.dg/contracts/contracts-redecl7.C
g++.dg/contracts/contracts-redecl8.C
) it is actually setting class_type and pushing TRANSLATION_UNIT_DECL
rather than some class types in some cases.
Or should the lambda pushing into the containing class be somehow avoided?
2025-03-06 Jakub Jelinek <jakub@redhat.com>
PR c++/98533
PR c++/119123
* parser.cc (cp_parser_class_specifier): Update TYPE_FIELDS of
variant types in case cp_parser_late_parsing_default_args etc. change
TYPE_FIELDS on the main variant. Add switch_to_class lambda and
use it to simplify repeated class switching code.
* g++.dg/cpp0x/pr98533.C: New test.
* g++.dg/cpp0x/pr119123.C: New test.
Jakub Jelinek [Tue, 25 Feb 2025 08:33:21 +0000 (09:33 +0100)]
openmp: Mark OpenMP atomic write expression as read [PR119000]
The following testcase was emitting false positive warning that
the rhs of #pragma omp atomic write was stored but not read,
when the atomic actually does read it. The following patch
fixes that by calling default_function_array_read_conversion
on it, so that it is marked as read as well as converted from
lvalue to rvalue.
Furthermore, the code had
if (code == NOP_EXPR) ... else ... if (code == NOP_EXPR) ...
with none of ... parts changing code, so I've merged the two ifs.
2025-02-25 Jakub Jelinek <jakub@redhat.com>
PR c/119000
* c-parser.cc (c_parser_omp_atomic): For omp write call
default_function_array_read_conversion on the rhs expression.
Merge the two adjacent if (code == NOP_EXPR) blocks.
Jakub Jelinek [Mon, 24 Feb 2025 11:19:16 +0000 (12:19 +0100)]
reassoc: Fix up optimize_range_tests_to_bit_test [PR118915]
The following testcase is miscompiled due to a bug in
optimize_range_tests_to_bit_test. It is trying to optimize
check for a in [-34,-34] or [-26,-26] or [-6,-6] or [-4,inf] ranges.
Another reassoc optimization folds the the test for the first
two ranges into (a + 34U) & ~8U in [0U,0U] range, and extract_bit_test_mask
actually has code to virtually undo it and treat that again as test
for a being -34 or -26. The problem is that optimize_range_tests_to_bit_test
remembers in the type variable TREE_TYPE (ranges[i].exp); from the first
range. If extract_bit_test_mask doesn't do that virtual undoing of the
BIT_AND_EXPR handling, that is just fine, the returned exp is ranges[i].exp.
But if the first range is BIT_AND_EXPR, the type could be different, the
BIT_AND_EXPR form has the optional cast to corresponding unsigned type
in order to avoid introducing UB. Now, type was used to fill in the
max value if ranges[j].high was missing in subsequently tested range,
and so in this particular testcase the [-4,inf] range which was
signed int and so [-4,INT_MAX] was treated as [-4,UINT_MAX] instead.
And we were subtracting values of 2 different types and trying to make
sense out of that.
The following patch fixes this by using the type of the low bound
(which is always non-NULL) for the max value of the high bound instead.
2025-02-24 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/118915
* tree-ssa-reassoc.cc (optimize_range_tests_to_bit_test): For
highj == NULL_TREE use TYPE_MAX_VALUE (TREE_TYPE (lowj)) rather
than TYPE_MAX_VALUE (type).
Jakub Jelinek [Sat, 8 Feb 2025 07:54:31 +0000 (08:54 +0100)]
i386: Fix ICE with conditional QI/HI vector maxmin [PR118776]
The following testcase ICEs starting with GCC 12 since r12-4526
although the bug has been introduced already in r12-2751.
The problem was in the addition of cond_<code><mode> define_expand
which uses nonimmediate_operand predicates for both maxmin operands
for all VI1248_AVX512VLBW modes. It works fine with
VI48_AVX512VL modes because the <code><mode>3_mask VI48_AVX512VL
define_expand uses ix86_fixup_binary_operands_no_copy and the
*avx512f_<code><mode>3<mask_name> VI48_AVX512VL define_insn uses
% in constraint and !(MEM_P && MEM_P) check in condition (and
<code><mode>3 define_expand with VI124_256_AVX512F_AVX512BW iterator
does that too), but eventhough the 8-bit and 16-bit element maxmin
is commutative too, the <mask_codefor><code><mode>3<mask_name>
define_insn with VI12_AVX512VL iterator didn't use % in constraint
to make it commutative. So, e.g. cond_umaxv32qi define_expand
allowed nonimmediate_operand for both umax operands, but used
gen_umaxv32qi_mask which wasn't commutative and only allowed
nonimmediate_operand for the second operand.
The following patch fixes it by keeping the <code><mode>3
VI124_256_AVX512F_AVX512BW define_expand as is (it does
ix86_fixup_binary_operands_no_copy) but extending the
<code><mode>3_mask define_expand from VI48_AVX512VL to
VI1248_AVX512VLBW which keeps the current modes with their
ISA conditions and adds the VI12_AVX512VL modes under additional
TARGET_AVX512BW condition, and turning the actual define_insn
into an * prefixed name (which it was before just for the non-masked
case) and having the same commutative operand handling as in other
define_insns.
2025-02-08 Jakub Jelinek <jakub@redhat.com>
PR target/118776
* config/i386/sse.md (<code><mode>3_mask): Use VI1248_AVX512VLBW
iterator rather than VI48_AVX512VL.
(<mask_codefor><code><mode>3<mask_name>): Rename to ...
(*avx512bw_<code><mode>3<mask_name>): ... this. Use
nonimmediate_operand rather than register_operand predicate and %v
rather than v constraint for operand 1 and adjust condition to reject
MEMs in both operand 1 and 2.
Jakub Jelinek [Fri, 7 Feb 2025 13:30:11 +0000 (14:30 +0100)]
c++: Don't use CLEANUP_EH_ONLY for new expression cleanup [PR118763]
The following testcase is miscompiled since r12-6325 stopped
preevaluating the initializers for new expression.
If evaluating the initializers throws, there is a correct cleanup
for that, but it is marked CLEANUP_EH_ONLY. While in standard
C++ that is just fine, if it has statement expressions, it can
return or goto out of the expression and we should delete the
pointer in that case too.
There is already a sentry variable initialized to true and
set to false after everything is initialized and used as a guard
for the cleanup, so just removing the CLEANUP_EH_ONLY flag does
everything we need. And in the normal case of the initializer
not using statement expressions at least with -O2 we get the same code,
while the change changes one
try { sentry = true; ... sentry = false; } catch { if (sentry) delete ...; }
into
try { sentry = true; ... sentry = false; } finally { if (sentry) delete ...; }
optimizations will see that sentry is false when reaching the finally
other than through an exception.
Though, wonder what other CLEANUP_EH_ONLY cleanups might be an issue
with statement expressions.
2025-02-07 Jakub Jelinek <jakub@redhat.com>
PR c++/118763
* init.cc (build_new_1): Don't set CLEANUP_EH_ONLY.
Jakub Jelinek [Fri, 7 Feb 2025 13:27:18 +0000 (14:27 +0100)]
c++: Allow constexpr reads from volatile std::nullptr_t objects [PR118661]
As mentioned in the PR, https://eel.is/c++draft/conv.lval#note-1
says that even volatile reads from std::nullptr_t typed objects actually
don't read anything and https://eel.is/c++draft/expr.const#10.9
says that even those are ok in constant expressions.
So, the following patch adjusts the r9-4793 changes to have an exception
for NULLPTR_TYPE.
As [conv.lval]/3 also talks about accessing to inactive member, I've added
testcase to cover that as well.
2025-02-07 Jakub Jelinek <jakub@redhat.com>
PR c++/118661
* constexpr.cc (potential_constant_expression_1): Don't diagnose
lvalue-to-rvalue conversion of volatile lvalue if it has NULLPTR_TYPE.
* decl2.cc (decl_maybe_constant_var_p): Return true for constexpr
decls with NULLPTR_TYPE even if they are volatile.
* g++.dg/cpp0x/constexpr-volatile4.C: New test.
* g++.dg/cpp0x/constexpr-union9.C: New test.
Jakub Jelinek [Fri, 31 Jan 2025 23:50:24 +0000 (00:50 +0100)]
icf: Compare call argument types in certain cases and asm operands [PR117432]
compare_operand uses operand_equal_p under the hood, which e.g. for
INTEGER_CSTs will just match the values rather regardless of their types.
Now, in many comparing the type is redundant, if we have
x_2 = y_3 + 1;
we've already compared the type for the lhs and also for rhs1, there won't
be any surprises on rhs2.
As noted in the PR, there are cases where the type of the operand is the
sole place of information and we don't want to ICF merge functions if the
types differ.
One case is stdarg functions, arguments passed to ..., it is different
if we pass 1, 1L, 1LL.
Another case are the K&R unprototyped functions (sure, gone in C23).
And yet another case are inline asm operands, "r" (1) is different from "r"
(1L) from "r" (1LL).
So, the following patch determines based on lack of fntype (e.g. for
internal functions), or on !prototype_p, or on stdarg_p (in that case
using number of named arguments) which arguments need to have type checked
and does that, plus compares types on inline asm operands (maybe it would be
enough to do that just for input operands but we have just a routine to
handle both and I didn't feel we need to differentiate).
Furthermore, I've noticed fntype{1,2} isn't actually compared if it is a
direct call (gimple_call_fndecl is non-NULL). That is wrong too, we could
have
void (*fn) (int, long long) = (void (*) (int, long long)) foo;
fn (1, 1LL);
in one case and
void (*fn) (long long, int) = (void (*) (long long, int)) foo;
fn (1LL, 1);
in another, both folded into a direct call of foo with different
gimple_call_fntype. Sure, one of them would be UB at runtime (or both), but
what if we ICF merge it into something that into the one UB at runtime
and the program actually calls the correct one only?
2025-02-01 Jakub Jelinek <jakub@redhat.com>
PR ipa/117432
* ipa-icf-gimple.cc (func_checker::compare_asm_inputs_outputs):
Also return_false if operands have incompatible types.
(func_checker::compare_gimple_call): Check fntype1 vs. fntype2
compatibility for all non-internal calls and assume fntype1 and
fntype2 are non-NULL for those. For calls to non-prototyped
calls or for stdarg_p functions after the last named argument (if any)
check type compatibility of call arguments.
* gcc.c-torture/execute/pr117432.c: New test.
* gcc.target/i386/pr117432.c: New test.
Arsen Arsenović [Wed, 29 Jan 2025 20:14:33 +0000 (21:14 +0100)]
d: give dependency files better filenames [PR118477]
Currently, the dependency files for root-file.o and common-file.o were
both d/.deps/file.Po, which would cause parallel builds to fail
sometimes with:
make[3]: Leaving directory '/var/tmp/portage/sys-devel/gcc-14.1.1_p20240511/work/build/gcc'
make[3]: Entering directory '/var/tmp/portage/sys-devel/gcc-14.1.1_p20240511/work/build/gcc'
mv: cannot stat 'd/.deps/file.TPo': No such file or directory
make[3]: *** [/var/tmp/portage/sys-devel/gcc-14.1.1_p20240511/work/gcc-14-20240511/gcc/d/Make-lang.in:421: d/root-file.o] Error 1 shuffle=131581365
Also, this means that dependencies of one of root-file or common-file
are missing when developing. After this patch, those two files get
assigned dependency files d/.deps/root-file.Po and
d/.deps/common-file.Po respectively, so match the actual object
files in the d/ subdirectory.
There are other files with similar conflicts (mangle-package.o,
visitor-package.o for instance).
2025-01-29 Arsen Arsenović <arsen@aarsen.me>
Jakub Jelinek <jakub@redhat.com>
PR d/118477
* Make-lang.in (DCOMPILE, DPOSTCOMPILE): Use $(basename $(@F))
instead of $(*F).
Jakub Jelinek [Sat, 25 Jan 2025 09:15:24 +0000 (10:15 +0100)]
c++: Only destruct elts of array for new expression if exception is thrown during the initialization [PR117827]
The following testcase r12-6328, because the elements of the array
are destructed twice, once when the callee encounters delete[] p;
and then second time when the exception is thrown.
The array elts should be only destructed if exception is thrown from
one of the constructors during the build_vec_init emitted code in case of
new expressions, but when the new expression completes, it is IMO
responsibility of user code to delete[] it when it is no longer needed.
So, the following patch uses the cleanup_flags argument to build_vec_init
to get notified of the flags that need to be changed when the expression
is complete and build_disable_temp_cleanup to do the changes.
2025-01-25 Jakub Jelinek <jakub@redhat.com>
PR c++/117827
* init.cc (build_new_1): Pass address of a make_tree_vector ()
initialized gc tree vector to build_vec_init and append
build_disable_temp_cleanup to init_expr from it.
Jakub Jelinek [Thu, 23 Jan 2025 10:11:23 +0000 (11:11 +0100)]
builtins: Store unspecified value to *exp for inf/nan [PR114877]
The fold_builtin_frexp folding for NaN/Inf just returned the first argument
with evaluating second arguments side-effects, rather than storing something
to what the second argument points to.
The PR argues that the C standard requires the function to store something
there but what exactly is stored is unspecified, so not storing there
anything can result in UB if the value isn't initialized and is read later.
glibc and newlib store there 0, musl apparently doesn't store anything.
The following patch stores there zero (or would you prefer storing there
some other value, 42, INT_MAX, INT_MIN, etc.?; zero is cheapest to form
in assembly though) and adjusts the test so that it
doesn't rely on not storing there anything but instead checks for
-Wmaybe-uninitialized warning to find out that something has been stored
there.
Unfortunately I had to disable the NaN tests for -O0, while we can fold
__builtin_isnan (__builtin_nan ("")) at compile time, we can't fold
__builtin_isnan ((i = 0, __builtin_nan (""))) at compile time.
fold_builtin_classify uses just tree_expr_nan_p and if that isn't true
(because expr is a COMPOUND_EXPR with tree_expr_nan_p on the second arg),
it does
arg = builtin_save_expr (arg);
return fold_build2_loc (loc, UNORDERED_EXPR, type, arg, arg);
and that isn't folded at -O0 further, as we wrap it into SAVE_EXPR and
nothing propagates the NAN to the comparison.
I think perhaps tree_expr_nan_p etc. could have case COMPOUND_EXPR:
added and recurse on the second argument, but that feels like stage1
material to me if we want to do that at all.
2025-01-23 Jakub Jelinek <jakub@redhat.com>
PR middle-end/114877
* builtins.cc (fold_builtin_frexp): Handle rvc_nan and rvc_inf cases
like rvc_zero, return passed in arg and set *exp = 0.
* gcc.dg/torture/builtin-frexp-1.c: Add -Wmaybe-uninitialized as
dg-additional-options.
(bar): New function.
(TESTIT_FREXP2): Rework the macro so that it doesn't test whether
nothing has been stored to what the second argument points to, but
instead that something has been stored there, whatever it is.
(main): Temporarily don't enable the nan tests for -O0.
Jakub Jelinek [Tue, 21 Jan 2025 23:18:24 +0000 (00:18 +0100)]
c++: Wrap force_target_expr in get_member_function_from_ptrfunc with save_expr [PR118509]
My October PR117259 fix to get_member_function_from_ptrfunc to use a
TARGET_EXPR rather than SAVE_EXPR unfortunately caused some regressions as
well as the following testcase shows.
What happens is that
get_member_function_from_ptrfunc -> build_base_path calls save_expr,
so since the PR117259 change in mnay cases it will call save_expr on
a TARGET_EXPR. And, for some strange reason a TARGET_EXPR is not considered
an invariant, so we get a SAVE_EXPR wrapped around the TARGET_EXPR.
That SAVE_EXPR <TARGET_EXPR <...>> gets initially added only to the second
operand of ?:, so at that point it would still work fine during expansion.
But unfortunately an expression with that subexpression is handed to the
caller also through *instance_ptrptr = instance_ptr; and gets evaluated
once again when computing the first argument to the method.
So, essentially, we end up with
(TARGET_EXPR <D.2907, ...>, (... ? ... SAVE_EXPR <TARGET_EXPR <D.2907, ...>
... : ...)) (... SAVE_EXPR <TARGET_EXPR <D.2907, ...> ..., ...);
and while D.2907 is initialized during gimplification in the code dominating
everything that uses it, the extra temporary created for the SAVE_EXPR
is initialized only conditionally (if the ?: condition is true) but then
used unconditionally, so we get
pmf-4.C: In function ‘void foo(C, B*)’:
pmf-4.C:12:11: warning: ‘<anonymous>’ may be used uninitialized [-Wmaybe-uninitialized]
12 | (y->*x) ();
| ~~~~~~~~^~
pmf-4.C:12:11: note: ‘<anonymous>’ was declared here
12 | (y->*x) ();
| ~~~~~~~~^~
diagnostic and wrong-code issue too.
As the trunk fix to just treat TARGET_EXPR as invariant seems a little bit risky
and I'd like to get it tested on the trunk for a while, for 14.2.1 this patch
instead wraps those TARGET_EXPRs into SAVE_EXPRs. Eventually that can be reverted
and the trunk fix backported.
2025-01-21 Jakub Jelinek <jakub@redhat.com>
PR c++/118509
* typeck.cc (get_member_function_from_ptrfunc): Wrap force_target_expr
with save_expr.
Jakub Jelinek [Wed, 8 Jan 2025 22:12:02 +0000 (23:12 +0100)]
c++: Honor complain in cp_build_function_call_vec for check_function_arguments warnings [PR117825]
The following testcase ICEs due to re-entering diagnostics.
When diagnosing -Wformat-security warning, we try to print instantiation
context, which calls tsubst with tf_none, but that in the end calls
cp_build_function_call_vec which calls check_function_arguments which
diagnoses another warning (again -Wformat-security).
The other check_function_arguments caller, build_over_call, doesn't call
that function if !(complain & tf_warning), so I think the best fix is
to do it the same in cp_build_function_call_vec as well.
2025-01-08 Jakub Jelinek <jakub@redhat.com>
PR c++/117825
* typeck.cc (cp_build_function_call_vec): Don't call
check_function_arguments if complain doesn't have tf_warning bit set.
Jakub Jelinek [Tue, 17 Dec 2024 09:13:24 +0000 (10:13 +0100)]
c++: Diagnose earlier non-static data members with cv containing class type [PR116108]
In r10-6457 aka PR92593 fix a check has been added to reject
earlier non-static data members with current_class_type in templates,
as the deduction then can result in endless recursion in reshape_init.
It fixed the
template <class T>
struct S { S s = 1; };
S t{2};
crashes, but as the following testcase shows, didn't catch when there
are cv qualifiers on the non-static data member.
Fixed by using TYPE_MAIN_VARIANT.
2024-12-17 Jakub Jelinek <jakub@redhat.com>
PR c++/116108
gcc/cp/
* decl.cc (grokdeclarator): Pass TYYPE_MAIN_VARIANT (type)
rather than type to same_type_p when checking if the non-static
data member doesn't have current class type.
gcc/testsuite/
* g++.dg/cpp1z/class-deduction117.C: New test.
Jakub Jelinek [Sat, 14 Dec 2024 10:27:20 +0000 (11:27 +0100)]
warn-access: Fix up matching_alloc_calls_p [PR118024]
The following testcase ICEs because of a bug in matching_alloc_calls_p.
The loop was apparently meant to be walking the two attribute chains
in lock-step, but doesn't really do that. If the first lookup_attribute
returns non-NULL, the second one is not done, so rmats in that case can
be some random unrelated attribute rather than "malloc" attribute; the
body assumes even rmats if non-NULL is "malloc" attribute and relies
on its argument to be a "malloc" argument and if it is some other
attribute with incompatible attribute, it just crashes.
Now, fixing that in the obvious way, instead of doing
(amats = lookup_attribute ("malloc", amats))
|| (rmats = lookup_attribute ("malloc", rmats))
in the condition do
((amats = lookup_attribute ("malloc", amats)),
(rmats = lookup_attribute ("malloc", rmats)),
(amats || rmats))
fixes the testcase but regresses Wmismatched-dealloc-{2,3}.c tests.
The problem is that walking the attribute lists in a lock-step is obviously
a very bad idea, there is no requirement that the same deallocators are
present in the same order on both decls, e.g. there could be an extra malloc
attribute without argument in just one of the lists, or the order of say
free/realloc could be swapped, etc. We don't generally document nor enforce
any particular ordering of attributes (even when for some attributes we just
handle the first one rather than all).
So, this patch instead simply splits it into two loops, the first one walks
alloc_decl attributes, the second one walks dealloc_decl attributes.
If the malloc attribute argument is a built-in, that doesn't change
anything, and otherwise we have the chance to populate the whole
common_deallocs hash_set in the first loop and then can check it in the
second one (and don't need to use more expensive add method on it, can just
check contains there). Not to mention that it also fixes the case when
the function would incorrectly return true if there wasn't a common
deallocator between the two, but dealloc_decl had 2 malloc attributes with
the same deallocator.
2024-12-14 Jakub Jelinek <jakub@redhat.com>
PR middle-end/118024
* gimple-ssa-warn-access.cc (matching_alloc_calls_p): Walk malloc
attributes of alloc_decl and dealloc_decl in separate loops rather
than in lock-step. Use common_deallocs.contains rather than
common_deallocs.add in the second loop.
Jakub Jelinek [Thu, 28 Nov 2024 13:31:44 +0000 (14:31 +0100)]
docs: Fix up __sync_* documentation [PR117642]
The PR14311 commit which added support for __sync_* builtins documented that
there is a warning if a particular operation cannot be implemented.
But that commit nor anything later on implemented such warning, it was
always silent generation of the mentioned calls (which can in most cases
result in linker errors of course because those functions aren't implemented
anywhere, in libatomic or elsewhere in code shipped in gcc).
So, the following patch just adjust the documentation to match the
implementation.
2024-11-28 Jakub Jelinek <jakub@redhat.com>
PR target/117642
* doc/extend.texi: Remove documentation of warning for unimplemented
__sync_* operations, such warning has never been implemented.
Jakub Jelinek [Wed, 27 Nov 2024 16:29:28 +0000 (17:29 +0100)]
c: Fix sizeof error recovery [PR117745]
Compilation of the following testcase hangs forever after emitting first
error. The problem is that in one place we just return error_mark_node
directly rather than going through c_expr_sizeof_expr or c_expr_sizeof_type.
The parsing of the expression could have called record_maybe_used_decl
though, but nothing calls pop_maybe_used which needs to be called after
parsing of every sizeof/typeof, successful or not.
At the end of the toplevel declaration we free the parser_obstack and in
another function record_maybe_used_decl is called again and due to the
missing pop_maybe_unused we end up with a cycle in the chain.
The following patch fixes it by just setting error and goto to the
sizeof_expr:
c_inhibit_evaluation_warnings--;
in_sizeof--;
mark_exp_read (expr.value);
if (TREE_CODE (expr.value) == COMPONENT_REF
&& DECL_C_BIT_FIELD (TREE_OPERAND (expr.value, 1)))
error_at (expr_loc, "%<sizeof%> applied to a bit-field");
result = c_expr_sizeof_expr (expr_loc, expr);
where c_expr_sizeof_expr will do:
struct c_expr ret;
if (expr.value == error_mark_node)
{
ret.value = error_mark_node;
ret.original_code = ERROR_MARK;
ret.original_type = NULL;
ret.m_decimal = 0;
pop_maybe_used (false);
}
...
return ret;
which is exactly what the old code did manually except for the missing
pop_maybe_used call. mark_exp_read does nothing on error_mark_node and
error_mark_node doesn't have COMPONENT_REF tree_code.
2024-11-27 Jakub Jelinek <jakub@redhat.com>
PR c/117745
* c-parser.cc (c_parser_sizeof_expression): If type_name is NULL,
just expr.set_error () and goto sizeof_expr instead of doing error
recovery manually.
Jakub Jelinek [Tue, 26 Nov 2024 08:46:51 +0000 (09:46 +0100)]
builtins: Fix up DFP ICEs on __builtin_fpclassify [PR102674]
This patch is similar to the one I've just posted, __builtin_fpclassify also
needs to print decimal float minimum differently and use real_from_string3.
Plus I've done some formatting fixes.
2024-11-26 Jakub Jelinek <jakub@redhat.com>
PR middle-end/102674
* builtins.cc (fold_builtin_fpclassify): Use real_from_string3 rather
than real_from_string. Use "1E%d" format string rather than "0x1p%d"
for decimal float minimum. Formatting fixes.
Jakub Jelinek [Tue, 26 Nov 2024 08:45:21 +0000 (09:45 +0100)]
builtins: Fix up DFP ICEs on __builtin_is{inf,finite,normal} [PR43374]
__builtin_is{inf,finite,normal} builtins ICE on _Decimal{32,64,128,64x}
operands unless those operands are constant.
The problem is that we fold the builtins to comparisons with the largest
finite number, but
a) get_max_float was only handling binary floats
b) real_from_string again assumes binary float
and so we were ICEing in the build_real called after the two calls.
This patch adds decimal handling into get_max_float (well, moves it
from c-cppbuiltin.cc which was printing those for __DEC{32,64,128}_MAX__
macros) and uses real_from_string3 (perhaps it is time to rename it
to just real_from_string now that we can use function overloading)
so that it handles both binary and decimal floats.
2024-11-26 Jakub Jelinek <jakub@redhat.com>
PR middle-end/43374
gcc/
* real.cc (get_max_float): Handle decimal float.
* builtins.cc (fold_builtin_interclass_mathfn): Use
real_from_string3 rather than real_from_string. Use
"1E%d" format string rather than "0x1p%d" for decimal
float minimum.
gcc/c-family/
* c-cppbuiltin.cc (builtin_define_decimal_float_constants): Use
get_max_float.
gcc/testsuite/
* gcc.dg/dfp/pr43374.c: New test.
Jakub Jelinek [Fri, 8 Nov 2024 12:36:05 +0000 (13:36 +0100)]
c++: Fix ICE on constexpr virtual function [PR117317]
Since C++20 virtual methods can be constexpr, and if they are
constexpr evaluated, we choose tentative_decl_linkage for those
defer their output and decide at_eof again.
On the following testcases we ICE though, because if
expand_or_defer_fn_1 decides to use tentative_decl_linkage, it
returns true and the caller in that case cals emit_associated_thunks,
where use_thunk which it calls asserts DECL_INTERFACE_KNOWN on the
thunk destination, which isn't the case for tentative_decl_linkage.
The following patch fixes the ICE by not emitting the thunks
for the DECL_DEFER_OUTPUT fns just yet but waiting until at_eof
time when we return to those.
Note, the second testcase ICEs already since r0-110035 with -std=c++0x
before it gets a chance to diagnose constexpr virtual method.
2024-11-08 Jakub Jelinek <jakub@redhat.com>
PR c++/117317
* semantics.cc (emit_associated_thunks): Do nothing for
!DECL_INTERFACE_KNOWN && DECL_DEFER_OUTPUT fns.
* g++.dg/cpp2a/pr117317-1.C: New test.
* g++.dg/cpp2a/pr117317-2.C: New test.
Jakub Jelinek [Wed, 6 Nov 2024 09:21:09 +0000 (10:21 +0100)]
store-merging: Don't use sub_byte_op_p mode for empty_ctor_p unless necessary [PR117439]
encode_tree_to_bitpos uses the more expensive sub_byte_op_p mode in which
it has to allocate a buffer and do various extra work like shifting the bits
etc. if bitlen or bitpos aren't multiples of BITS_PER_UNIT, or if bitlen
doesn't have corresponding integer mode.
The last case is explained later in the comments:
/* The native_encode_expr machinery uses TYPE_MODE to determine how many
bytes to write. This means it can write more than
ROUND_UP (bitlen, BITS_PER_UNIT) / BITS_PER_UNIT bytes (for example
write 8 bytes for a bitlen of 40). Skip the bytes that are not within
bitlen and zero out the bits that are not relevant as well (that may
contain a sign bit due to sign-extension). */
Now, we've later added empty_ctor_p support, either {} CONSTRUCTOR
or {CLOBBER}, which doesn't use native_encode_expr at all, just memset,
so that case doesn't need those fancy games unless bitlen or bitpos
aren't multiples of BITS_PER_UNIT (unlikely, but let's pretend it is
possible).
The following patch makes us use the fast path even for empty_ctor_p
which occupy full bytes, we can just memset that in the provided buffer and
don't need to XALLOCAVEC another buffer.
This patch in itself fixes the testcase from the PR (which was about using
huge XALLLOCAVEC), but I want to do some other changes, to be posted in a
next patch.
2024-11-06 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/117439
* gimple-ssa-store-merging.cc (encode_tree_to_bitpos): For
empty_ctor_p use !sub_byte_op_p even if bitlen doesn't have an
integral mode.
Jakub Jelinek [Wed, 30 Oct 2024 08:59:22 +0000 (09:59 +0100)]
function: Call do_pending_stack_adjust in assign_parms [PR117296]
Functions called by assign_parms call emit_block_move in two places,
so on some targets can be expanded as calls and can result in pending
stack adjustment.
Now, during expansion we normally call do_pending_stack_adjust at the end
of expansion of each basic block or before emitting code that will branch
and/or has labels, and when emitting labels we assert that there are no
pending stack adjustments.
assign_parms is expanded before the first basic block and if the first
basic block starts with a label and at least one of those emit_block_move
calls resulted in the need of pending stack adjustments, we ICE when
emitting that label.
The following patch fixes that by calling do_pending_stack_adjust after
after the assign_parms potential emit_block_move calls.
Jakub Jelinek [Fri, 25 Oct 2024 12:09:42 +0000 (14:09 +0200)]
Assorted --disable-checking fixes [PR117249]
We have currently 3 different definitions of gcc_assert macro, one used most
of the time (unless --disable-checking) which evaluates the condition at
runtime and also checks it at runtime, then one for --disable-checking GCC 4.5+
which looks like
((void)(UNLIKELY (!(EXPR)) ? __builtin_unreachable (), 0 : 0))
and a fallback one
((void)(0 && (EXPR)))
Now, the last one actually doesn't evaluate any of the side-effects in the
argument, just quiets up unused var/parameter warnings.
I've tried to replace the middle definition with
({ [[assume (EXPR)]]; (void) 0; })
for compilers which support assume attribute and statement expressions
(surprisingly quite a few spots use gcc_assert inside of comma expressions),
but ran into PR117287, so for now such a change isn't being proposed.
The following patch attempts to move important side-effects from gcc_assert
arguments.
Bootstrapped/regtested on x86_64-linux and i686-linux with normal
--enable-checking=yes,rtl,extra, plus additionally I've attempted to do
x86_64-linux bootstrap with --disable-checking and gcc_assert changed to the
((void)(0 && (EXPR)))
version when --disable-checking. That version ran into spurious middle-end
warnings
../../gcc/../include/libiberty.h:733:36: error: argument to ‘alloca’ is too large [-Werror=alloca-larger-than=]
../../gcc/tree-ssa-reassoc.cc:5659:20: note: in expansion of macro ‘XALLOCAVEC’
int op_num = ops.length ();
int op_normal_num = op_num;
gcc_assert (op_num > 0);
int stmt_num = op_num - 1;
gimple **stmts = XALLOCAVEC (gimple *, stmt_num);
where we have gcc_assert exactly to work-around middle-end warnings.
Guess I'd need to also disable -Werror for this experiment, which actually
isn't a problem with unmodified system.h, because even for
--disable-checking we use the __builtin_unreachable at least in
stage2/stage3 and so the warnings aren't emitted, and even if it used
[[assume ()]]; it would work too because in stage2/stage3 we could again
rely on assume and statement expression support.
2024-10-25 Jakub Jelinek <jakub@redhat.com>
PR middle-end/117249
* tree-ssa-structalias.cc (insert_vi_for_tree): Move put calls out of
gcc_assert.
* lto-cgraph.cc (lto_symtab_encoder_delete_node): Likewise.
* gimple-ssa-strength-reduction.cc (get_alternative_base,
add_cand_for_stmt): Likewise.
* tree-eh.cc (add_stmt_to_eh_lp_fn): Likewise.
* except.cc (duplicate_eh_regions_1): Likewise.
* tree-ssa-reassoc.cc (insert_operand_rank): Likewise.
* config/nvptx/nvptx.cc (nvptx_expand_call): Use == rather than = in
gcc_assert.
* genautomata.cc (output_default_latencies): Move j++ side-effect
outside of gcc_assert.
* tree-ssa-loop-ivopts.cc (get_alias_ptr_type_for_ptr_address): Use
== rather than = in gcc_assert.
* cgraph.cc (symbol_table::create_edge): Move ++edges_max_uid
side-effect outside of gcc_assert.
Jakub Jelinek [Thu, 24 Oct 2024 10:56:19 +0000 (12:56 +0200)]
c++: Further fix for get_member_function_from_ptrfunc [PR117259]
The following testcase shows that the previous get_member_function_from_ptrfunc
changes weren't sufficient and we still have cases where
-fsanitize=undefined with pointers to member functions can cause wrong code
being generated and related false positive warnings.
The problem is that save_expr doesn't always create SAVE_EXPR, it can skip
some invariant arithmetics and in the end it could be really large
expressions which would be evaluated several times (and what is worse, with
-fsanitize=undefined those expressions then can have SAVE_EXPRs added to
their subparts for -fsanitize=bounds or -fsanitize=null or
-fsanitize=alignment instrumentation). Tried to just build1 a SAVE_EXPR
+ add TREE_SIDE_EFFECTS instead of save_expr, but that doesn't work either,
because cp_fold happily optimizes those SAVE_EXPRs away when it sees
SAVE_EXPR operand is tree_invariant_p.
So, the following patch instead of using save_expr or building SAVE_EXPR
manually builds a TARGET_EXPR. Both types are pointers, so it doesn't need
to be destroyed in any way, but TARGET_EXPR is what doesn't get optimized
away immediately.
2024-10-24 Jakub Jelinek <jakub@redhat.com>
PR c++/117259
* typeck.cc (get_member_function_from_ptrfunc): Use force_target_expr
rather than save_expr for instance_ptr and function. Don't call it
for TREE_CONSTANT.
Jakub Jelinek [Tue, 22 Oct 2024 18:30:41 +0000 (20:30 +0200)]
c-family: Fix up -Wsizeof-pointer-memaccess ICEs [PR117230]
In the following testcases, we ICE on all 4 function calls.
The problem is using TYPE_PRECISION on vector types (but guess it
would be similarly problematic on structures/unions/arrays).
The test only differentiates between suggestion what to do, whether
to supply explicit size because sizeof (*p) for
{,{,un}signed }char *p is not very likely what the user want, or
dereferencing the pointer, so I think limiting that suggestion
to integral types is ok.
2024-10-22 Jakub Jelinek <jakub@redhat.com>
PR c/117230
* c-warn.cc (sizeof_pointer_memaccess_warning): Only compare
TYPE_PRECISION of TREE_TYPE (type) to precision of char if
TREE_TYPE (type) is integral type.
* c-c++-common/Wsizeof-pointer-memaccess5.c: New test.
Jakub Jelinek [Fri, 20 Sep 2024 07:14:29 +0000 (09:14 +0200)]
i386: Fix up _mm_min_ss etc. handling of zeros and NaNs [PR116738]
min/max patterns for intrinsics which on x86 result in the second
input operand if the two operands are both zeros or one or both of them
are a NaN shouldn't use SMIN/SMAX RTL, because that is similarly to
MIN_EXPR/MAX_EXPR undefined what will be the result in those cases.
The following patch adds an expander which uses either a new pattern with
UNSPEC_IEEE_M{AX,IN} or use the S{MIN,MAX} representation of the same.
2024-09-20 Uros Bizjak <ubizjak@gmail.com>
Jakub Jelinek <jakub@redhat.com>
PR target/116738
* config/i386/subst.md (mask_scalar_operand_arg34,
mask_scalar_expand_op3, round_saeonly_scalar_mask_arg3): New
subst attributes.
* config/i386/sse.md
(<sse>_vm<code><mode>3<mask_scalar_name><round_saeonly_scalar_name>):
Change from define_insn to define_expand, rename the old define_insn
to ...
(*<sse>_vm<code><mode>3<mask_scalar_name><round_saeonly_scalar_name>):
... this.
(<sse>_ieee_vm<ieee_maxmin><mode>3<mask_scalar_name><round_saeonly_scalar_name>):
New define_insn.
Another spot where we mark_used a function (in this case ctor or dtor)
even when it is just artificially used inside of thunks (emitted on mingw
with -Os for the testcase).
2024-09-13 Jakub Jelinek <jakub@redhat.com>
PR c++/116678
* optimize.cc: Include decl.h.
(maybe_thunk_body): Temporarily change deprecated_state to
UNAVAILABLE_DEPRECATED_SUPPRESS.
Jakub Jelinek [Thu, 18 Jul 2024 07:22:10 +0000 (09:22 +0200)]
testsuite: Fix up builtin-clear-padding-3.c for -funsigned-char
As reported on gcc-regression, this test FAILs on aarch64, but my
r15-2090 change didn't change anything on the generated assembly,
just added the forgotten dg-do run directive to the test, so the
test has been failing forever, just we didn't know it.
I can actually reproduce it on x86_64 with -funsigned-char too,
s2.b.a has int type and -1 is stored to it, so we should compare
it against -1 rather than (char) -1; the latter is appropriate for
testing char fields into which we've stored -1.
2024-07-18 Jakub Jelinek <jakub@redhat.com>
* c-c++-common/torture/builtin-clear-padding-3.c (main): Compare
s2.b.a against -1 rather than (char) -1.
Jakub Jelinek [Tue, 10 Sep 2024 16:32:58 +0000 (18:32 +0200)]
c++: Fix get_member_function_from_ptrfunc with -fsanitize=bounds [PR116449]
The following testcase is miscompiled, because
get_member_function_from_ptrfunc
emits something like
(((FUNCTION.__pfn & 1) != 0)
? ptr + FUNCTION.__delta + FUNCTION.__pfn - 1
: FUNCTION.__pfn) (ptr + FUNCTION.__delta, ...)
or so, so FUNCTION tree is used there 5 times. There is
if (TREE_SIDE_EFFECTS (function)) function = save_expr (function);
but in this case function doesn't have side-effects, just nested ARRAY_REFs.
Now, if all the FUNCTION trees would be shared, it would work fine,
FUNCTION is evaluated in the first operand of COND_EXPR; but unfortunately
that isn't the case, both the BIT_AND_EXPR shortening and conversion to
bool done for build_conditional_expr actually unshare_expr that first
expression, but none of the other 4 are unshared. With -fsanitize=bounds,
.UBSAN_BOUNDS calls are added to the ARRAY_REFs and use save_expr to avoid
evaluating the argument multiple times, but because that FUNCTION tree is
first used in the second argument of COND_EXPR (i.e. conditionally), the
SAVE_EXPR initialization is done just there and then the third argument
of COND_EXPR just uses the uninitialized temporary and so does the first
argument computation as well.
The following patch fixes that by doing save_expr even if !TREE_SIDE_EFFECTS,
but to avoid doing that too often only if !nonvirtual and if the expression
isn't a simple decl.
2024-09-10 Jakub Jelinek <jakub@redhat.com>
PR c++/116449
* typeck.cc (get_member_function_from_ptrfunc): Use save_expr
on instance_ptr and function even if it doesn't have side-effects,
as long as it isn't a decl.
Jakub Jelinek [Sat, 7 Sep 2024 07:36:53 +0000 (09:36 +0200)]
libiberty: Fix up > 64K section handling in simple_object_elf_copy_lto_debug_section [PR116614]
cat abc.C
#define A(n) struct T##n {} t##n;
#define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
#define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
#define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
#define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
E(1) E(2) E(3)
int main () { return 0; }
./xg++ -B ./ -o abc{.o,.C} -flto -flto-partition=1to1 -O2 -g -fdebug-types-section -c
./xgcc -B ./ -o abc{,.o} -flto -flto-partition=1to1 -O2
(not included in testsuite as it takes a while to compile) FAILs with
lto-wrapper: fatal error: Too many copied sections: Operation not supported
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
The following patch fixes that. Most of the 64K+ section support for
reading and writing was already there years ago (and especially reading used
quite often already) and a further bug fixed in it in the PR104617 fix.
Yet, the fix isn't solely about removing the
if (new_i - 1 >= SHN_LORESERVE)
{
*err = ENOTSUP;
return "Too many copied sections";
}
5 lines, the missing part was that the function only handled reading of
the .symtab_shndx section but not copying/updating of it.
If the result has less than 64K-epsilon sections, that actually wasn't
needed, but e.g. with -fdebug-types-section one can exceed that pretty
easily (reported to us on WebKitGtk build on ppc64le).
Updating the section is slightly more complicated, because it basically
needs to be done in lock step with updating the .symtab section, if one
doesn't need to use SHN_XINDEX in there, the section should (or should be
updated to) contain SHN_UNDEF entry, otherwise needs to have whatever would
be overwise stored but couldn't fit. But repeating due to that all the
symtab decisions what to discard and how to rewrite it would be ugly.
So, the patch instead emits the .symtab_shndx section (or sections) last
and prepares the content during the .symtab processing and in a second
pass when going just through .symtab_shndx sections just uses the saved
content.
2024-09-07 Jakub Jelinek <jakub@redhat.com>
PR lto/116614
* simple-object-elf.c (SHN_COMMON): Align comment with neighbouring
comments.
(SHN_HIRESERVE): Use uppercase hex digits instead of lowercase for
consistency.
(simple_object_elf_find_sections): Formatting fixes.
(simple_object_elf_fetch_attributes): Likewise.
(simple_object_elf_attributes_merge): Likewise.
(simple_object_elf_start_write): Likewise.
(simple_object_elf_write_ehdr): Likewise.
(simple_object_elf_write_shdr): Likewise.
(simple_object_elf_write_to_file): Likewise.
(simple_object_elf_copy_lto_debug_section): Likewise. Don't fail for
new_i - 1 >= SHN_LORESERVE, instead arrange in that case to copy
over .symtab_shndx sections, though emit those last and compute their
section content when processing associated .symtab sections. Handle
simple_object_internal_read failure even in the .symtab_shndx reading
case.
Jakub Jelinek [Fri, 9 Aug 2024 12:32:51 +0000 (14:32 +0200)]
i386: Fix up __builtin_ia32_b{extr{,i}_u{32,64},zhi_{s,d}i} folding [PR116287]
The GENERIC folding of these builtins have cases where it folds to a
constant regardless of the value of the first operand. If so, we need
to use omit_one_operand to avoid throwing away side-effects in the first
operand if any. The cases which verify the first argument is INTEGER_CST
don't need that, INTEGER_CST doesn't have side-effects.
2024-08-09 Jakub Jelinek <jakub@redhat.com>
PR target/116287
* config/i386/i386.cc (ix86_fold_builtin) <case IX86_BUILTIN_BEXTR32>:
When folding into zero without checking whether first argument is
constant, use omit_one_operand.
(ix86_fold_builtin) <case IX86_BUILTIN_BZHI32>: Likewise.
* gcc.target/i386/bmi-pr116287.c: New test.
* gcc.target/i386/bmi2-pr116287.c: New test.
* gcc.target/i386/tbm-pr116287.c: New test.
Jakub Jelinek [Wed, 24 Jul 2024 16:00:05 +0000 (18:00 +0200)]
testsuite: Fix up pr116034.c test for big/pdp endian [PR116061]
Didn't notice the memmove is into an int variable, so the test
was still failing on big endian.
2024-07-24 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/116034
PR testsuite/116061
* gcc.dg/pr116034.c (g): Change type from int to unsigned short.
(foo): Guard memmove call on __SIZEOF_SHORT__ == 2.
Jakub Jelinek [Tue, 23 Jul 2024 08:50:29 +0000 (10:50 +0200)]
ssa: Fix up maybe_rewrite_mem_ref_base complex type handling [PR116034]
The folding into REALPART_EXPR is correct, used only when the mem_offset
is zero, but for IMAGPART_EXPR it didn't check the exact offset value (just
that it is not 0).
The following patch fixes that by using IMAGPART_EXPR only if the offset
is right and using BITFIELD_REF or whatever else otherwise.
2024-07-23 Jakub Jelinek <jakub@redhat.com>
Andrew Pinski <quic_apinski@quicinc.com>
PR tree-optimization/116034
* tree-ssa.cc (maybe_rewrite_mem_ref_base): Only use IMAGPART_EXPR
if MEM_REF offset is equal to element type size.
Jakub Jelinek [Wed, 17 Jul 2024 09:38:33 +0000 (11:38 +0200)]
gimple-fold: Fix up __builtin_clear_padding lowering [PR115527]
The builtin-clear-padding-6.c testcase fails as clear_padding_type
doesn't correctly recompute the buf->size and buf->off members after
expanding clearing of an array using a runtime loop.
buf->size should be in that case the offset after which it should continue
with next members or padding before them modulo UNITS_PER_WORD and
buf->off that offset minus buf->size. That is what the code was doing,
but with off being the start of the loop cleared array, not its end.
So, the last hunk in gimple-fold.cc fixes that.
When adding the testcase, I've noticed that the
c-c++-common/torture/builtin-clear-padding-* tests, although clearly
written as runtime tests to test the builtins at runtime, didn't have
{ dg-do run } directive and were just compile tests because of that.
When adding that to the tests, builtin-clear-padding-1.c was already
failing without that clear_padding_type hunk too, but
builtin-clear-padding-5.c was still failing even after the change.
That is due to a bug in clear_padding_flush which the patch fixes as
well - when clear_padding_flush is called with full=true (that happens
at the end of the whole __builtin_clear_padding or on those array
padding clears done by a runtime loop), it wants to flush all the pending
padding clearings rather than just some. If it is at the end of the whole
object, it decreases wordsize when needed to make sure the code never writes
including RMW cycles to something outside of the object:
if ((unsigned HOST_WIDE_INT) (buf->off + i + wordsize)
> (unsigned HOST_WIDE_INT) buf->sz)
{
gcc_assert (wordsize > 1);
wordsize /= 2;
i -= wordsize;
continue;
}
but if it is full==true flush in the middle, this doesn't happen, but we
still process just the buffer bytes before the current end. If that end
is not on a wordsize boundary, e.g. on the builtin-clear-padding-5.c test
the last chunk is 2 bytes, '\0', '\xff', i is 16 and end is 18,
nonzero_last might be equal to the end - i, i.e. 2 here, but still all_ones
might be true, so in some spots we just didn't emit any clearing in that
last chunk.
2024-07-17 Jakub Jelinek <jakub@redhat.com>
PR middle-end/115527
* gimple-fold.cc (clear_padding_flush): Introduce endsize
variable and use it instead of wordsize when comparing it against
nonzero_last.
(clear_padding_type): Increment off by sz.
Jonathan Wakely [Wed, 16 Oct 2024 08:22:37 +0000 (09:22 +0100)]
libstdc++: Fix Python deprecation warning in printers.py
python/libstdcxx/v6/printers.py:1355: DeprecationWarning: 'count' is passed as positional argument
The Python docs say:
Deprecated since version 3.13: Passing count and flags as positional
arguments is deprecated. In future Python versions they will be
keyword-only parameters.
Using a keyword argument for count only became possible with Python 3.1
so introduce a new function to do the substitution.
libstdc++-v3/ChangeLog:
* python/libstdcxx/v6/printers.py (strip_fundts_namespace): New.
(StdExpAnyPrinter, StdExpOptionalPrinter): Use it.
Jonathan Wakely [Wed, 28 May 2025 14:19:18 +0000 (15:19 +0100)]
libstdc++: Make system_clock::to_time_t always_inline [PR99832]
For some 32-bit targets Glibc supports changing the size of time_t to be
64 bits by defining _TIME_BITS=64. That causes an ABI change which
would affect std::chrono::system_clock::to_time_t. Because to_time_t is
not a function template, its mangled name does not depend on the return
type, so it has the same mangled name whether it returns a 32-bit time_t
or a 64-bit time_t. On targets where the size of time_t can be selected
at preprocessing time, that can cause ODR violations, e.g. the linker
selects a definition of to_time_t that returns a 32-bit value but a
caller expects 64-bit and so reads 32 bits of garbage from the stack.
This commit adds always_inline to to_time_t so that all callers inline
the conversion to time_t, and will do so using whatever type time_t
happens to be in that translation unit.
Existing objects compiled before this change will either have inlined
the function anyway (which is likely if compiled with any optimization
enabled) or will contain a COMDAT definition of the inline function and
so still be able to find it at link-time.
The attribute is also added to system_clock::from_time_t, because that's
an equally simple function and it seems reasonable for them to both be
always inlined.
libstdc++-v3/ChangeLog:
PR libstdc++/99832
* include/bits/chrono.h (system_clock::to_time_t): Add
always_inline attribute to be agnostic to the underlying type of
time_t.
(system_clock::from_time_t): Add always_inline for consistency
with to_time_t.
* testsuite/20_util/system_clock/99832.cc: New test.
Jonathan Wakely [Tue, 20 May 2025 09:53:41 +0000 (10:53 +0100)]
libstdc++: Fix incorrect links to archived SGI STL docs
In r8-7777-g25949ee33201f2 I updated some URLs to point to copies of the
SGI STL docs in the Wayback Machine, because the original pags were no
longer hosted on sgi.com. However, I incorrectly assumed that if one
archived page was at https://web.archive.org/web/20171225062613/... then
all the other pages would be too. Apparently that's not how the Wayback
Machine works, and each page is archived on a different date. That meant
that some of our links were redirecting to archived copies of the
announcement that the SGI STL docs have gone away.
This fixes each URL to refer to a correctly archived copy of the
original docs.
Martin Jambor [Wed, 14 May 2025 10:08:24 +0000 (12:08 +0200)]
tree-sra: Do not create stores into const aggregates (PR111873)
This patch fixes (hopefully the) one remaining place where gimple SRA
was still creating a load into const aggregates. It occurs when there
is a replacement for a load but that replacement is not type
compatible - typically because it is a single field structure.
I have used testcases from duplicates because the original test-case
no longer reproduces for me.
gcc/ChangeLog:
2025-05-13 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/111873
* tree-sra.cc (sra_modify_expr): When processing a load which has
a type-incompatible replacement, do not store the contents of the
replacement into the original aggregate when that aggregate is
const.
gcc/testsuite/ChangeLog:
2025-05-13 Martin Jambor <mjambor@suse.cz>
* gcc.dg/ipa/pr120044-1.c: New test.
* gcc.dg/ipa/pr120044-2.c: Likewise.
* gcc.dg/tree-ssa/pr114864.c: Likewise.