There were two uses for the Darwin host config fragment:
The first is to arrange for targets that support -mdynamic-no-pic
to be built with that enabled (since it makes a significant
difference to the compiler performance). We can be more specific
in the application of this, since it only applies to 32b hosts
plus powerpc64-darwin9.
The second was to work around a tool bug where -fno-PIE was not
propagated to the link stage. This second use is redundant,
since the buggy toolchain cannot bootstrap current GCC sources
anyway.
This makes the host fragment more specific and reduces the number
of toolchains for which it is included, which reduces clutter in
configure lines.
Iain Sandoe [Fri, 11 Dec 2020 00:29:42 +0000 (00:29 +0000)]
Darwin, PPC : Fix R13 for PPC64.
We have a somewhat unusual situation in that for PPC64, R13 is
both reserved and callee-saved (it is used internally by the
pthreads implementation to contain pthread_self).
So add R13 to the fixed regs, but also keep it in the callee-
saved set.
gcc/ChangeLog:
* config/rs6000/darwin.h (FIXED_R13): Add for PPC64.
(FIRST_SAVED_GP_REGNO): Save from R13 even when it is one
of the fixed regs.
Darwin: Future-proof and homogeneize detection of darwin versions
The current GCC branch will become 12.1.0, which will be the stable
version of GCC when the next macOS version is released. There are some
places in GCC that don’t handle darwin22 as a version, so we need to
future-proof it (gcc/config.gcc and gcc/config/darwin-driver.c). We
align that code with what Apple clang does, i.e. accept all potential
major macOS versions until 99.
This patch also homogenises the handling of darwin version numbers,
where the majority of places use darwin2*, but some used darwin2[0-9]*.
Since there never was a darwin2.x version, the two are equivalent, and
we prefer the simpler darwin2*.
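The equivalence claim above can be checked with a small sketch. It assumes
POSIX fnmatch() as a stand-in for the shell case patterns used in config.gcc
(both use glob syntax); the helper names os_matches and patterns_agree are
made up for illustration and are not part of the patch.

```c
#include <fnmatch.h>

/* Returns 1 if the glob PATTERN matches the OS string, 0 otherwise.
   fnmatch() is used here only as a stand-in for shell case matching.  */
static int os_matches(const char *pattern, const char *os)
{
    return fnmatch(pattern, os, 0) == 0;
}

/* Returns 1 if the two spellings of the pattern give the same answer
   for OS (both match or both fail to match).  */
static int patterns_agree(const char *os)
{
    return os_matches("darwin2*", os) == os_matches("darwin2[0-9]*", os);
}
```

The two patterns only disagree on strings such as "darwin2.0", which is the
darwin2.x case that the text says never existed.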
gcc/ChangeLog:
* config/darwin-driver.c: Make version code more future-proof.
* config.gcc: Homogeneize darwin versions.
* configure.ac: Homogeneize darwin versions.
* configure: Regenerate.
gcc/testsuite/ChangeLog:
* gcc.dg/darwin-minversion-link.c: Test darwin21.
* obj-c++.dg/cxx-ivars-3.mm: Homogeneize darwin versions.
* obj-c++.dg/objc-gc-3.mm: Homogeneize darwin versions.
* objc.dg/objc-gc-4.m: Homogeneize darwin versions.
Versions of the assembler using clang from XCode 12.5/12.5.1
have a bug which produces different code layout between debug and
non-debug input, leading to a compare fail for default configure
parameters.
This is a workaround fix to disable the optimisation that is
responsible for the bug.
* config.in: Regenerate.
* config/i386/darwin.h (EXTRA_ASM_OPTS): New
(ASM_SPEC): Pass options to disable branch shortening where
needed.
* configure: Regenerate.
* configure.ac: Detect versions of 'as' that support the
optimisation which has the bug.
Kewen Lin [Mon, 18 Apr 2022 02:34:51 +0000 (21:34 -0500)]
testsuite: Skip pr105250.c for powerpc and s390 [PR105266]
This test case pr105250.c is like its related pr105140.c, which
suffers the error with message like "{AltiVec,vector} argument
passed to unprototyped" on powerpc and s390. So like commits
r12-8025 and r12-8039, this fix is to add the dg-skip-if for
powerpc*-*-* and s390*-*-*.
gcc/testsuite/ChangeLog:
PR testsuite/105266
* gcc.dg/pr105250.c: Skip for powerpc*-*-* and s390*-*-*.
The following reverts the original PR105140 fix and goes for instead
applying the additional fold_convert constraint for VECTOR_TYPE
conversions also to fold_convertible_p. I did not try sanitizing
all of this at this point.
2022-04-13 Richard Biener <rguenther@suse.de>
PR tree-optimization/105250
* fold-const.c (fold_convertible_p): Revert r12-7979-geaaf77dd85c333, instead check for size equality
of the vector types involved.
Andreas Krebbel [Thu, 7 Apr 2022 05:29:13 +0000 (07:29 +0200)]
IBM zSystems/testsuite: PR105147: Skip pr105140.c
pr105140.c fails on IBM zSystems with "vector argument passed to
unprototyped function". s390_invalid_arg_for_unprototyped_fn in
s390.cc is triggered by that.
gcc/testsuite/ChangeLog:
PR target/105147
* gcc.dg/pr105140.c: Skip for s390*-*-*.
This test fails with error "AltiVec argument passed to unprototyped
function", but the code (in rs6000.c:invalid_arg_for_unprototyped_fn,
from 2005) actually tests for any vector type argument. It also does
not fail on Darwin, not reflected here though.
Richard Biener [Mon, 4 Apr 2022 08:20:05 +0000 (10:20 +0200)]
middle-end/105140 - fix bogus recursion in fold_convertible_p
fold_convertible_p expects an operand and a type to convert to
but recurses with two vector component types. Fixed by allowing
types instead of an operand as well.
2022-04-04 Richard Biener <rguenther@suse.de>
PR middle-end/105140
* fold-const.c (fold_convertible_p): Allow a TYPE_P arg.
Richard Biener [Wed, 6 Apr 2022 09:43:01 +0000 (11:43 +0200)]
tree-optimization/105173 - fix insertion logic in reassoc
The find_insert_point logic around deciding whether to insert
before or after the found insertion point does not handle
the case of _12 = ..;, _12, 1.0 well. The following puts the
logic into find_insert_point itself instead.
2022-04-06 Richard Biener <rguenther@suse.de>
PR tree-optimization/105173
* tree-ssa-reassoc.c (find_insert_point): Get extra
insert_before output argument and compute it.
(insert_stmt_before_use): Adjust.
(rewrite_expr_tree): Likewise.
Richard Biener [Fri, 29 Apr 2022 06:45:48 +0000 (08:45 +0200)]
tree-optimization/105431 - another overflow in powi handling
This avoids undefined signed overflow when calling powi_as_mults_1.
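As an illustration only (not the GCC code), the sketch below shows the general
idea of taking the absolute value in an unsigned type so that the most negative
exponent does not overflow. abs_as_unsigned and pow_unsigned are hypothetical
names; abs_as_unsigned plays the role that absu_hwi plays in the patch.

```c
#include <limits.h>

/* Absolute value of N as an unsigned type.  The negation is done in
   unsigned arithmetic, which wraps, so LLONG_MIN is handled without
   signed overflow.  */
static unsigned long long abs_as_unsigned(long long n)
{
    return n < 0 ? 0ULL - (unsigned long long)n : (unsigned long long)n;
}

/* X raised to the power N by repeated squaring, with an unsigned
   exponent so that no "n != -n" style check is needed.  */
static double pow_unsigned(double x, unsigned long long n)
{
    double result = 1.0;
    while (n) {
        if (n & 1)
            result *= x;
        x *= x;
        n >>= 1;
    }
    return result;
}
```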
2022-04-29 Richard Biener <rguenther@suse.de>
PR tree-optimization/105431
* tree-ssa-math-opts.c (powi_as_mults_1): Make n unsigned.
(powi_as_mults): Use absu_hwi.
(gimple_expand_builtin_powi): Remove now pointless n != -n
check.
Jason Merrill [Tue, 26 Apr 2022 15:15:04 +0000 (11:15 -0400)]
c++: constexpr ref to array of array [PR102307]
The problem here is that first check_initializer calls
build_aggr_init_full_exprs, which does overload resolution, but then in the
case of failed constexpr throws away the result and does it again in
build_functional_cast. But in the first overload resolution,
reshape_init_array_1 decided to reuse the inner CONSTRUCTORs because
tf_error is set, so we know we're committed. But the second pass gets
confused by the CONSTRUCTORs with non-init-list types.
Fixed by avoiding a second pass: instead, pass the call from build_aggr_init
to build_cplus_new, which will turn it into a TARGET_EXPR. I don't bother
to change the object argument because it will be replaced later in
simplify_aggr_init_expr.
PR c++/102307
gcc/cp/ChangeLog:
* decl.c (check_initializer): Use build_cplus_new in case of
constexpr failure.
Paul A. Clarke [Mon, 23 May 2022 18:14:45 +0000 (13:14 -0500)]
rs6000: __Uglify non-uglified local variables in headers
Properly prefix (with "__") all local variables in shipped headers for x86
compatibility intrinsics implementations. This avoids possible problems with
usages like:
```
#define result foo()
#include <emmintrin.h>
```
libgcc/
PR target/105162
* config/aarch64/lse.S: Define BARRIER and handle memory MODEL 5.
* config/aarch64/t-lse: Add a 5th memory model for _sync functions.
Jason Merrill [Wed, 16 Jun 2021 20:09:59 +0000 (16:09 -0400)]
c++: static memfn from non-dependent base [PR101078]
After my patch for PR91706, or before that with the qualified call,
tsubst_baselink returned a BASELINK with BASELINK_BINFO indicating a base of
a still-dependent derived class. We need to look up the relevant base binfo
in the substituted class.
PR c++/101078
gcc/cp/ChangeLog:
* pt.c (tsubst_baselink): Update binfos in non-dependent case.
Jason Merrill [Thu, 14 Apr 2022 01:56:03 +0000 (21:56 -0400)]
c++: alignment of local typedef in template [PR65211]
Because common_handle_aligned_attribute only applies the alignment to the
TREE_TYPE of a typedef, not the DECL_ORIGINAL_TYPE, we need to copy it
explicitly in tsubst.
Jason Merrill [Wed, 13 Apr 2022 17:23:08 +0000 (13:23 -0400)]
c++: NRV and ref-extended temps [PR101442]
This issue goes back to r83221, where the cleanup for extended ref temps
changed from being unconditional to being tied to the declaration they
formed part of the initializer for.
The named return value optimization changes the cleanup for the NRV variable
to only run on the EH path; we don't want that change to affect temporary
cleanups. The perform_member_init change isn't necessary (there 'decl' is a
COMPONENT_REF), it's just for consistency.
Jason Merrill [Mon, 5 Apr 2021 03:32:32 +0000 (23:32 -0400)]
c++: extern template and static data member [PR99066]
'extern template' should mean that the relevant symbols are never emitted.
But in this case we were assuming that DECL_EXTERNAL was already set on the
variable, so we just needed to clear DECL_NOT_REALLY_EXTERN. Since
DECL_EXTERNAL was not set, we emitted a definition of npos.
gcc/cp/ChangeLog:
PR c++/99066
* pt.c (mark_decl_instantiated): Set DECL_EXTERNAL.
gcc/testsuite/ChangeLog:
PR c++/99066
* g++.dg/cpp0x/extern_template-6.C: New test.
Jason Merrill [Tue, 6 Apr 2021 02:50:44 +0000 (22:50 -0400)]
c++: mangling of lambdas in default args [PR91241]
In this testcase, the parms remembered in LAMBDA_EXPR_EXTRA_SCOPE are no
longer the parms of the FUNCTION_DECL they have as their DECL_CONTEXT, so we
were mangling both lambdas as parm #0. But since the parms are numbered
from right to left we don't need to find them in the FUNCTION_DECL,
we can measure their own DECL_CHAIN.
gcc/cp/ChangeLog:
PR c++/91241
* mangle.c (write_compact_number): Add sanity check.
(write_local_name): Use list_length for parm number.
gcc/testsuite/ChangeLog:
PR c++/91241
* g++.dg/abi/lambda-defarg1.C: New test.
Jason Merrill [Wed, 26 May 2021 21:38:42 +0000 (17:38 -0400)]
c++: argument pack with expansion [PR86355]
This testcase revealed that we were using PACK_EXPANSION_EXTRA_ARGS a lot
more than necessary; use_pack_expansion_extra_args_p meant to use it in the
case of corresponding arguments in different argument packs differing in
whether they are pack expansions, but it was mistakenly also returning true
for the case of a single argument pack containing both expansion and
non-expansion elements.
Surprisingly, just disabling that didn't lead to any regressions in the
testsuite; it seems other changes have prevented us getting to this point
for code that used to exercise it. So this patch limits the check to
arguments in the same position in the packs, and asserts that we never
actually see a mismatch.
PR c++/86355
gcc/cp/ChangeLog:
* pt.c (use_pack_expansion_extra_args_p): Don't compare
args from the same argument pack.
Jason Merrill [Sun, 27 Mar 2022 16:36:13 +0000 (12:36 -0400)]
c++: low -faligned-new [PR102071]
This test ICEd after the constexpr new patch (r10-3661) because alloc_call
had a NOP_EXPR around it; fixed by moving the NOP_EXPR to alloc_expr. And
the PR pointed out that the size_t cookie might need more alignment, so I
fix that as well.
PR c++/102071
gcc/cp/ChangeLog:
* init.c (build_new_1): Include cookie in alignment. Omit
constexpr wrapper from alloc_call.
Jason Merrill [Mon, 31 May 2021 16:36:25 +0000 (12:36 -0400)]
c++: missing dtor with -fno-elide-constructors [PR100838]
tf_no_cleanup only applies to the outermost TARGET_EXPR, and we already
clear it for nested calls in build_over_call, but in this case both
constructor calls came from convert_like, so we need to clear it in the
recursive call as well. This revealed that we were adding an extra
ck_rvalue in direct-initialization cases where it was wrong.
PR c++/100838
PR c++/105265
gcc/cp/ChangeLog:
* call.c (convert_like_internal): Clear tf_no_cleanup when
recursing.
(build_user_type_conversion_1): Only add ck_rvalue if
LOOKUP_ONLYCONVERTING.
gcc/testsuite/ChangeLog:
* g++.dg/init/no-elide2.C: New test.
* g++.dg/cpp0x/initlist-new6.C: New test.
Jason Merrill [Thu, 14 Apr 2022 12:16:45 +0000 (08:16 -0400)]
c++: lambda and the current instantiation [PR82980]
When a captured variable is type-dependent, we've expressed the type of the
capture field and proxy with a decltype variant. But if the type is "the
current instantiation", we need to be able to see that so that we can do
lookup inside it just like we could with the captured variable itself.
I also tried looking through lambda capture in
cp_parser_postfix_dot_deref_expression, but this way seems cleaner. I plan
to treat more types as deducible in stage 1.
I considered also using this in do_auto_deduction, but think that would be
wrong: [temp.dep.expr] says an id-expression is type-dependent if it is
"associated by name lookup with a variable declared with a type that
contains a placeholder type where the initializer is type-dependent". That
doesn't clearly exclude deducing a dependent type from the initializer, but
it seems like a barrier, and other implementations agree.
PR c++/82980
gcc/cp/ChangeLog:
* lambda.c (type_deducible_expression_p): New.
(lambda_capture_field_type): Check it.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/lambda/lambda-current-inst1.C: New test.
The constexpr constructor checking code got confused by the expansion of a
trivial copy constructor; we don't need to do that checking for defaulted
ctors, anyway.
PR c++/104646
gcc/cp/ChangeLog:
* constexpr.c (maybe_save_constexpr_fundef): Don't do extra
checks for defaulted ctors.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/constexpr-fno-elide-ctors1.C: New test.
Jason Merrill [Tue, 26 Apr 2022 04:19:40 +0000 (00:19 -0400)]
c++: pack init-capture of unresolved overload [PR102629]
Here we were failing to diagnose that the initializer for the capture pack
is an unresolved overload. It turns out that the reason we didn't recognize
the deduction failure in do_auto_deduction was that the individual 'auto' in
the expansion of the capture pack was still marked as a parameter pack, so
we were deducing it to an empty pack instead of failing.
PR c++/102629
gcc/cp/ChangeLog:
* pt.c (gen_elem_of_pack_expansion_instantiation): Clear
TEMPLATE_TYPE_PARAMETER_PACK on auto.
Jason Merrill [Mon, 24 Jan 2022 05:01:40 +0000 (00:01 -0500)]
c++: assignment to temporary [PR59950]
Given build_this of a TARGET_EXPR, cp_build_fold_indirect_ref returns the
TARGET_EXPR. But that's the wrong value category for the result of the
defaulted class assignment operator, which returns an lvalue, so we need to
actually build the INDIRECT_REF.
PR c++/59950
gcc/cp/ChangeLog:
* call.c (build_over_call): Use cp_build_indirect_ref.
Jason Merrill [Tue, 12 Apr 2022 21:46:59 +0000 (17:46 -0400)]
c++: empty base constexpr -fno-elide-ctors [PR105245]
The patch for 100111 extended our handling of empty base elision to the case
where the derived class has no other fields, but we still need to make sure
that there's some initializer for the derived object.
PR c++/105245
PR c++/100111
gcc/cp/ChangeLog:
* constexpr.c (cxx_eval_store_expression): Build a CONSTRUCTOR
as needed in empty base handling.
Jason Merrill [Thu, 7 Apr 2022 02:20:49 +0000 (22:20 -0400)]
c++: nested generic lambda in DMI [PR101717]
We were already checking COMPLETE_TYPE_P to recognize instantiation of a
generic lambda, but didn't consider that we might be nested in a non-generic
lambda.
PR c++/101717
gcc/cp/ChangeLog:
* lambda.c (lambda_expr_this_capture): Check all enclosing
lambdas for completeness.
Jason Merrill [Tue, 5 Apr 2022 20:02:04 +0000 (16:02 -0400)]
c++: -Wshadow=compatible-local type vs var [PR100608]
The patch for PR92024 changed -Wshadow=compatible-local to warn if either
new or old decl was a type, but the rationale only talked about the case
where both are types. If only one is, they aren't compatible.
PR c++/100608
gcc/cp/ChangeLog:
* name-lookup.c (check_local_shadow): Use -Wshadow=local
if exactly one of 'old' and 'decl' is a type.
gcc/testsuite/ChangeLog:
* g++.dg/warn/Wshadow-compatible-local-3.C: New test.
Jason Merrill [Mon, 11 Apr 2022 17:06:05 +0000 (13:06 -0400)]
c++: operator new lookup [PR98249]
The standard says, as we quote in the comment just above, that if we don't
find operator new in the allocated type, it should be looked up in the
global scope. This is specifically ::, not just any namespace, and we
already give an error for an operator new declared in any other namespace.
PR c++/98249
gcc/cp/ChangeLog:
* call.c (build_operator_new_call): Just look in ::.
Jason Merrill [Fri, 18 Mar 2022 18:36:19 +0000 (14:36 -0400)]
c++: designator and anon struct [PR101767]
We found .x in the anonymous struct, but then didn't find .y there; we
should decide that means we're done with the struct rather than that the
code is wrong.
PR c++/101767
gcc/cp/ChangeLog:
* decl.c (reshape_init_class): Back out of anon struct
if a designator doesn't match.
Patrick Palka [Thu, 1 Jul 2021 00:44:52 +0000 (20:44 -0400)]
c++: cxx_eval_array_reference and empty elem type [PR101194]
Here the initializer for x is represented as an empty CONSTRUCTOR due to
its empty element type. So during constexpr evaluation of the ARRAY_REF
x[0], we end up trying to value initialize the omitted element at index 0,
which fails because the element type is not default constructible.
This patch makes cxx_eval_array_reference specifically handle the case
where the element type is an empty type.
PR c++/101194
gcc/cp/ChangeLog:
* constexpr.c (cxx_eval_array_reference): When the element type
is an empty type and the corresponding element is omitted, just
return an empty CONSTRUCTOR instead of attempting value
initialization.
Jakub Jelinek [Wed, 27 Apr 2022 06:34:18 +0000 (08:34 +0200)]
asan: Fix up asan_redzone_buffer::emit_redzone_byte [PR105396]
On the following testcase, we have in main's frame 3 variables,
some red zone padding, 4 byte d, followed by 12 bytes of red zone padding, then
8 byte b followed by 24 bytes of red zone padding, then 40 bytes c followed
by some red zone padding.
The intended content of shadow memory for that is (note, each byte describes
8 bytes of memory):
f1 f1 f1 f1 04 f2 00 f2 f2 f2 00 00 00 00 00 f3 f3 f3 f3 f3
left red    d  mr b  middle r c              right red zone
f1 is left red zone magic
f2 is middle red zone magic
f3 is right red zone magic
00 when all 8 bytes are accessible
01-07 when only 1 to 7 bytes are accessible followed by inaccessible bytes
The -fdump-rtl-expand-details dump makes it clear that it misbehaves:
Flushing rzbuffer at offset -160 with: f1 f1 f1 f1
Flushing rzbuffer at offset -128 with: 04 f2 00 00
Flushing rzbuffer at offset -128 with: 00 00 00 f2
Flushing rzbuffer at offset -96 with: f2 f2 00 00
Flushing rzbuffer at offset -64 with: 00 00 00 f3
Flushing rzbuffer at offset -32 with: f3 f3 f3 f3
In the end we end up with
f1 f1 f1 f1 00 00 00 f2 f2 f2 00 00 00 00 00 f3 f3 f3 f3 f3
shadow bytes because at offset -128 there are 2 overlapping stores
as asan_redzone_buffer::emit_redzone_byte has flushed the temporary 4 byte
buffer in the middle.
The function is called with an offset and value. If the passed offset is
consecutive with the prev_offset + buffer size (off == offset), then
we handle it correctly, similarly if the new offset is far enough from the
old one (we then flush whatever was in the buffer and if needed add up to 3
bytes of 00 before actually pushing value).
But what isn't handled correctly is when the offset isn't consecutive to
what has been added last time, but it is in the same 4 byte word of shadow
memory (32 bytes of actual memory), like the above case where
we have consecutive 04 f2 and then skip one shadow memory byte (aka 8 bytes
of real memory) and then want to emit f2. Emitting that as a store
of little-endian 0x0000f204 followed by a store of 0xf2000000 to the same
address doesn't work, we want to emit 0xf200f204.
The following patch does that by pushing 1 or 2 00 bytes.
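The arithmetic can be sketched in plain C. This is a model of the stores only,
not of asan_redzone_buffer; pack_le32, two_overlapping_stores and
one_padded_store are hypothetical names used for illustration.

```c
#include <stdint.h>

/* Compose four shadow bytes into the 32-bit value that a little-endian
   store of those bytes would write.  */
static uint32_t pack_le32(const unsigned char b[4])
{
    return (uint32_t)b[0]
           | (uint32_t)b[1] << 8
           | (uint32_t)b[2] << 16
           | (uint32_t)b[3] << 24;
}

/* Models the misbehaviour: two full-word stores to the same address,
   so the second one overwrites the 04 f2 written by the first.  */
static uint32_t two_overlapping_stores(void)
{
    const unsigned char first[4]  = { 0x04, 0xf2, 0x00, 0x00 };
    const unsigned char second[4] = { 0x00, 0x00, 0x00, 0xf2 };
    uint32_t word = pack_le32(first);   /* 0x0000f204 */
    word = pack_le32(second);           /* replaced by 0xf2000000 */
    return word;
}

/* Models the fix: pad the gap with a 00 byte and emit a single store.  */
static uint32_t one_padded_store(void)
{
    const unsigned char bytes[4] = { 0x04, 0xf2, 0x00, 0xf2 };
    return pack_le32(bytes);            /* 0xf200f204 */
}
```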
Additionally, as a small cleanup, instead of using
m_shadow_bytes.safe_push (value);
flush_if_full ();
in all of if, else if and else bodies it sinks those 2 stmts to the end
of function as all do the same thing.
2022-04-27 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/105396
* asan.c (asan_redzone_buffer::emit_redzone_byte): Handle the case
where offset is bigger than off but smaller than m_prev_offset + 32
bits by pushing one or more 0 bytes. Sink the
m_shadow_bytes.safe_push (value); flush_if_full (); statements from
all cases to the end of the function.
Jakub Jelinek [Fri, 22 Apr 2022 11:38:11 +0000 (13:38 +0200)]
rtlanal: Fix up replace_rtx [PR105333]
The following testcase FAILs, because replace_rtx replaces a REG with
CONST_WIDE_INT inside of a SUBREG, which is an invalid transformation
because a SUBREG relies on SUBREG_REG having non-VOIDmode but
CONST_WIDE_INT has VOIDmode.
replace_rtx already has code to deal with it, but it was doing
it only for CONST_INTs. The following patch does it also for
VOIDmode CONST_DOUBLE or CONST_WIDE_INT.
2022-04-22 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/105333
* rtlanal.c (replace_rtx): Use simplify_subreg or
simplify_unary_operation if CONST_SCALAR_INT_P rather than just
CONST_INT_P.
Jakub Jelinek [Tue, 19 Apr 2022 16:58:59 +0000 (18:58 +0200)]
sparc: Preserve ORIGINAL_REGNO in epilogue_renumber [PR105257]
The following testcase ICEs, because the pic register is
(reg:DI 24 %i0 [109]) and is used in the delay slot of a return.
We invoke epilogue_renumber and that changes it to
(reg:DI 8 %o0) which no longer satisfies sparc_pic_register_p
predicate, so we don't recognize the insn anymore.
The following patch fixes that by preserving ORIGINAL_REGNO if
specified, so we get (reg:DI 8 %o0 [109]) instead.
2022-04-19 Jakub Jelinek <jakub@redhat.com>
PR target/105257
* config/sparc/sparc.c (epilogue_renumber): If ORIGINAL_REGNO,
use gen_raw_REG instead of gen_rtx_REG and copy over also
ORIGINAL_REGNO. Use return 0; instead of /* fallthrough */.
Jakub Jelinek [Tue, 19 Apr 2022 16:27:41 +0000 (18:27 +0200)]
c++: Fix up CONSTRUCTOR_PLACEHOLDER_BOUNDARY handling [PR105256]
The CONSTRUCTOR_PLACEHOLDER_BOUNDARY bit is supposed to separate
PLACEHOLDER_EXPRs that should be replaced by one object or subobjects of it
(variable, TARGET_EXPR slot, ...) from other PLACEHOLDER_EXPRs that should
be replaced by different objects or subobjects.
The bit is set when finding PLACEHOLDER_EXPRs inside of a CONSTRUCTOR, not
looking into nested CONSTRUCTOR_PLACEHOLDER_BOUNDARY ctors, and we prevent
elision of TARGET_EXPRs (through TARGET_EXPR_NO_ELIDE) whose initializer
is a CONSTRUCTOR_PLACEHOLDER_BOUNDARY ctor. The following testcase ICEs
though, we don't replace the placeholders in there at all, because
CONSTRUCTOR_PLACEHOLDER_BOUNDARY isn't set on the TARGET_EXPR_INITIAL
ctor, but on a ctor nested in such a ctor. replace_placeholders should be
run on the whole TARGET_EXPR slot.
So, the following patch fixes it by moving the CONSTRUCTOR_PLACEHOLDER_BOUNDARY
bit from nested CONSTRUCTORs to the CONSTRUCTOR containing those (but only
if it is closely nested, if there is some other tree sandwiched in between,
it doesn't do it).
2022-04-19 Jakub Jelinek <jakub@redhat.com>
PR c++/105256
* typeck2.c (process_init_constructor_array,
process_init_constructor_record, process_init_constructor_union): Move
CONSTRUCTOR_PLACEHOLDER_BOUNDARY flag from CONSTRUCTOR elements to the
containing CONSTRUCTOR.
Jakub Jelinek [Tue, 12 Apr 2022 07:19:11 +0000 (09:19 +0200)]
i386: Fix ICE caused by ix86_emit_i387_log1p [PR105214]
The following testcase ICEs, because ix86_emit_i387_log1p attempts to
emit something like
if (cond)
some_code1;
else
some_code2;
and emits a conditional jump using emit_jump_insn (standard way in
the file) and an unconditional jump using emit_jump.
The problem with that is that if there is pending stack adjustment,
it isn't emitted before the conditional jump, but is before the
unconditional jump and therefore stack is adjusted only conditionally
(at the end of some_code1 above), which makes the dwarf2 pass unhappy,
but it is a serious wrong-code bug even if it doesn't ICE.
This can be fixed either by emitting pending stack adjust before the
conditional jump as the following patch does, or by not using
emit_jump (label2);
and instead hand inlining what that function does except for the
pending stack adjustment, like:
emit_jump_insn (targetm.gen_jump (label2));
emit_barrier ();
In that case there will be no stack adjustment in the sequence and
it will be done later on somewhere else.
Jakub Jelinek [Tue, 12 Apr 2022 07:16:06 +0000 (09:16 +0200)]
builtins: Fix up expand_builtin_int_roundingfn_2 [PR105211]
The expansion of __builtin_iround{,f,l} etc. builtins in some cases
emits calls to a different fallback builtin. To locate the right builtin
it uses mathfn_built_in_1 with the type of the first argument.
If its TYPE_MAIN_VARIANT is {float,double,long_double}_type_node, all is
fine, but on the following testcase, because GIMPLE considers scalar
float conversions between types with the same mode as useless,
TYPE_MAIN_VARIANT of the arg's type is float32_type_node and because there
isn't __builtin_lroundf32 returns NULL and we ICE.
This patch will first try the type of the first argument of the builtin's
prototype (so that say on sizeof(double)==sizeof(long double) target it honors
whether it was a *l or non-*l call; though even that can't be 100% trusted,
user could incorrectly prototype it) and as fallback the type argument.
If neither works, it doesn't fall back.
2022-04-11 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/105211
* builtins.c (expand_builtin_int_roundingfn_2): If mathfn_built_in_1
fails for TREE_TYPE (arg), retry it with
TREE_VALUE (TYPE_ARG_TYPES (TREE_TYPE (fndecl))) and if even that
fails, emit call normally.
Jakub Jelinek [Mon, 11 Apr 2022 08:41:07 +0000 (10:41 +0200)]
c-family: Initialize ridpointers for __int128 etc. [PR105186]
The following testcase ICEs with C++ and is incorrectly rejected with C.
The reason is that both FEs use ridpointers identifiers for CPP_KEYWORD
and value or u.value for CPP_NAME e.g. when parsing attributes or OpenMP
directives etc., like:
/* Save away the identifier that indicates which attribute
this is. */
identifier = (token->type == CPP_KEYWORD)
/* For keywords, use the canonical spelling, not the
parsed identifier. */
? ridpointers[(int) token->keyword]
: id_token->u.value;
identifier = canonicalize_attr_name (identifier);
I've tried to change those to use ridpointers only if non-NULL and otherwise
use the value/u.value even for CPP_KEYWORDS, but that was a large 10 hunks
patch.
The following patch instead just initializes ridpointers for the __intNN
keywords. It can't be done earlier before we record_builtin_type as there
are 2 different spellings and if we initialize those ridpointers early, the
second record_builtin_type fails miserably.
2022-04-11 Jakub Jelinek <jakub@redhat.com>
PR c++/105186
* c-common.c (c_common_nodes_and_builtins): After registering __int%d
and __int%d__ builtin types, initialize corresponding ridpointers
entry.
Jakub Jelinek [Fri, 8 Apr 2022 07:14:44 +0000 (09:14 +0200)]
fold-const: Fix up make_range_step [PR105189]
The following testcase is miscompiled, because fold_truth_andor
incorrectly folds
(unsigned) foo () >= 0U && 1
into
foo () >= 0
For the unsigned comparison (which is useless in this case,
as >= 0U is always true, but hasn't been folded yet), previous
make_range_step derives exp (unsigned) foo () and +[0U, -]
range for it. Next we process the NOP_EXPR. We have special code
for unsigned to signed casts, already earlier punt if low or high
aren't representable in arg0_type or if it is a narrowing conversion.
For the signed to unsigned casts, I think if high is specified we
are still fine, as we punt for non-representable values in arg0_type,
n_high is then still representable and so was smaller or equal to
signed maximum and either low is not present (equivalent to 0U), or
low must be smaller or equal to high and so for unsigned exp
+[low, high] the signed exp +[n_low, n_high] will be correct.
Similarly, if both low and high aren't specified (always true or
always false), it is ok too.
But if we have for unsigned exp +[low, -] or -[low, -], using
+[n_low, -] or -[n_high, -] is incorrect. Because low is smaller
or equal to signed maximum and high is unspecified (i.e. unsigned
maximum), when signed that range is a union of +[n_low, -] and
+[-, -1] which is equivalent to -[0, n_low-1], unless low
is 0, in that case we can treat it as [-, -].
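The range reasoning above can be checked exhaustively on a narrow type. The
sketch below uses 8-bit integers as a stand-in for the general case and a
made-up helper name; it compares the unsigned test (uint8_t) x >= low with
the signed range -[0, low-1], i.e. "x is not in [0, low-1]".

```c
#include <stdint.h>

/* Returns 1 if, for every 8-bit signed x, the unsigned comparison
   (uint8_t) x >= low agrees with the signed test "x is outside
   [0, low - 1]".  Returns 0 as soon as one x disagrees.  */
static int unsigned_ge_matches_signed_exclusion(uint8_t low)
{
    for (int v = INT8_MIN; v <= INT8_MAX; v++) {
        int8_t x = (int8_t)v;
        int as_unsigned = (uint8_t)x >= low;
        int as_signed = !(x >= 0 && x <= (int)low - 1);
        if (as_unsigned != as_signed)
            return 0;
    }
    return 1;
}
```

The check holds for every low up to the signed maximum (127 here), including
low == 0 where the excluded range is empty, and it fails once low no longer
fits into the signed type, which matches the precondition stated above.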
2022-04-08 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/105189
* fold-const.c (make_range_step): Fix up handling of
(unsigned) x +[low, -] ranges for signed x if low fits into
typeof (x).
Jakub Jelinek [Wed, 6 Apr 2022 16:42:52 +0000 (18:42 +0200)]
combine: Don't record for UNDO_MODE pointers into regno_reg_rtx array [PR104985]
The testcase in the PR fails under valgrind on mips64 (but only Martin
can reproduce, I couldn't).
But the problem reported there is that SUBST_MODE remembers addresses
into the regno_reg_rtx array, then some splitter needs a new pseudo
and calls gen_reg_rtx, which reallocates the regno_reg_rtx array
and finally undo operation is done and dereferences the old regno_reg_rtx
entry.
The rtx values stored in the regno_reg_rtx array seem to be created
by gen_reg_rtx only and since then aren't modified, all we do for it
is adjusting its fields (e.g. adjust_reg_mode that SUBST_MODE does).
So, I think it is useless to use where.r for UNDO_MODE and store
&regno_reg_rtx[regno] in struct undo, we can store just
regno_reg_rtx[regno] (i.e. pointer to the REG itself instead of
pointer to pointer to REG) or could also store just the regno.
The following patch does the latter, and because SUBST_MODE no longer
needs to be a macro, changes all SUBST_MODE uses to subst_mode.
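The general hazard, and why recording an index avoids it, can be sketched
outside of GCC as follows. Everything here (struct reg_table, struct undo_mode,
the function names) is hypothetical and only mimics the shape of the problem:
an undo record that must stay valid while the table it refers to is reallocated.

```c
#include <stdlib.h>

/* A growable table, standing in for regno_reg_rtx.  */
struct reg_table {
    int *modes;
    size_t len;
};

/* Grow the table to NEW_LEN entries.  As with any realloc-based growth,
   the storage may move, invalidating pointers into the old array.
   Returns 0 on success, -1 on allocation failure.  */
static int reg_table_grow(struct reg_table *t, size_t new_len)
{
    int *p = realloc(t->modes, new_len * sizeof *p);
    if (!p)
        return -1;
    for (size_t i = t->len; i < new_len; i++)
        p[i] = 0;
    t->modes = p;
    t->len = new_len;
    return 0;
}

/* Undo record that stores the index (regno), not the slot's address.  */
struct undo_mode {
    size_t regno;
    int old_mode;
};

/* Records an undo entry, grows the table, then applies the undo.
   Returns the restored value, or -1 on allocation failure.  */
static int undo_after_growth(void)
{
    struct reg_table t = { NULL, 0 };
    if (reg_table_grow(&t, 4) != 0)
        return -1;
    t.modes[2] = 42;

    struct undo_mode u = { 2, t.modes[2] };  /* remember index + old value */
    t.modes[2] = 7;                          /* the change to be undone */

    if (reg_table_grow(&t, 4096) != 0) {     /* storage may move here */
        free(t.modes);
        return -1;
    }

    t.modes[u.regno] = u.old_mode;           /* index is still valid */
    int restored = t.modes[2];
    free(t.modes);
    return restored;
}
```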
2022-04-06 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/104985
* combine.c (struct undo): Add where.regno member.
(do_SUBST_MODE): Rename to ...
(subst_mode): ... this. Change first argument from rtx * into int,
operate on regno_reg_rtx[regno] and save regno into where.regno.
(SUBST_MODE): Remove.
(try_combine): Use subst_mode instead of SUBST_MODE, change first
argument from regno_reg_rtx[whatever] to whatever. For UNDO_MODE, use
regno_reg_rtx[undo->where.regno] instead of *undo->where.r.
(undo_to_marker): For UNDO_MODE, use regno_reg_rtx[undo->where.regno]
instead of *undo->where.r.
(simplify_set): Use subst_mode instead of SUBST_MODE, change first
argument from regno_reg_rtx[whatever] to whatever.
Jakub Jelinek [Sun, 3 Apr 2022 19:50:43 +0000 (21:50 +0200)]
i386: Fix up ix86_expand_vector_init_general [PR105123]
The following testcase is miscompiled on ia32.
The problem is that at -O0 we end up with:
vector(4) short unsigned int _1;
short unsigned int u.0_3;
...
_1 = {u.0_3, u.0_3, u.0_3, u.0_3};
statement (dead) which is wrongly expanded.
elt is (subreg:HI (reg:SI 83 [ u.0_3 ]) 0), tmp_mode SImode,
so after convert_mode we start with word (reg:SI 83 [ u.0_3 ]).
The intent is to manually broadcast that value to 2 SImode parts,
but because we pass word as target to expand_simple_binop, it will
overwrite (reg:SI 83 [ u.0_3 ]) and we end up with 0:
10: {r83:SI=r83:SI<<0x10;clobber flags:CC;}
11: {r83:SI=r83:SI|r83:SI;clobber flags:CC;}
12: {r83:SI=r83:SI<<0x10;clobber flags:CC;}
13: {r83:SI=r83:SI|r83:SI;clobber flags:CC;}
14: clobber r110:V4HI
15: r110:V4HI#0=r83:SI
16: r110:V4HI#4=r83:SI
as the two ORs do nothing and the two left shifts by 16 each shift
everything away.
The following patch fixes that by using NULL_RTX target, so we expand it as
10: {r110:SI=r83:SI<<0x10;clobber flags:CC;}
11: {r111:SI=r110:SI|r83:SI;clobber flags:CC;}
12: {r112:SI=r83:SI<<0x10;clobber flags:CC;}
13: {r113:SI=r112:SI|r83:SI;clobber flags:CC;}
14: clobber r114:V4HI
15: r114:V4HI#0=r111:SI
16: r114:V4HI#4=r113:SI
instead.
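The two instruction sequences can be mirrored in plain C to see the difference.
This is only a model of the RTL above, with hypothetical function names; each
assignment corresponds to one of insns 10 to 13.

```c
#include <stdint.h>

/* Mirrors the miscompiled sequence: every operation targets the same
   "register", so the ORs are no-ops and the shifts discard the value.  */
static uint32_t broadcast_reusing_word(uint16_t elt)
{
    uint32_t word = elt;
    word = word << 16;   /* insn 10: r83 = r83 << 16 */
    word = word | word;  /* insn 11: r83 = r83 | r83 (no-op) */
    word = word << 16;   /* insn 12: r83 = r83 << 16 */
    word = word | word;  /* insn 13: r83 = r83 | r83 (no-op) */
    return word;
}

/* Mirrors the fixed sequence: the shift result goes to a fresh
   temporary, so the original value is still available for the OR.  */
static uint32_t broadcast_fresh_temp(uint16_t elt)
{
    uint32_t word = elt;
    uint32_t shifted = word << 16;  /* r110 = r83 << 16 */
    return shifted | word;          /* r111 = r110 | r83 */
}
```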
Another possibility would be to pass NULL_RTX only when word == elt
and word otherwise, where word would necessarily be a pseudo from the first
shift after passing NULL_RTX there once or pass NULL_RTX for the shift and
word for ior.
2022-04-03 Jakub Jelinek <jakub@redhat.com>
PR target/105123
* config/i386/i386-expand.c (ix86_expand_vector_init_general): Avoid
using word as target for expand_simple_binop when doing ASHIFT and
IOR.
Jakub Jelinek [Wed, 30 Mar 2022 08:49:47 +0000 (10:49 +0200)]
ubsan: Fix ICE due to -fsanitize=object-size [PR105093]
The following testcase ICEs, because for a volatile X & RESULT_DECL
ubsan wants to take address of that reference. instrument_object_size
is called with x, so the base is equal to the access and the var
is automatic, so there is no risk of an out of bounds access for it.
Normally we wouldn't instrument those because we fold address of the
t - address of inner to 0, add constant size of the decl and it is
equal to what __builtin_object_size computes. But the volatile
results in the subtraction not being folded.
The first hunk fixes it by punting if we access the whole automatic
decl, so that even volatile won't cause a problem.
The second hunk (not strictly needed for this testcase) is similar
to what has been added to asan.cc recently, if we actually take
address of a decl and keep it in the IL, we better mark it addressable.
2022-03-30 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/105093
* ubsan.c (instrument_object_size): If t is equal to inner and
is a decl other than global var, punt. When emitting call to
UBSAN_OBJECT_SIZE ifn, make sure base is addressable.
Jakub Jelinek [Wed, 30 Mar 2022 08:21:16 +0000 (10:21 +0200)]
store-merging: Avoid ICEs on roughly ~0ULL/8 sized stores [PR105094]
On the following testcase on 64-bit targets, store-merging sees
a MEM_REF store from {} ctor with "negative" bitsize where bitoff + bitsize
wraps around to very small end offset. This later confuses the code
so that it allocates just a few bytes of memory but fills in huge amounts of
it. Later on there is a param_store_merging_max_size size check but due to
the wrap-around we pass that.
The following patch punts on such large bitsizes.
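The wrap-around can be illustrated with plain 64-bit arithmetic. This is only a sketch under the assumption that the bit size is held in a signed 64-bit integer; the helper names are hypothetical and are not the functions in gimple-ssa-store-merging.c.

```cpp
#include <cstdint>

// Bit size of a store covering `bytes` bytes as a signed 64-bit count.
// The multiplication is done unsigned so the wrap-around is well defined
// here; the conversion to signed is modular since C++20 and behaves the
// same way on mainstream compilers before that.
inline std::int64_t store_bitsize(std::uint64_t bytes)
{
    return static_cast<std::int64_t>(bytes * 8u);
}

// Hypothetical stand-ins for the check before and after the patch:
// the old check only punted on == 0, the new one punts on <= 0.
inline bool bitsize_accepted_old(std::int64_t bitsize) { return bitsize != 0; }
inline bool bitsize_accepted_new(std::int64_t bitsize) { return bitsize > 0; }
```

A store of ~0ULL / 8 bytes gives a bit size of -8 here, which the old check accepts and the new check rejects.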
2022-03-30 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/105094
* gimple-ssa-store-merging.c (mem_valid_for_store_merging): Punt if
bitsize <= 0 rather than just == 0.
Jakub Jelinek [Wed, 30 Mar 2022 07:16:41 +0000 (09:16 +0200)]
c++: Fix template-introduction tentative parsing in class bodies clear colon_corrects_to_scope_p [PR105061]
The concepts support (in particular template introductions from concepts TS)
broke the following testcase, valid unnamed bitfields with dependent
types (or even just typedefs) were diagnosed as typos (: instead of correct
::) in template introduction during their tentative parsing.
The following patch fixes that by not doing this : to :: correction when
member_p is true.
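Temporarily clearing a flag around a tentative parse is commonly done with a save/restore guard. The sketch below is an assumption about the general shape only: Parser and ScopedClear are hypothetical stand-ins, not the types used in parser.c.

```cpp
// Hypothetical stand-in for the parser state that matters here.
struct Parser {
    bool colon_corrects_to_scope_p = true;
};

// Clears a flag on construction and restores the saved value on destruction.
class ScopedClear {
public:
    explicit ScopedClear(bool &flag) : flag_(flag), saved_(flag) { flag_ = false; }
    ~ScopedClear() { flag_ = saved_; }
    ScopedClear(const ScopedClear &) = delete;
    ScopedClear &operator=(const ScopedClear &) = delete;

private:
    bool &flag_;
    bool saved_;
};

// Returns the value of the flag as seen while the tentative parse would run.
inline bool flag_during_tentative_parse(Parser &parser, bool member_p)
{
    if (member_p) {
        ScopedClear guard(parser.colon_corrects_to_scope_p);
        return parser.colon_corrects_to_scope_p; // false while guarded
    }
    return parser.colon_corrects_to_scope_p;
}
```

With member_p true the flag reads false during the parse and is back to its previous value afterwards, so the : to :: correction is suppressed only for that window.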
2022-03-30 Jakub Jelinek <jakub@redhat.com>
PR c++/105061
* parser.c (cp_parser_template_introduction): If member_p, temporarily
clear parser->colon_corrects_to_scope_p around tentative parsing of
nested name specifier.
Jakub Jelinek [Sat, 26 Mar 2022 07:11:58 +0000 (08:11 +0100)]
c++: Fix up __builtin_convertvector parsing
Jonathan reported on IRC that we don't parse
__builtin_bit_cast (type, val).field
etc.
The problem is that for these 2 builtins we return from
cp_parser_postfix_expression instead of setting postfix_expression
to the cp_build_* value and falling through into the postfix expression
suffix handling loop.
2022-03-26 Jakub Jelinek <jakub@redhat.com>
* parser.c (cp_parser_postfix_expression)
<case RID_BUILTIN_CONVERTVECTOR>: Don't
return cp_build_vec_convert result right away, instead
set postfix_expression to it and break.
* c-c++-common/builtin-convertvector-3.c: New test.
Jakub Jelinek [Thu, 24 Mar 2022 09:12:25 +0000 (10:12 +0100)]
c++: extern thread_local declarations in constexpr [PR104994]
C++14 to C++20 apparently should allow extern thread_local declarations in
constexpr functions, however useless they are there (because accessing
such vars is not valid in a constant expression, perhaps sizeof/decltype).
P2242 changed that for C++23 to passing through declaration but
https://cplusplus.github.io/CWG/issues/2552.html
has been filed for it yesterday.
Jakub Jelinek [Sat, 19 Mar 2022 12:53:12 +0000 (13:53 +0100)]
i386: Don't emit pushf;pop for __builtin_ia32_readeflags_u* with unused lhs [PR104971]
__builtin_ia32_readeflags_u* aren't marked const or pure I think
intentionally, so that they aren't CSEd from different regions of a function
etc. because we don't and can't easily track all dependencies between
it and surrounding code (if somebody looks at the condition flags, it is
dependent on the vast majority of instructions).
But the builtin itself doesn't have any side-effects, so if we ignore the
result of the builtin, there is no point to emit anything.
There is a LRA bug that miscompiles the testcase which this patch makes
latent, which is certainly worth fixing too, but IMHO this change
(and maybe ix86_gimple_fold_builtin too which would fold it even earlier
when it loses lhs) is worth it as well.
2022-03-19 Jakub Jelinek <jakub@redhat.com>
PR middle-end/104971
* config/i386/i386-expand.c
(ix86_expand_builtin) <case IX86_BUILTIN_READ_FLAGS>: If ignore,
don't push/pop anything and just return const0_rtx.
Jakub Jelinek [Fri, 18 Mar 2022 17:49:23 +0000 (18:49 +0100)]
c++: Fix up constexpr evaluation of new with zero sized types [PR104568]
The new expression constant expression evaluation right now tries to
deduce how many elts the array it uses for the heap or heap [] vars
should have (or how many elts should its trailing array have if it has
cookie at the start). As new is lowered at that point to
(some_type *) ::operator new (size)
or so, it computes it by subtracting cookie size if any from size, then
divides the result by sizeof (some_type).
This works fine for most types, except when sizeof (some_type) is 0,
then we divide by zero; size is then equal to cookie_size (or if there
is no cookie, to 0).
The following patch special cases those cases so that we don't divide
by zero and also recover the original outer_nelts from the expression
by forcing the size not to be folded in that case but be explicit
0 * outer_nelts or cookie_size + 0 * outer_nelts.
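The arithmetic at issue can be sketched as follows. This is an illustration under stated assumptions, not the code in constexpr.c: the function name is hypothetical and sizes are plain integers rather than trees.

```cpp
#include <cstddef>
#include <optional>

// Recovers the element count from the size passed to operator new by
// subtracting the cookie and dividing by the element size. Returns
// nothing when the count cannot be derived from the size alone.
inline std::optional<std::size_t>
heap_array_nelts(std::size_t size, std::size_t cookie_size, std::size_t elt_size)
{
    if (size < cookie_size)
        return std::nullopt;
    if (elt_size == 0)
        // Zero-sized element type: size == cookie_size for any count, so
        // dividing is both invalid and uninformative. The count has to
        // come from elsewhere (outer_nelts in the patch).
        return std::nullopt;
    return (size - cookie_size) / elt_size;
}
```

For example, heap_array_nelts(24, 8, 4) gives 4, while heap_array_nelts(8, 8, 0) gives no value instead of dividing by zero.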
Note, we have further issues, we accept-invalid various cases, for both
zero sized elt_type and even non-zero sized elts, we aren't able to
diagnose out of bounds POINTER_PLUS_EXPR like:
constexpr bool
foo ()
{
  auto p = new int[2];
  auto q1 = &p[0];
  auto q2 = &p[1];
  auto q3 = &p[2];
  auto q4 = &p[3];
  delete[] p;
  return true;
}
constexpr bool a = foo ();
That doesn't look like a regression so I think we should resolve that for
GCC 13, but there are 2 problems. Figure out why
cxx_fold_pointer_plus_expression doesn't deal with the &heap []
etc. cases, and for the zero sized arrays, I think we really need to preserve
whether user wrote an array ref or pointer addition, because in the
&p[3] case if sizeof(p[0]) == 0 we know that if it has 2 elements it is
out of bounds, while if we see p p+ 0 the information if it was
p + 2 or p + 3 in the source is lost.
clang++ seems to handle it fine even in the zero sized cases or with
new expressions.
2022-03-18 Jakub Jelinek <jakub@redhat.com>
PR c++/104568
* init.c (build_new_constexpr_heap_type): Remove FULL_SIZE
argument and its handling, instead add ITYPE2 argument. Only
support COOKIE_SIZE != NULL.
(build_new_1): If size is 0, change it to 0 * outer_nelts if
outer_nelts is non-NULL. Pass type rather than elt_type to
maybe_wrap_new_for_constexpr.
* constexpr.c (build_new_constexpr_heap_type): New function.
(cxx_eval_constant_expression) <case CONVERT_EXPR>:
If elt_size is zero sized type, try to recover outer_nelts from
the size argument to operator new/new[] and pass that as
arg_size to build_new_constexpr_heap_type. Pass ctx,
non_constant_p and overflow_p to that call too.
Jakub Jelinek [Wed, 16 Mar 2022 10:04:16 +0000 (11:04 +0100)]
aarch64: Fix up RTL sharing bug in aarch64_load_symref_appropriately [PR104910]
We unshare all RTL created during expansion, but when
aarch64_load_symref_appropriately is called after expansion like in the
following testcases, we use imm in both HIGH and LO_SUM operands.
If imm is some RTL that shouldn't be shared like a non-sharable CONST,
we get at least with --enable-checking=rtl a checking ICE, otherwise might
just get silently wrong code.
The following patch fixes that by copying it if it can't be shared.
Jakub Jelinek [Tue, 15 Mar 2022 08:12:03 +0000 (09:12 +0100)]
ifcvt: Punt if not onlyjump_p for find_if_case_{1,2} [PR104814]
find_if_case_{1,2} implicitly assumes conditional jumps and rewrites them,
so if they have extra side-effects or are say asm goto, things don't work
well, either the side-effects are lost or we could ICE.
In particular, the testcase below on s390x has there a doloop instruction
that decrements a register in addition to testing it for non-zero and
conditionally jumping based on that.
The following patch fixes that by punting for !onlyjump_p case, i.e.
if there are side-effects in the jump instruction or it isn't a plain PC
setter.
Also, it assumes BB_END (test_bb) will be always non-NULL, because basic
blocks with 2 non-abnormal successor edges should always have some instruction
at the end that determines which edge to take.
2022-03-15 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/104814
* ifcvt.c (find_if_case_1, find_if_case_2): Punt if test_bb doesn't
end with onlyjump_p. Assume BB_END (test_bb) is always non-NULL.
Jakub Jelinek [Wed, 9 Mar 2022 08:15:28 +0000 (09:15 +0100)]
c, c++, c-family: -Wshift-negative-value and -Wshift-overflow* tweaks for -fwrapv and C++20+ [PR104711]
As mentioned in the PR, different standards have different definitions
of what is a UB left shift. They all agree on out of bounds (including
negative) shift counts.
The rules used by ubsan are:
C99-C2x ((unsigned) x >> (uprecm1 - y)) != 0 then UB
C++11-C++17 x < 0 || ((unsigned) x >> (uprecm1 - y)) > 1 then UB
C++20 and later everything is well defined
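The three rules can be written out as a predicate. This is a sketch for a 32-bit int only (so uprecm1 is 31); the enum and function names are hypothetical, and the count is taken as unsigned, so negative counts are not modeled separately from too-large ones.

```cpp
#include <cstdint>

enum class Dialect { C99, Cxx11, Cxx20 }; // C99-C2x, C++11-C++17, C++20 and later

// Returns true if x << y is undefined under the given rule set.
inline bool left_shift_is_ub(std::int32_t x, unsigned y, Dialect d)
{
    constexpr unsigned uprecm1 = 31; // precision of unsigned int minus one
    if (y > uprecm1)
        return true; // out of bounds shift count: UB in every dialect
    const std::uint32_t ux = static_cast<std::uint32_t>(x);
    switch (d) {
    case Dialect::C99:
        return (ux >> (uprecm1 - y)) != 0;
    case Dialect::Cxx11:
        return x < 0 || (ux >> (uprecm1 - y)) > 1;
    case Dialect::Cxx20:
        return false;
    }
    return false;
}
```

Under these rules 1 << 31 is UB for C99-C2x but not for C++11-C++17, and -1 << 1 is UB for both of those but well defined for C++20.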
Now, for C++20, I've in the P1236R1 implementation added an early
exit for -Wshift-overflow* warning so that it never warns, but apparently
-Wshift-negative-value remained as is. As it is well defined in C++20,
the following patch doesn't enable -Wshift-negative-value from -Wextra
anymore for C++20 and later; users who want the warning for compatibility
with C++17 and earlier can still get it by using -Wshift-negative-value
explicitly.
Another thing is -fwrapv, that is an extension to the standards, so it is up
to us how exactly we define that case. Our ubsan code treats
TYPE_OVERFLOW_WRAPS (type0) and cxx_dialect >= cxx20 the same as only
diagnosing out of bounds shift count and nothing else and IMHO it is most
sensical to treat -fwrapv signed left shifts the same as C++20 treats
them, https://eel.is/c++draft/expr.shift#2
"The value of E1 << E2 is the unique value congruent to E1×2^E2 modulo 2^N,
where N is the width of the type of the result.
[Note 1: E1 is left-shifted E2 bit positions; vacated bits are zero-filled.
— end note]"
with no UB dependent on the E1 values. The UB is only
"The behavior is undefined if the right operand is negative, or greater
than or equal to the width of the promoted left operand."
Under the hood (except for FEs and ubsan from FEs) GCC middle-end doesn't
consider UB in left shifts dependent on the first operand's value, only
the out of bounds shifts.
While this change isn't a regression, I'd think it is useful for GCC 12,
it doesn't add new warnings, but just removes warnings that aren't
appropriate.
2022-03-09 Jakub Jelinek <jakub@redhat.com>
PR c/104711
gcc/
* doc/invoke.texi (-Wextra): Document that -Wshift-negative-value
is enabled by it only for C++11 to C++17 rather than for C++03 or
later.
(-Wshift-negative-value): Similarly (except here we stated
that it is enabled for C++11 or later).
gcc/c-family/
* c-opts.c (c_common_post_options): Don't enable
-Wshift-negative-value from -Wextra for C++20 or later.
* c-ubsan.c (ubsan_instrument_shift): Adjust comments.
* c-warn.c (maybe_warn_shift_overflow): Use TYPE_OVERFLOW_WRAPS
instead of TYPE_UNSIGNED.
gcc/c/
* c-fold.c (c_fully_fold_internal): Don't emit
-Wshift-negative-value warning if TYPE_OVERFLOW_WRAPS.
* c-typeck.c (build_binary_op): Likewise.
gcc/cp/
* constexpr.c (cxx_eval_check_shift_p): Use TYPE_OVERFLOW_WRAPS
instead of TYPE_UNSIGNED.
* typeck.c (cp_build_binary_op): Don't emit
-Wshift-negative-value warning if TYPE_OVERFLOW_WRAPS.
gcc/testsuite/
* c-c++-common/Wshift-negative-value-1.c: Remove
dg-additional-options, instead in target selectors of each diagnostic
check for exact C++ versions where it should be diagnosed.
* c-c++-common/Wshift-negative-value-2.c: Likewise.
* c-c++-common/Wshift-negative-value-3.c: Likewise.
* c-c++-common/Wshift-negative-value-4.c: Likewise.
* c-c++-common/Wshift-negative-value-7.c: New test.
* c-c++-common/Wshift-negative-value-8.c: New test.
* c-c++-common/Wshift-negative-value-9.c: New test.
* c-c++-common/Wshift-negative-value-10.c: New test.
* c-c++-common/Wshift-overflow-1.c: Remove
dg-additional-options, instead in target selectors of each diagnostic
check for exact C++ versions where it should be diagnosed.
* c-c++-common/Wshift-overflow-2.c: Likewise.
* c-c++-common/Wshift-overflow-5.c: Likewise.
* c-c++-common/Wshift-overflow-6.c: Likewise.
* c-c++-common/Wshift-overflow-7.c: Likewise.
* c-c++-common/Wshift-overflow-8.c: New test.
* c-c++-common/Wshift-overflow-9.c: New test.
* c-c++-common/Wshift-overflow-10.c: New test.
* c-c++-common/Wshift-overflow-11.c: New test.
* c-c++-common/Wshift-overflow-12.c: New test.
Jakub Jelinek [Tue, 8 Mar 2022 20:41:21 +0000 (21:41 +0100)]
c++: Don't suggest cdtor or conversion op identifiers in spelling hints [PR104806]
On the following testcase, we emit "did you mean '__dt '?" in the error
message. "__dt " shows there because it is dtor_identifier, but we
shouldn't suggest those to the user, they are purely internal and can't
be really typed by the user because of the final space in it.
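The filter amounts to a one-line check on each candidate. A minimal sketch, assuming candidates are available as plain strings; the function name is hypothetical and not the one used in search.c.

```cpp
#include <string_view>

// Internal identifiers such as "__dt " end in a space and can never be
// typed by the user, so they are not useful as spelling suggestions.
inline bool is_suggestible_identifier(std::string_view name)
{
    return !name.empty() && name.back() != ' ';
}
```

Here "__dt " is rejected while an ordinary member name such as "size" is kept.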
2022-03-08 Jakub Jelinek <jakub@redhat.com>
PR c++/104806
* search.c (lookup_field_fuzzy_info::fuzzy_lookup_field): Ignore
identifiers with space at the end.
Jakub Jelinek [Mon, 7 Mar 2022 10:14:04 +0000 (11:14 +0100)]
s390: Fix up *cmp_and_trap_unsigned_int<mode> constraints [PR104775]
The following testcase fails to assemble due to clgte %r6,0(%r1,%r10)
insn not being accepted by assembler.
My rough understanding is that in the RSY-b insn format the spot used
in other formats for index registers is used instead for M3, which says
what kind of comparison it is, so this patch follows what other similar
instructions use for the constraint (i.e. one without an index register).
2022-03-07 Jakub Jelinek <jakub@redhat.com>
PR target/104775
* config/s390/s390.md (*cmp_and_trap_unsigned_int<mode>): Use
S constraint instead of T in the last alternative.
Jakub Jelinek [Fri, 25 Feb 2022 20:25:12 +0000 (21:25 +0100)]
match.pd: Further complex simplification fixes [PR104675]
Marc mentioned in the PR further 2 simplifications that also ICE
with complex types.
For these, eventually (but IMO GCC 13 materials) we could support it
for vector types if it would be uniform vector constants.
Currently integer_pow2p is true only for INTEGER_CSTs and COMPLEX_CSTs
and we can't use bit_and etc. for complex type.
2022-02-25 Jakub Jelinek <jakub@redhat.com>
Marc Glisse <marc.glisse@inria.fr>
PR tree-optimization/104675
* match.pd (t * 2U / 2 -> t & (~0 / 2), t / 2U * 2 -> t & ~1):
Restrict simplifications to INTEGRAL_TYPE_P.
Jakub Jelinek [Fri, 25 Feb 2022 17:58:48 +0000 (18:58 +0100)]
rs6000: Use rs6000_emit_move in movmisalign<mode> expander [PR104681]
The following testcase ICEs, because for some strange reason it decides to use
movmisaligntf during expansion where the destination is MEM and source is
CONST_DOUBLE. For normal mov<mode> expanders the rs6000 backend uses
rs6000_emit_move to ensure that if one operand is a MEM, the other is a REG
and a few other things, but for movmisalign<mode> nothing enforced this.
The middle-end documents that movmisalign<mode> shouldn't fail, so we can't
force that through predicates or condition on the expander.
2022-02-25 Jakub Jelinek <jakub@redhat.com>
PR target/104681
* config/rs6000/vector.md (movmisalign<mode>): Use rs6000_emit_move.
Jakub Jelinek [Fri, 25 Feb 2022 09:55:17 +0000 (10:55 +0100)]
match.pd: Don't create BIT_NOT_EXPRs for COMPLEX_TYPE [PR104675]
We don't support BIT_{AND,IOR,XOR,NOT}_EXPR on complex types,
&/|/^ are just rejected for them, and ~ is parsed as CONJ_EXPR.
So, we should avoid simplifications which turn valid complex type
expressions into something that will ICE during expansion.
2022-02-25 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/104675
* match.pd (-A - 1 -> ~A, -1 - A -> ~A): Don't simplify for
COMPLEX_TYPE.
* gcc.dg/pr104675-1.c: New test.
* gcc.dg/pr104675-2.c: New test.
Jakub Jelinek [Tue, 22 Feb 2022 10:32:08 +0000 (11:32 +0100)]
libiberty: Fix up debug.temp.o creation if *.o has 64K+ sections [PR104617]
On
#define A(n) int foo1##n(void) { return 1##n; }
#define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
#define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
#define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
#define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
E(0) E(1) E(2) D(30) D(31) C(320) C(321) C(322) C(323) C(324) C(325)
B(3260) B(3261) B(3262) B(3263) A(32640) A(32641) A(32642)
testcase with
./xgcc -B ./ -c -g -fpic -ffat-lto-objects -flto -O0 -o foo1.o foo1.c -ffunction-sections
./xgcc -B ./ -shared -g -fpic -flto -O0 -o foo1.so foo1.o
/tmp/ccTW8mBm.debug.temp.o: file not recognized: file format not recognized
(testcase too slow to be included into testsuite).
The problem is clearly reported by readelf:
readelf: foo1.o.debug.temp.o: Warning: Section 2 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 5 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 10 has an out of range sh_link value of 65323
readelf: foo1.o.debug.temp.o: Warning: [ 2]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [ 5]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [10]: Link field (65323) should index a string section.
because simple_object_elf_copy_lto_debug_sections doesn't adjust sh_info and
sh_link fields in ElfNN_Shdr if they are in between SHN_{LO,HI}RESERVE
inclusive. Not adjusting those is incorrect though, SHN_{LO,HI}RESERVE
range is only relevant to the 16-bit fields, mainly st_shndx in ElfNN_Sym
where if one needs >= SHN_LORESERVE section number, SHN_XINDEX should be
used instead and .symtab_shndx section should contain the real section
index, and in ElfNN_Ehdr e_shnum and e_shstrndx fields, where if >=
SHN_LORESERVE value is needed it should put those into
Shdr[0].sh_{size,link}. But, sh_{link,info} are 32-bit fields which can
contain any section index.
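The difference between the old and the new remapping can be sketched like this. It is a simplified illustration, assuming the remapping is a lookup table from old to new section index; the function names and the table are hypothetical, only the SHN_LORESERVE and SHN_HIRESERVE values are the standard ELF ones.

```cpp
#include <cstdint>
#include <vector>

constexpr std::uint32_t kShnLoReserve = 0xff00; // SHN_LORESERVE
constexpr std::uint32_t kShnHiReserve = 0xffff; // SHN_HIRESERVE

// Old behaviour: 32-bit sh_link/sh_info values inside the reserved range
// were left alone, as if they were special 16-bit st_shndx values.
inline std::uint32_t
remap_old(std::uint32_t idx, const std::vector<std::uint32_t> &new_index)
{
    if (idx >= kShnLoReserve && idx <= kShnHiReserve)
        return idx;
    return idx < new_index.size() ? new_index[idx] : idx;
}

// Fixed behaviour: sh_link/sh_info are ordinary 32-bit section indices,
// so they are remapped whatever their value.
inline std::uint32_t
remap_fixed(std::uint32_t idx, const std::vector<std::uint32_t> &new_index)
{
    return idx < new_index.size() ? new_index[idx] : idx;
}
```

With more than 65321 sections in the table, remap_old leaves an sh_link of 65321 untouched, which matches the stale values in the readelf warnings, while remap_fixed translates it like any other index.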
Note, as simple-object-elf.c mentions, binutils from 2.12 to 2.18 (so before
2011) used to mishandle the > 63.75K sections case and assumed there is a
hole in between the sections, but what
simple_object_elf_copy_lto_debug_sections does wouldn't help in that case
for the debug temp object creation, we'd need to detect the case also in
that routine and take it into account in the remapping etc. I think
it is not worth it given that it is over 10 years, if somebody needs
63.75K or more sections, better use more recent binutils.
2022-02-22 Jakub Jelinek <jakub@redhat.com>
PR lto/104617
* simple-object-elf.c (simple_object_elf_match): Fix up URL
in comment.
(simple_object_elf_copy_lto_debug_sections): Remap sh_info and
sh_link even if they are in the SHN_LORESERVE .. SHN_HIRESERVE
range (inclusive).
<bb 9> [local count: 1073741824]:
<retval>.x = 0;
The problem is during expansion, <retval> isn't marked TREE_ADDRESSABLE,
even when we take its address in (unsigned long) &<retval>.x.
Now, instrument_derefs has code to avoid the instrumentation altogether
if we can prove the access is within bounds of an automatic variable in the
current function and the var isn't TREE_ADDRESSABLE (or we don't instrument
use after scope), but we do it solely for VAR_DECLs.
I think we should treat RESULT_DECLs exactly like that too, which is what
the following patch does. I must say I'm unsure about PARM_DECLs, those can
have different cases, either they are fully or partially passed in
registers, then if we take parameter's address, they are in a local copy
inside of a function and so work like those automatic vars. But if they
are fully passed in memory, we typically just take address of the slot
and in that case they live in the caller's frame. It is true we don't
(can't) put any asan padding in between the arguments, so all asan could
detect in that case is if caller passes fewer on stack arguments or smaller
arguments than callee accepts. Anyway, as I'm unsure, I haven't added
PARM_DECLs to that case.
And another thing is, when we actually build_fold_addr_expr, we need to
mark_addressable the inner if it isn't addressable already.
2022-02-19 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/102656
* asan.c (instrument_derefs): If inner is a RESULT_DECL and access is
known to be within bounds, treat it like automatic variables.
If instrumenting access and inner is {VAR,PARM,RESULT}_DECL from
current function and !TREE_STATIC which is not TREE_ADDRESSABLE, mark
it addressable.
Jakub Jelinek [Thu, 17 Feb 2022 10:14:38 +0000 (11:14 +0100)]
valtrack: Avoid creating raw SUBREGs with VOIDmode argument [PR104557]
After the recent r12-7240 simplify_immed_subreg changes, we bail on more
simplify_subreg calls than before, e.g. apparently for decimal modes
in the NaN representations we almost never preserve anything except the
canonical {q,s}NaNs.
simplify_gen_subreg will punt in such cases because a SUBREG with VOIDmode
is not valid, but debug_lowpart_subreg wants to attempt even harder, even
if e.g. target indicates certain mode combinations aren't valid for the
backend, dwarf2out can still handle them. But a SUBREG from a VOIDmode
operand is just too much, the inner mode is lost there. We'd need some
new rtx that would be able to represent those cases.
For now, just punt in those cases.
2022-02-17 Jakub Jelinek <jakub@redhat.com>
PR debug/104557
* valtrack.c (debug_lowpart_subreg): Don't call gen_rtx_raw_SUBREG
if expr has VOIDmode.
but the code that searches forward for insns to update their log
links (before the change there is a link from insn 10033 to insn 10016
for pseudo 111) only finds insn 10033 and updates the log link if
-g isn't enabled, otherwise it stops earlier because there are debug insns
in between. So, with -g LOG_LINKS of 10033 isn't updated, points eventually
to NOTE_INSN_DELETED and so we do not attempt to combine 10033 with other
insns, while with -g0 we do.
The following patch fixes that by instead ignoring debug insns during the
searching. We can still check BLOCK_FOR_INSN (insn) on those, because
if we notice DEBUG_INSN in a following basic block, necessarily there won't
be any further normal insns in the current block after it.
2022-02-16 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/104544
* combine.c (try_combine): When looking for insn whose links
should be updated from i3 to i2, don't stop on debug insns, instead
skip over them.
Jakub Jelinek [Wed, 16 Feb 2022 08:25:55 +0000 (09:25 +0100)]
c-family: Fix up shorten_compare for decimal vs. non-decimal float comparison [PR104510]
The comment in shorten_compare says:
/* If either arg is decimal float and the other is float, fail. */
but the callers of shorten_compare don't expect anything like failure
as a possibility from the function, callers require that the function
promotes the operands to the same type, whether the original selected
*restype_ptr one or some shortened.
So, if we choose not to shorten, we should still promote to the original
*restype_ptr.
2022-02-16 Jakub Jelinek <jakub@redhat.com>
PR c/104510
* c-common.c (shorten_compare): Convert original arguments to
the original *restype_ptr when mixing binary and decimal float.
Jakub Jelinek [Tue, 15 Feb 2022 10:18:56 +0000 (11:18 +0100)]
sanitizer: Use glibc _thread_db_sizeof_pthread symbol if present
I've cherry-picked following fix from llvm-project. Recent glibcs
have _thread_db_sizeof_pthread symbol variable which contains the
size of struct pthread, so that sanitizers don't need to guess that
and risk that it will change again.
Jakub Jelinek [Tue, 15 Feb 2022 09:22:30 +0000 (10:22 +0100)]
openmp: Make finalize_task_copyfn order reproducible [PR104517]
The following testcase fails -fcompare-debug, because finalize_task_copyfn
was invoked from splay tree destruction, whose order can in some cases
depend on -g/-g0. The fix is to queue the task stmts that need copyfn
in a vector and run finalize_task_copyfn on elements of that vector.
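The idea of deferring the work into a vector can be sketched as below. This is an illustration only: TaskQueue and its members are hypothetical, and integers stand in for the task statements kept in task_cpyfns.

```cpp
#include <vector>

// Work is queued in creation order and processed in that same order,
// instead of being triggered from container destruction, whose
// traversal order may differ between -g and -g0.
struct TaskQueue {
    std::vector<int> pending;   // task statements that need a copy function
    std::vector<int> finalized; // order in which they were processed

    void create_task_copyfn(int task_id) { pending.push_back(task_id); }

    void finalize_all()
    {
        for (int id : pending)
            finalized.push_back(id);
        pending.clear(); // release the queue once everything is processed
    }
};
```

Queuing tasks 3 and then 1 always processes them as 3, 1, independent of how any other data structure is torn down.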
2022-02-15 Jakub Jelinek <jakub@redhat.com>
PR debug/104517
* omp-low.c (task_cpyfns): New variable.
(delete_omp_context): Don't call finalize_task_copyfn from here.
(create_task_copyfn): Push task_stmt into task_cpyfns.
(execute_lower_omp): Call finalize_task_copyfn here on entries from
task_cpyfns vector and release the vector.