Martin Jambor [Fri, 2 Feb 2024 12:27:50 +0000 (13:27 +0100)]
sra: Disqualify bases of operands of asm gotos
PR 110422 shows that SRA can ICE assuming there is a single edge
outgoing from a block terminated with an asm goto. We need that for
BB-terminating statements so that any adjustments they make to the
aggregates can be copied over to their replacements. Because we can't
have that after ASM gotos, we need to punt.
gcc/ChangeLog:
2024-01-17 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/110422
* tree-sra.cc (scan_function): Disqualify bases of operands of asm
gotos.
gcc/testsuite/ChangeLog:
2024-01-17 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/110422
* gcc.dg/torture/pr110422.c: New test.
Jonathan Wakely [Thu, 1 Feb 2024 18:37:34 +0000 (18:37 +0000)]
libstdc++: Force-inline shared_ptr::operator bool() for C++20 [PR108636]
This avoids a linker error with -fkeep-inline-functions when including
<filesystem>. We can't backport the fix from trunk because it adds an
export to the shared library. By marking the "missing" symbol
always_inline for C++20 mode we don't need a definition in the library.
libstdc++-v3/ChangeLog:
PR libstdc++/108636
* include/bits/shared_ptr_base.h (__shared_ptr::operator bool):
Add always_inline attribute for C++20 and later.
The first alternative stores the floating-point status register
in the destination. It should store zero. We need to copy %fr0
to another floating-point register to initialize it to zero.
2024-02-01 John David Anglin <danglin@gcc.gnu.org>
gcc/ChangeLog:
* config/pa/pa.md (atomic_storedi_1): Fix bug in
alternative 1.
Lewis Hyatt [Tue, 5 Dec 2023 16:33:39 +0000 (11:33 -0500)]
c-family: Fix ICE with large column number after restoring a PCH [PR105608]
Users are allowed to define macros prior to restoring a precompiled header
file, as long as those macros are not defined (or are defined identically)
in the PCH. However, the PCH restoration process destroys all the macro
definitions, so libcpp has to record them before restoring the PCH and then
redefine them afterward.
This process does not currently assign great locations to the macros after
redefining them. Some work is needed to also remember the original locations
and get the line_maps instance in the right state (since, like all other
data structures, the line_maps instance is also reset after restoring a PCH).
This patch addresses a more pressing issue, which is that we ICE in some
cases since GCC 11, hitting an assert in line-maps.cc. It happens if the
first line encountered after the PCH restore requires an LC_RENAME map, such
as will happen if the line is sufficiently long. This is much easier to
fix, since we just need to call linemap_line_start before asking libcpp to
redefine the stored macros, instead of afterward, to avoid the unexpected
need for an LC_RENAME before an LC_ENTER has been seen.
gcc/c-family/ChangeLog:
PR preprocessor/105608
* c-pch.cc (c_common_read_pch): Start a new line map before asking
libcpp to restore macros defined prior to reading the PCH, instead
of afterward.
gcc/testsuite/ChangeLog:
PR preprocessor/105608
* g++.dg/pch/line-map-1.C: New test.
* g++.dg/pch/line-map-1.Hs: New test.
* g++.dg/pch/line-map-2.C: New test.
* g++.dg/pch/line-map-2.Hs: New test.
Jason Merrill [Tue, 23 Jan 2024 20:41:09 +0000 (15:41 -0500)]
c++: throwing cleanup after return [PR113347]
Here we were assuming that current_retval_sentinel would be set if we have
seen a throwing cleanup, but that's not the case if the cleanup is after all
returns.
This change isn't needed on trunk, where current_retval_sentinel is set for
all NRV functions.
Jason Merrill [Tue, 19 Dec 2023 21:12:02 +0000 (16:12 -0500)]
c++: xvalue array subscript [PR103185]
Normally we handle xvalue array subscripting with ARRAY_REF, but in this
case we weren't doing that because the operands were reversed. Handle that
case better.
Jason Merrill [Wed, 20 Dec 2023 16:06:27 +0000 (11:06 -0500)]
c++: throwing dtor and empty try [PR113088]
maybe_splice_retval_cleanup assumed that the function body can't be empty if
there's a throwing cleanup, but when I added cleanups to try blocks in r12-6333-gb10e031458d541 I didn't adjust that assumption.
PR c++/113088
PR c++/33799
gcc/cp/ChangeLog:
* except.cc (maybe_splice_retval_cleanup): Handle an empty block.
Jason Merrill [Sun, 4 Jun 2023 16:00:55 +0000 (12:00 -0400)]
c++: NRV and goto [PR92407]
Here our named return value optimization was breaking the required
destructor when the goto takes 'a' out of scope. A simple fix for the
release branches is to disable the optimization in the presence of backward
goto.
We could do better by disabling the optimization only if there is a backward
goto across the variable declaration, but we don't track that, and in GCC 14
we instead make the goto work with NRV.
PR c++/92407
gcc/cp/ChangeLog:
* cp-tree.h (struct language_function): Add backward_goto.
* decl.cc (check_goto): Set it.
* typeck.cc (check_return_expr): Prevent NRV if set.
Georg-Johann Lay [Mon, 15 Jan 2024 12:25:59 +0000 (13:25 +0100)]
AVR: target/107201: Make -nodevicelib work for all devices.
driver-avr.cc contains a spec that discriminates between cores
and devices by means of a mmcu=avr* spec pattern. This does not
work for new devices like AVR128* which also start with mmcu=avr
like all cores do. The patch uses a new spec function in order to
tell apart cores from devices.
gcc/
PR target/107201
* config/avr/avr.h (EXTRA_SPEC_FUNCTIONS): Add no-devlib, avr_no_devlib.
* config/avr/driver-avr.cc (avr_no_devlib): New function.
(avr_devicespecs_file): Use it to remove -nodevicelib from the
options for cores only.
* config/avr/avr-arch.h (avr_get_parch): New prototype.
* config/avr/avr-devices.cc (avr_get_parch): New function.
The get_target_expr call added in r12-7069-g119cea98f66476 causes us
for the below testcase to call build_vec_delete in a template context,
which builds a templated destructor call and checks expr_noexcept_p for
it, which ICEs because the call has templated form.
Much of the work of build_vec_delete however is code generation and thus
will just get discarded in a template context, and that includes the
code guarded by expr_noexcept_p. So this patch narrowly fixes this ICE
by eliding the expr_noexcept_p call when in a template context.
PR c++/109899
gcc/cp/ChangeLog:
* init.cc (build_vec_delete_1): Assume expr_noexcept_p returns
false in a template context.
Andrew Pinski [Mon, 15 Jan 2024 09:31:36 +0000 (10:31 +0100)]
AVR: target/113156 - Fix ICE due to missing "Save" on -m[long-]double= options.
Multilib options -mdouble= and -mlong-double= are not orthogonal:
TARGET_HANDLE_OPTION = avr-common.cc::avr_handle_option() sets them
such that sizeof(double) <= sizeof(long double) is always true.
Sandra Loosemore [Thu, 11 Jan 2024 21:12:56 +0000 (21:12 +0000)]
libgcc, nios2: Fix exception handling on nios2 with -fpic
Exception handling on nios2-linux-gnu with -fpic has been broken since
revision 790854ea7670f11c14d431c102a49181d2915965, "Use _dl_find_object
in _Unwind_Find_FDE". For whatever reason, this doesn't work on nios2.
Nios2 uses the GOT address as the base for DW_EH_PE_datarel
relocations in PIC; see my previous fix to make this work, revision 2d33dcfe9f0494c9b56a8d704c3d27c5a4329ebc, "Support for GOT-relative
DW_EH_PE_datarel encoding". So this may be a horrible bug in the ABI
or in my interpretation of it or just glibc's implementation of
_dl_find_object for this target, but there's existing code out there
that does things this way; and realistically, nobody is going to
re-engineer this now that the vendor has EOL'ed the nios2
architecture. So, just skip over the code trying to use
_dl_find_object on this target and fall back to the way that works.
libgcc/ChangeLog
* unwind-dw2-fde-dip.c (_Unwind_Find_FDE): Do not try to use
_dl_find_object on nios2; it doesn't work.
The current constexpr implementation of std::char_traits<C>::move relies
on being able to compare the pointer parameters, which is not allowed
for unrelated pointers. We can use __builtin_constant_p to determine
whether it's safe to compare the pointers directly. If not, then we know
the ranges must be disjoint and so we can use char_traits<C>::copy to
copy forwards from the first character to the last. If the pointers can
be compared directly, then we can simplify the condition for copying
backwards to just two pointer comparisons.
libstdc++-v3/ChangeLog:
PR libstdc++/113200
* include/bits/char_traits.h (__gnu_cxx::char_traits::move): Use
__builtin_constant_p to check for unrelated pointers that cannot
be compared during constant evaluation.
* testsuite/21_strings/char_traits/requirements/113200.cc: New
test.
Ken Matsui [Thu, 11 Jan 2024 06:08:07 +0000 (22:08 -0800)]
libstdc++: Fix error handling in filesystem::equivalent [PR113250]
This patch made std::filesystem::equivalent correctly throw an exception
when either path does not exist as per [fs.op.equivalent]/4.
PR libstdc++/113250
libstdc++-v3/ChangeLog:
* src/c++17/fs_ops.cc (fs::equivalent): Use || instead of &&.
* src/filesystem/ops.cc (fs::equivalent): Likewise.
* testsuite/27_io/filesystem/operations/equivalent.cc: Handle
error codes.
* testsuite/experimental/filesystem/operations/equivalent.cc:
Likewise.
Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org> Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
(cherry picked from commit df147e2ee7199d33d66959c6509ce9c21072077f)
Steve Baird [Wed, 16 Nov 2022 17:28:22 +0000 (09:28 -0800)]
Fix nternal compiler error for Sequential Partition_Elaboration_Policy
In some cases, compilation of a function with a limited class-wide result
type could fail with an ICE if a Sequential Partition_Elaboration_Policy is
specified. To prevent this, we really want that specifying a Sequential
Partition_Elaboration_Policy to have the side effect of imposing a
No_Task_Hierarchy restriction. But doing that in a straightforward
way leads to problems with incorrectly accepting violations of H.6(6).
So a new restriction, No_Task_Hierarchy_Implicit, is introduced.
gcc/ada/
PR ada/104354
* libgnat/s-rident.ads: Define a new restriction,
No_Task_Hierarchy_Implicit. This is like the No_Task_Hierarchy
restriction, but with the difference that setting this restriction
does not mean the H.6(6) post-compilation check is satisified.
* exp_ch6.adb (Add_Task_Actuals_To_Build_In_Place_Call): If it is
known that the function result cannot have tasks, then pass in a
null literal for the activation chain actual parameter. This
avoids generating a reference to an entity that
Build_Activation_Chain_Entity may have chosen not to generate a
declaration for.
* gnatbind.adb (List_Applicable_Restrictions): Do not list the
No_Task_Hierarchy_Implicit restriction.
* restrict.adb: Special treatment for the
No_Task_Hierarchy_Implicit restriction in functions
Get_Restriction_Id and Restriction_Active. The former is needed to
disallow the (unlikely) case that a user tries to explicitly
reference the No_Task_Hierarchy_Implicit restriction.
* sem_prag.adb (Analyze_Pragma): If a Sequential
Partition_Elaboration_Policy is specified (and the
No_Task_Hierarchy restriction is not already enabled), then enable
the No_Task_Hierarchy_Implicit restriction.
AVR: PR target/112952: Fix attribute "address", "io" and "io_low"
so they work with all combinations of -f[no-]data-sections -f[no-]common.
The patch also improves some diagnostics and adds additional checks, for
example these attributes must only be applied to variables in static storage.
gcc/
PR target/112952
* config/avr/avr.cc (avr_handle_addr_attribute): Also print valid
range when diagnosing attribute "io" and "io_low" are out of range.
(avr_eval_addr_attrib): Don't ICE on empty address at that place.
(avr_insert_attributes): Reject if attribute "address", "io" or "io_low"
in contexts other than static storage.
(avr_asm_output_aligned_decl_common): Move output of decls with
attribute "address", "io", and "io_low" to...
(avr_output_addr_attrib): ...this new function.
(avr_asm_asm_output_aligned_bss): Remove output for decls with
attribute "address", "io", and "io_low".
(avr_encode_section_info): Rectify handling of decls with attribute
"address", "io", and "io_low".
gcc/testsuite/
PR target/112952
* gcc.target/avr/attribute-io.h: New file.
* gcc.target/avr/pr112952-0.c: New test.
* gcc.target/avr/pr112952-1.c: New test.
* gcc.target/avr/pr112952-2.c: New test.
* gcc.target/avr/pr112952-3.c: New test.
Patrick Palka [Wed, 3 Jan 2024 02:31:20 +0000 (21:31 -0500)]
libstdc++: testsuite: Reduce max_size_type.cc exec time [PR113175]
The adjustment to max_size_type.cc in r14-205-g83470a5cd4c3d2
inadvertently increased the execution time of this test by over 5x due
to making the two main loops actually run in the signed_p case instead
of being dead code.
To compensate, this patch cuts the relevant loops' range [-1000,1000] by
10x as proposed in the PR. This shouldn't significantly weaken the test
since the same important edge cases are still checked in the smaller range
and/or elsewhere. On my machine this reduces the test's execution time by
roughly 10x (and 1.6x relative to before r14-205).
PR testsuite/113175
libstdc++-v3/ChangeLog:
* testsuite/std/ranges/iota/max_size_type.cc (test02): Reduce
'limit' to 100 from 1000 and adjust 'log2_limit' accordingly.
(test03): Likewise.
Patrick Palka [Fri, 22 Sep 2023 10:25:49 +0000 (06:25 -0400)]
c++: constraint rewriting during ttp coercion [PR111485]
In order to compare the constraints of a ttp with that of its argument,
we rewrite the ttp's constraints in terms of the argument template's
template parameters. The substitution to achieve this currently uses a
single level of template arguments, but that never does the right thing
because a ttp's template parameters always have level >= 2. This patch
fixes this by including the outer template arguments in the substitution,
which ought to match the depth of the ttp.
The second testcase demonstrates it's better to substitute the concrete
outer template arguments instead of generic ones since a ttp's constraints
could depend on outer parameters.
PR c++/111485
gcc/cp/ChangeLog:
* pt.cc (is_compatible_template_arg): New parameter 'args'.
Add the outer template arguments 'args' to 'new_args'.
(convert_template_argument): Pass 'args' to
is_compatible_template_arg.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-ttp5.C: New test.
* g++.dg/cpp2a/concepts-ttp6.C: New test.
Patrick Palka [Tue, 25 Apr 2023 19:59:22 +0000 (15:59 -0400)]
c++: value dependence of by-ref lambda capture [PR108975]
We are still ICEing on the generic lambda version of the testcase from
this PR, even after r13-6743-g6f90de97634d6f, due to the by-ref capture
of the constant local variable 'dim' being considered value-dependent
when regenerating the lambda (at which point processing_template_decl is
set since the lambda is generic), which prevents us from constant folding
its uses. Later during prune_lambda_captures we end up not thoroughly
walking the body of the lambda and overlook the (non-folded) uses of
'dim' within the array bound and using-decls.
We could fix this by making prune_lambda_captures walk the body of the
lambda more thoroughly so that it finds these uses of 'dim', but ideally
we should be able to constant fold all uses of 'dim' ahead of time and
prune the implicit capture after all.
To that end this patch makes value_dependent_expression_p return false
for such by-ref captures of constant local variables, allowing their
uses to get constant folded ahead of time. It seems we just need to
disable the predicate's conservative early exit for reference variables
(added by r5-5022-g51d72abe5ea04e) when DECL_HAS_VALUE_EXPR_P. This
effectively makes us treat by-value and by-ref captures more consistently
when it comes to value dependence.
PR c++/108975
gcc/cp/ChangeLog:
* pt.cc (value_dependent_expression_p) <case VAR_DECL>:
Suppress conservative early exit for reference variables
when DECL_HAS_VALUE_EXPR_P.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/lambda/lambda-const11a.C: New test.
Jakub Jelinek [Tue, 19 Dec 2023 09:24:33 +0000 (10:24 +0100)]
i386: Fix mmx.md signbit expanders [PR112816]
Apparently when looking for "signbit<mode>2" vector expanders, I've only
looked at sse.md and forgot mmx.md, which has another one and the
following patch still ICEd.
2023-12-19 Jakub Jelinek <jakub@redhat.com>
PR target/112816
* config/i386/mmx.md (signbitv2sf2): Force operands[1] into a REG.
The following testcase ICEs because we aren't careful enough with
alloc_size attribute. We do check that such an argument exists
(although wouldn't handle correctly functions with more than INT_MAX
arguments), but didn't check that it is scalar integer, the ICE is
trying to fold_convert a structure to sizetype.
Given that the attribute can also appear on non-prototyped functions
where the arguments aren't known, I don't see how the FE could diagnose
that and because we already handle the case where argument doesn't exist,
I think we should also verify the argument is scalar integer convertible
to sizetype. Furthermore, given this is not just in diagnostics but
used for code generation, I think it is better to punt on arguments with
larger precision then sizetype, the upper bits are then truncated.
The patch also fixes some formatting issues and avoids duplication of the
fold_convert, plus removes unnecessary check for if (arg1 >= 0), that is
always the case after if (arg1 < 0) return ...;
2023-12-18 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113013
* tree-object-size.cc (alloc_object_size): Return size_unknown if
corresponding argument(s) don't have integral type or have integral
type with higher precision than sizetype. Don't check arg1 >= 0
uselessly. Compare argument indexes against gimple_call_num_args
in unsigned type rather than int. Formatting fixes.
Jakub Jelinek [Fri, 8 Dec 2023 19:56:48 +0000 (20:56 +0100)]
c++: Unshare folded SAVE_EXPR arguments during cp_fold [PR112727]
The following testcase is miscompiled because two ubsan instrumentations
run into each other.
The first one is the shift instrumentation. Before the C++ FE calls
it, it wraps the 2 shift arguments with cp_save_expr, so that side-effects
in them aren't evaluated multiple times. And, ubsan_instrument_shift
itself uses unshare_expr on any uses of the operands to make sure further
modifications in them don't affect other copies of them (the only not
unshared ones are the one the caller then uses for the actual operation
after the instrumentation, which means there is no tree sharing).
Now, if there are side-effects in the first operand like say function
call, cp_save_expr wraps it into a SAVE_EXPR, and ubsan_instrument_shift
in this mode emits something like
if (..., SAVE_EXPR <foo ()>, SAVE_EXPR <op1> > const)
__ubsan_handle_shift_out_of_bounds (..., SAVE_EXPR <foo ()>, ...);
and caller adds
SAVE_EXPR <foo ()> << SAVE_EXPR <op1>
after it in a COMPOUND_EXPR. So far so good.
If there are no side-effects and cp_save_expr doesn't create SAVE_EXPR,
everything is ok as well because of the unshare_expr.
We have
if (..., SAVE_EXPR <op1> > const)
__ubsan_handle_shift_out_of_bounds (..., ptr->something[i], ...);
and
ptr->something[i] << SAVE_EXPR <op1>
where ptr->something[i] is unshared.
In the testcase below, the !x->s[j] ? 1 : 0 expression is wrapped initially
into a SAVE_EXPR though, and unshare_expr doesn't unshare SAVE_EXPRs nor
anything used in them for obvious reasons, so we end up with:
if (..., SAVE_EXPR <!(bool) VIEW_CONVERT_EXPR<const struct S *>(x)->s[j] ? 1 : 0>, SAVE_EXPR <op1> > const)
__ubsan_handle_shift_out_of_bounds (..., SAVE_EXPR <!(bool) VIEW_CONVERT_EXPR<const struct S *>(x)->s[j] ? 1 : 0>, ...);
and
SAVE_EXPR <!(bool) VIEW_CONVERT_EXPR<const struct S *>(x)->s[j] ? 1 : 0> << SAVE_EXPR <op1>
So far good as well. But later during cp_fold of the SAVE_EXPR we find
out that VIEW_CONVERT_EXPR<const struct S *>(x)->s[j] ? 0 : 1 is actually
invariant (has TREE_READONLY set) and so cp_fold simplifies the above to
if (..., SAVE_EXPR <op1> > const)
__ubsan_handle_shift_out_of_bounds (..., (bool) VIEW_CONVERT_EXPR<const struct S *>(x)->s[j] ? 0 : 1, ...);
and
((bool) VIEW_CONVERT_EXPR<const struct S *>(x)->s[j] ? 0 : 1) << SAVE_EXPR <op1>
with the s[j] ARRAY_REFs and other expressions shared in between the two
uses (and obviously the expression optimized away from the COMPOUND_EXPR in
the if condition.
Then comes another ubsan instrumentation at genericization time,
this time to instrument the ARRAY_REFs with strict bounds checking,
and replaces the s[j] in there with s[.UBSAN_BOUNDS (0B, SAVE_EXPR<j>, 8), SAVE_EXPR<j>]
As the trees are shared, it does that just once though.
And as the if body is gimplified first, the SAVE_EXPR<j> is evaluated inside
of the if body and when it is used again after the if, it uses a potentially
uninitialized value of j.1 (always uninitialized if the shift count isn't
out of bounds).
The following patch fixes that by unshare_expr unsharing the folded argument
of a SAVE_EXPR if we've folded the SAVE_EXPR into an invariant and it is
used more than once.
2023-12-08 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/112727
* cp-gimplify.cc (cp_fold): If SAVE_EXPR has been previously
folded, unshare_expr what is returned.
Jakub Jelinek [Wed, 29 Nov 2023 11:26:50 +0000 (12:26 +0100)]
fold-const: Fix up multiple_of_p [PR112733]
We ICE on the following testcase when wi::multiple_of_p is called on
widest_int 1 and -128 with UNSIGNED. I still need to work on the
actual wide-int.cc issue, the latest patch attached to the PR regressed
bitint-{38,39}.c, so will need to debug that, but there is a clear bug
on the fold-const.cc side as well - widest_int is a signed representation
by definition, using UNSIGNED with it certainly doesn't match what was
intended, because -128 as the second operand effectively means unsigned
131072 bit 0xfffff............ffff80 integer, not the signed char -128
that appeared in the source.
In the INTEGER_CST case a few lines above this we already use
case INTEGER_CST:
if (TREE_CODE (bottom) != INTEGER_CST || integer_zerop (bottom))
return false;
return wi::multiple_of_p (wi::to_widest (top), wi::to_widest (bottom),
SIGNED);
so I think using SIGNED with widest_int is best there (compared to the
other choices in the PR).
2023-11-29 Jakub Jelinek <jakub@redhat.com>
PR middle-end/112733
* fold-const.cc (multiple_of_p): Pass SIGNED rather than
UNSIGNED for wi::multiple_of_p on widest_int arguments.
Jakub Jelinek [Tue, 5 Dec 2023 12:17:57 +0000 (13:17 +0100)]
i386: Fix -fcf-protection -Os ICE due to movabsq peephole2 [PR112845]
The following testcase ICEs in the movabsq $(i32 << shift), r64 peephole2
I've added a while back to use smaller code than movabsq if possible.
If i32 is 0xfa1e0ff3 and shift is not divisible by 8, then it creates
an invalid insn (as 0xfa1e0ff3 CONST_INT is not allowed as
x86_64_immediate_operand nor x86_64_zext_immediate_operand), the peephole2
even triggers on it again and again (this time with shift 0) until it gives
up.
The following patch fixes that. As ix86_endbr_immediate_operand needs a
CONST_INT and it is hopefully rare, I chose to use FAIL rather than handling
it in the condition (where I'd probably need to call ctz_hwi again etc.).
2023-12-05 Jakub Jelinek <jakub@redhat.com>
PR target/112845
* config/i386/i386.md (movabsq $(i32 << shift), r64 peephole2): FAIL
if the new immediate is ix86_endbr_immediate_operand.
Jakub Jelinek [Mon, 4 Dec 2023 08:01:09 +0000 (09:01 +0100)]
i386: Fix rtl checking ICE in ix86_elim_entry_set_got [PR112837]
The following testcase ICEs with RTL checking, because it sets if
XINT (SET_SRC (set), 1) is UNSPEC_SET_GOT without checking if SET_SRC (set)
is actually an UNSPEC, so any time we see any other insn with PARALLEL
and a SET in it which is not an UNSPEC we ICE during RTL checking or
access there some other union member as if it was an rt_int.
The rest is just small cleanup.
2023-12-04 Jakub Jelinek <jakub@redhat.com>
PR target/112837
* config/i386/i386.cc (ix86_elim_entry_set_got): Before checking
for UNSPEC_SET_GOT check that SET_SRC is UNSPEC. Use SET_SRC and
SET_DEST macros instead of XEXP, rename vec variable to set.
Jakub Jelinek [Mon, 4 Dec 2023 08:00:18 +0000 (09:00 +0100)]
i386: Fix up signbit<mode>2 expander [PR112816]
The following testcase ICEs, because the signbit<mode>2 expander uses an
explicit SUBREG in the pattern around match_operand with register_operand
predicate. If we are unlucky enough that expansion tries to expand it
with some SUBREG as operands[1], we have two nested SUBREGs in the IL,
which is not valid and causes ICE later.
2023-12-04 Jakub Jelinek <jakub@redhat.com>
PR target/112816
* config/i386/sse.md (signbit<mode>2): Force operands[1] into a REG.
Jakub Jelinek [Mon, 4 Dec 2023 07:59:15 +0000 (08:59 +0100)]
c++: #pragma GCC unroll C++ fixes [PR112795]
foo in the unroll-5.C testcase ICEs because cp_parser_pragma_unroll
during parsing calls maybe_constant_value unconditionally, which is
fine if !processing_template_decl, but can ICE otherwise.
While just calling fold_non_dependent_expr there instead could be enough
to fix the ICE (and I guess the right thing to do for backports if any),
I don't see a reason why we couldn't handle a dependent #pragma GCC unroll
argument as well, the unrolling isn't done in the FE and all the middle-end
cares about is that ANNOTATE_EXPR has a 1..65534 last operand when it is
annot_expr_unroll_kind.
So, the following patch changes all the unsigned short unroll arguments
to tree unroll (and thus avoids the tree -> unsigned short -> tree
conversions), does the type and value checking during parsing only if
the argument isn't dependent and repeats it during instantiation.
2023-12-04 Jakub Jelinek <jakub@redhat.com>
PR c++/112795
gcc/cp/
* parser.cc (cp_parser_pragma_unroll): Use fold_non_dependent_expr
instead of maybe_constant_value.
gcc/testsuite/
* g++.dg/ext/unroll-5.C: New test.
Jakub Jelinek [Sat, 25 Nov 2023 09:31:55 +0000 (10:31 +0100)]
i386: Fix up *jcc_bt*_mask{,_1} [PR111408]
The following testcase is miscompiled in GCC 14 because the
*jcc_bt<mode>_mask and *jcc_bt<SWI48:mode>_mask_1 patterns have just
one argument in (match_operator 0 "bt_comparison_operator" [...])
but as bt_comparison_operator is eq,ne, we need two.
The md readers don't warn about it, after all, some checks can
be done in the predicate rather than specified explicitly, and the
behavior is that anything is accepted as the second argument.
I went through all other i386.md match_operator uses and all others
looked right (extract_operator using 3 operands, all others 2).
I think we'll want to fix this at different spots in older releases
because I think the bug was introduced already in 2008, though most
likely just latent.
2023-11-25 Jakub Jelinek <jakub@redhat.com>
PR target/111408
* config/i386/i386.md (*jcc_bt<mode>_mask): Add (const_int 0) as
expected second operand of bt_comparison_operator.
Jakub Jelinek [Mon, 13 Nov 2023 07:47:41 +0000 (08:47 +0100)]
gimple-range-cache: Fix ICEs when dumping details [PR111967]
The following testcase ICEs when dumping details.
When m_ssa_ranges vector is created, it is safe_grow_cleared (num_ssa_names),
but when when some new SSA_NAME is added, we strangely grow it to
num_ssa_names + 1 instead and later on the 3 argument dump method
iterates from 1 to m_ssa_ranges.length () - 1 and uses ssa_name (x)
on each; but because set_bb_range grew it one too much, ssa_name
(m_ssa_ranges.length () - 1) might be after the end of the ssanames
vector and ICE.
The fix grows the vector consistently only to num_ssa_names,
doesn't waste time checking m_ssa_ranges[0] because there is no
ssa_names (0), it is always NULL, before using ssa_name (x) checks
if we'll need it at all (we check later if m_ssa_ranges[x] is non-NULL,
so we might check it earlier as well) and also in the last loop
iterates until m_ssa_ranges.length () rather than num_ssa_names, I don't
see a reason for the inconsistency and in theory some SSA_NAME could be
added without set_bb_range called for it and the vector could be shorter
than the ssanames vector.
To actually fix the ICE, either the first hunk or the last 2 hunks
would be enough, but I think it doesn't hurt to change all the spots.
2023-11-13 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/111967
* gimple-range-cache.cc (block_range_cache::set_bb_range): Grow
m_ssa_ranges to num_ssa_names rather than num_ssa_names + 1.
(block_range_cache::dump): Iterate from 1 rather than 0. Don't use
ssa_name (x) unless m_ssa_ranges[x] is non-NULL. Iterate to
m_ssa_ranges.length () rather than num_ssa_names.
Jakub Jelinek [Thu, 9 Nov 2023 08:05:54 +0000 (09:05 +0100)]
attribs: Fix ICE with -Wno-attributes= [PR112339]
The following testcase ICEs, because with -Wno-attributes=foo::no_sanitize
(but generally any other non-gnu namespace and some gnu well known attribute
name within that other namespace) the FEs don't really parse attribute
arguments of such attribute, but lookup_attribute_spec is non-NULL with
NULL handler and such attributes are added to DECL_ATTRIBUTES or
TYPE_ATTRIBUTES and then when e.g. middle-end does lookup_attribute
on a particular attribute and expects the attribute to mean something
and/or have a particular verified arguments, it can crash when seeing
the foreign attribute in there instead.
The following patch fixes that by never adding ignored attributes
to DECL_ATTRIBUTES/TYPE_ATTRIBUTES, previously that was the case just
for attributes in ignored namespace (where lookup_attribute_space
returned NULL). We don't really know anything about those attributes,
so shouldn't pretend we know something about them, especially when
the arguments are error_mark_node or NULL instead of something that
would have been parsed. And it would be really weird if we normally
ignore say [[clang::unused]] attribute, but when people use
-Wno-attributes=clang::unused we actually treated it as gnu::unused.
All the user asked for is suppress warnings about that attribute being
unknown.
The first hunk is just playing safe, I'm worried people could
-Wno-attributes=gnu::
and get various crashes with known GNU attributes not being actually
parsed and recorded (or worse e.g. when we tweak standard attributes
into GNU attributes and we wouldn't add those).
The -Wno-attributes= documentation says that it suppresses warning about
unknown attributes, so I think -Wno-attributes=gnu:: should prevent
warning about say [[gnu::foobarbaz]] attribute, but not about
[[gnu::unused]] because the latter is a known attribute.
The routine would return true for any scoped attribute in the ignored
namespace, with the change it ignores only unknown attributes in ignored
namespace, known ones in there will be ignored only if they have
max_length of -2 (e.g.. with
-Wno-attributes=gnu:: -Wno-attributes=gnu::foobarbaz).
2023-11-09 Jakub Jelinek <jakub@redhat.com>
PR c/112339
* attribs.cc (attribute_ignored_p): Only return true for
attr_namespace_ignored_p if as is NULL.
(decl_attributes): Never add ignored attributes.
* c-c++-common/ubsan/Wno-attributes-1.c: New test.
Jakub Jelinek [Fri, 13 Oct 2023 07:09:32 +0000 (09:09 +0200)]
libstdc++: Fix tr1/8_c_compatibility/cstdio/functions.cc regression with recent glibc
The following testcase started FAILing recently after the
https://sourceware.org/git/?p=glibc.git;a=commit;h=64b1a44183a3094672ed304532bedb9acc707554
glibc change which marked vfscanf with nonnull (1) attribute.
While vfwscanf hasn't been marked similarly (strangely), the patch changes
that too. By using va_arg one hides the value of it from the compiler
(volatile keyword would do too, or making the FILE* stream a function
argument, but then it might need to be guarded by #if or something).
2023-10-13 Jakub Jelinek <jakub@redhat.com>
* testsuite/tr1/8_c_compatibility/cstdio/functions.cc (test01):
Initialize stream to va_arg(ap, FILE*) rather than 0.
* testsuite/tr1/8_c_compatibility/cwchar/functions.cc (test01):
Likewise.
Jakub Jelinek [Wed, 19 Jul 2023 11:48:53 +0000 (13:48 +0200)]
wide-int: Fix up wi::divmod_internal [PR110731]
As the following testcase shows, wi::divmod_internal doesn't handle
correctly signed division with precision > 64 when the dividend (and likely
divisor as well) is the type's minimum and the precision isn't divisible
by 64.
A few lines above what the patch hunk changes is:
/* Make the divisor and dividend positive and remember what we
did. */
if (sgn == SIGNED)
{
if (wi::neg_p (dividend))
{
neg_dividend = -dividend;
dividend = neg_dividend;
dividend_neg = true;
}
if (wi::neg_p (divisor))
{
neg_divisor = -divisor;
divisor = neg_divisor;
divisor_neg = true;
}
}
i.e. we negate negative dividend or divisor and remember those.
But, after we do that, when unpacking those values into b_dividend and
b_divisor we need to always treat the wide_ints as UNSIGNED,
because divmod_internal_2 performs an unsigned division only.
Now, if precision <= 64, we don't reach here at all, earlier code
handles it. If dividend or divisor aren't the most negative values,
the negation clears their most significant bit, so it doesn't really
matter if we unpack SIGNED or UNSIGNED. And if precision is multiple
of HOST_BITS_PER_WIDE_INT, there is no difference in behavior, while
-0x80000000000000000000000000000000 negates to
-0x80000000000000000000000000000000 the unpacking of it as SIGNED
or UNSIGNED works the same.
In the testcase, we have signed precision 119 and the dividend is
val = { 0, 0xffc0000000000000 }, len = 2, precision = 119
both before and after negation.
Divisor is
val = { 2 }, len = 1, precision = 119
But we really want to divide 0x400000000000000000000000000000 by 2
unsigned and then negate at the end.
If it is unsigned precision 119 division
0x400000000000000000000000000000 by 2
dividend is
val = { 0, 0xffc0000000000000 }, len = 2, precision = 119
but as we unpack it UNSIGNED, it is unpacked into
0, 0, 0, 0x00400000
The following patch fixes it by always using UNSIGNED unpacking
because we've already negated negative values at that point if
sgn == SIGNED and so most negative constants should be treated as
positive.
2023-07-19 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/110731
* wide-int.cc (wi::divmod_internal): Always unpack dividend and
divisor as UNSIGNED regardless of sgn.
Richard Biener [Mon, 21 Aug 2023 08:34:30 +0000 (10:34 +0200)]
debug/111080 - avoid outputting debug info for unused restrict qualified type
The following applies some maintainance with respect to type qualifiers
and kinds added by later DWARF standards to prune_unused_types_walk.
The particular case in the bug is not handling (thus marking required)
all restrict qualified type DIEs. I've found more DW_TAG_*_type that
are unhandled, looked up the DWARF docs and added them as well based
on common sense.
PR debug/111080
* dwarf2out.cc (prune_unused_types_walk): Handle
DW_TAG_restrict_type, DW_TAG_shared_type, DW_TAG_atomic_type,
DW_TAG_immutable_type, DW_TAG_coarray_type, DW_TAG_unspecified_type
and DW_TAG_dynamic_type as to only output them when referenced.
Richard Biener [Fri, 25 Aug 2023 11:37:30 +0000 (13:37 +0200)]
tree-optimization/111137 - dependence checking for SLP
The following fixes a mistake with SLP dependence checking. When
checking whether we can hoist loads to the first load place we
special-case stores of the same instance considering them sunk
to the last store place. But we fail to consider that stores from
other SLP instances are sunk in a similar way. This leads us to
miss the dependence between (A) and (B) in
where the zeroing stores are sunk to (A') and the loads hoisted
to (B'). The following fixes this, treating grouped stores from
other instances similar to stores from our own instance. The
difference is - and this is more conservative than necessary - that
we don't know which stores of a group are in which SLP instance
(though I believe either all of the grouped stores will be in
a single SLP instance or in none at the moment), so we don't
know which stores are sunk where. We simply assume they are
all sunk to the last store we run into. Likewise we do not take
into account that an SLP instance might be cancelled (or a grouped
store not actually belong to any instance).
PR tree-optimization/111137
* tree-vect-data-refs.cc (vect_slp_analyze_load_dependences):
Properly handle grouped stores from other SLP instances.
Richard Biener [Fri, 25 Aug 2023 09:43:36 +0000 (11:43 +0200)]
Apply some TLC to vect_slp_analyze_instance_dependence
This refactors things, separating load and store handing, adjusting
comments to reflect reality and removing some dead code.
* tree-vect-data-refs.cc (vect_slp_analyze_store_dependences):
Split out from vect_slp_analyze_node_dependences, remove
dead code.
(vect_slp_analyze_load_dependences): Split out from
vect_slp_analyze_node_dependences, adjust comments. Process
queued stores before any disambiguation.
(vect_slp_analyze_node_dependences): Remove.
(vect_slp_analyze_instance_dependence): Adjust.
The following keeps dumping SSA def stmt RHS during diagnostic
reporting only for gimple_assign_single_p defs which means
memory loads. This avoids diagnostics containing PHI nodes
like
warning: 'realloc' called on pointer '*_42 = PHI <lcs.14_40(29), lcs.19_48(30)>.t_mem_caches' with nonzero offset 40
instead getting back the previous behavior:
warning: 'realloc' called on pointer '*<unknown>.t_mem_caches' with nonzero offset 40
liuhongt [Thu, 7 Dec 2023 01:17:27 +0000 (09:17 +0800)]
Don't assume it's AVX_U128_CLEAN after call_insn whose abi.mode_clobber(V4DImode) deosn't contains all SSE_REGS.
If the function desn't clobber any sse registers or only clobber
128-bit part, then vzeroupper isn't issued before the function exit.
the status not CLEAN but ANY after the function.
Also for sibling_call, it's safe to issue an vzeroupper. Also there
could be missing vzeroupper since there's no mode_exit for
sibling_call_p.
gcc/ChangeLog:
PR target/112891
* config/i386/i386.cc (ix86_avx_u128_mode_after): Return
AVX_U128_ANY if callee_abi doesn't clobber all_sse_regs to
align with ix86_avx_u128_mode_needed.
(ix86_avx_u128_mode_needed): Return AVX_U128_ClEAN for
sibling_call.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr112891.c: New test.
* gcc.target/i386/pr112891-2.c: New test.