Martin Jambor [Fri, 9 Feb 2024 17:58:43 +0000 (18:58 +0100)]
sra: Disqualify bases of operands of asm gotos
PR 110422 shows that SRA can ICE because it assumes there is a single edge
outgoing from a block terminated with an asm goto. We need a single
outgoing edge for BB-terminating statements so that any adjustments they
make to the aggregates can be copied over to their replacements. Because
we can't have that after asm gotos, we need to punt.
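A minimal sketch of the kind of input involved (an assumption, not
necessarily the exact pr110422.c): an asm goto whose output lives in an
aggregate that SRA would otherwise scalarize.

  struct S { int i; };

  int
  foo (void)
  {
    struct S s;
    /* Output of an asm goto: the block has two successors, so SRA cannot
       insert the copy-back after the statement and must disqualify 's'.  */
    asm goto ("" : "=r" (s.i) : : : lab);
    return s.i;
   lab:
    return 0;
  }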
gcc/ChangeLog:
2024-01-17 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/110422
* tree-sra.c (scan_function): Disqualify bases of operands of asm
gotos.
gcc/testsuite/ChangeLog:
2024-01-17 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/110422
* gcc.dg/torture/pr110422.c: New test.
Xi Ruoyao [Fri, 2 Feb 2024 19:35:07 +0000 (03:35 +0800)]
MIPS: Fix wrong MSA FP vector negation
We expanded (neg x) to (minus const0 x) for MSA FP vectors, but this is
wrong because -0.0 is not 0 - 0.0. This causes some Python tests to fail
when Python is built with MSA enabled.
Use the bnegi.df instructions to simply reverse the sign bit instead.
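For illustration, a hedged sketch of the intended effect (compiler options
and exact assembly assumed, not verified output):

  /* FP vector negation must flip only the sign bit, so that -0.0 is
     handled correctly.  */
  typedef float v4f32 __attribute__ ((vector_size (16)));

  v4f32
  negate (v4f32 x)
  {
    return -x;  /* expected: bnegi.w $wd,$ws,31, not a subtraction from zero */
  }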
gcc/ChangeLog:
* config/mips/mips-msa.md (elmsgnbit): New define_mode_attr.
(neg<mode>2): Change the mode iterator from MSA to IMSA because
in FP arithmetic we cannot use (0 - x) for -x.
(neg<mode>2): New define_insn to implement FP vector negation,
using a bnegi instruction to negate the sign bit.
The first alternative stores the floating-point status register
in the destination. It should store zero. We need to copy %fr0
to another floating-point register to initialize it to zero.
2024-02-01 John David Anglin <danglin@gcc.gnu.org>
gcc/ChangeLog:
* config/pa/pa.md (atomic_storedi_1): Fix bug in
alternative 1.
Lewis Hyatt [Tue, 5 Dec 2023 16:33:39 +0000 (11:33 -0500)]
c-family: Fix ICE with large column number after restoring a PCH [PR105608]
Users are allowed to define macros prior to restoring a precompiled header
file, as long as those macros are not defined (or are defined identically)
in the PCH. However, the PCH restoration process destroys all the macro
definitions, so libcpp has to record them before restoring the PCH and then
redefine them afterward.
This process does not currently assign great locations to the macros after
redefining them. Some work is needed to also remember the original locations
and get the line_maps instance in the right state (since, like all other
data structures, the line_maps instance is also reset after restoring a PCH).
This patch addresses a more pressing issue, which is that we ICE in some
cases since GCC 11, hitting an assert in line-maps.cc. It happens if the
first line encountered after the PCH restore requires an LC_RENAME map, such
as will happen if the line is sufficiently long. This is much easier to
fix, since we just need to call linemap_line_start before asking libcpp to
redefine the stored macros, instead of afterward, to avoid the unexpected
need for an LC_RENAME before an LC_ENTER has been seen.
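A sketch of the failure mode (hypothetical file names; the real tests are
the line-map-*.C/.Hs pairs below):

  #define FOO 1            /* defined prior to restoring the PCH */
  #include "pch-header.H"  /* the restore resets the line_maps instance */
  /* ...a line long enough to require an LC_RENAME map used to trip the
     assert in line-maps.cc before an LC_ENTER had been seen...  */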
gcc/c-family/ChangeLog:
PR preprocessor/105608
* c-pch.c (c_common_read_pch): Start a new line map before asking
libcpp to restore macros defined prior to reading the PCH, instead
of afterward.
gcc/testsuite/ChangeLog:
PR preprocessor/105608
* g++.dg/pch/line-map-1.C: New test.
* g++.dg/pch/line-map-1.Hs: New test.
* g++.dg/pch/line-map-2.C: New test.
* g++.dg/pch/line-map-2.Hs: New test.
Jason Merrill [Tue, 19 Dec 2023 21:12:02 +0000 (16:12 -0500)]
c++: xvalue array subscript [PR103185]
Normally we handle xvalue array subscripting with ARRAY_REF, but in this
case we weren't doing that because the operands were reversed. Handle that
case better.
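A hedged illustration of the reversed-operand form (simplified, not
necessarily the PR's exact test):

  // 0[E] is the same subscript as E[0], so the xvalue array ends up as
  // the second operand.
  int arr[3];
  int &&r = 0[static_cast<int(&&)[3]>(arr)];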
Jan Hubicka [Thu, 22 Dec 2022 09:55:46 +0000 (10:55 +0100)]
Zen4 tuning part 2
Adds tunes needed for the zen4 microarchitecture. I added two new knobs.
TARGET_AVX512_SPLIT_REGS, which is used to specify that 512-bit vectors are
internally split into 256-bit vectors. This affects vectorization costs and
reassociation width. It probably should also affect RTX costs; however, I
doubt that is very useful since RTL optimizers are usually not judging
between 256-bit and 512-bit vectors.
I also added X86_TUNE_AVOID_256FMA_CHAINS. Since FMA has improved in zen4,
this flag may not be a win except for very specific benchmarks. I am still
doing some more detailed testing here.
Otherwise I disabled gathers on zen4 for 2 parts and 4 parts. We can open
code them, and since the latencies have only increased since zen3, open
coding is better than the actual instruction. This shows in 4 tsvc
benchmarks.
I ended up setting AVX256_OPTIMAL. This is a compromise. There are some
tsvc benchmarks that improve noticeably (up to 250%); however, there are
also a few regressions. Most of these can be solved by increasing the
vec_perm cost in the vectorizer. However, that does not cure the roughly
14% regression on x264, which is quite important. Here we produce
vectorized loops for AVX512 that would probably be faster if the loops in
question had a high enough iteration count. We hit this problem with
AVX256 too: since the loop iterates few times, only prologues/epilogues
are used. Adding another round of prologue/epilogue code does not make it
better.
Finally, I enabled AVX stores for constant-sized memcpy and memset. I am
not sure why this is an opt-in feature; I think for most hardware this is
a win.
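As a hedged sketch of the new knobs' shape in x86-tune.def (abbreviated;
names per the ChangeLog below, exact selectors assumed):

  /* Internally split 512-bit vectors into two 256-bit halves.  */
  DEF_TUNE (X86_TUNE_AVX512_SPLIT_REGS, "avx512_split_regs", m_ZNVER4)

  /* Avoid long chains of 512-bit FMAs.  */
  DEF_TUNE (X86_TUNE_AVOID_512FMA_CHAINS, "avoid_fma512_chains", m_ZNVER4)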
gcc/ChangeLog:
2022-12-22 Jan Hubicka <hubicka@ucw.cz>
* config/i386/i386-expand.c (ix86_expand_set_or_cpymem): Add
TARGET_AVX512_SPLIT_REGS.
* config/i386/i386-options.c (ix86_option_override_internal):
Honor X86_TUNE_AVOID_256FMA_CHAINS.
* config/i386/i386.c (ix86_vec_cost): Honor TARGET_AVX512_SPLIT_REGS.
(ix86_reassociation_width): Likewise.
* config/i386/i386.h (TARGET_AVX512_SPLIT_REGS): New tune.
* config/i386/x86-tune.def (X86_TUNE_USE_GATHER_2PARTS): Disable
for znver4.
(X86_TUNE_USE_GATHER_4PARTS): Likewise.
(X86_TUNE_AVOID_256FMA_CHAINS): Set for znver4.
(X86_TUNE_AVOID_512FMA_CHAINS): New tune; set for znver4.
(X86_TUNE_AVX256_OPTIMAL): Add znver4.
(X86_TUNE_AVX512_SPLIT_REGS): New tune.
(X86_TUNE_AVX256_MOVE_BY_PIECES): Add znver1-3.
(X86_TUNE_AVX256_STORE_BY_PIECES): Add znver1-3.
(X86_TUNE_AVX512_MOVE_BY_PIECES): Add znver4.
(X86_TUNE_AVX512_STORE_BY_PIECES): Add znver4.
Jan Hubicka [Thu, 22 Dec 2022 01:16:24 +0000 (02:16 +0100)]
Update znver4 costs
Update the costs of znver4, mostly based on data measured by Agner Fog.
Compared to previous generations, x87 became a bit slower, which is
probably not a big deal (and we have minimal benchmarking coverage for it).
One interesting improvement is the reduction of FMA cost. I also updated
the costs of AVX256 loads/stores based on latencies (not throughput, which
is twice that of AVX256).
Overall, AVX512 vectorization seems to improve some TSVC benchmarks
noticeably, but since 512-bit vectors are internally split into 256-bit
vectors it is somewhat risky and does not win in SPEC scores (mostly by
regressing benchmarks with loops that have a small trip count, like x264
and exchange), so for now I am going to set the AVX256_OPTIMAL tune, but I
am still playing with it. We have improved since ZNVER1 on choosing the
vectorization size and also have vectorized prologues/epilogues, so it may
be possible to make AVX512 a small win overall.
2022-12-22 Jan Hubicka <hubicka@ucw.cz>
* config/i386/x86-tune-costs.h (znver4_cost): Update costs of FP and SSE
moves, division, multiplication, gathers, L2 cache size, and more
complex FP instructions.
Ken Matsui [Thu, 11 Jan 2024 06:08:07 +0000 (22:08 -0800)]
libstdc++: Fix error handling in filesystem::equivalent [PR113250]
This patch makes std::filesystem::equivalent correctly throw an exception
when either path does not exist, as per [fs.op.equivalent]/4.
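The substance of the fix, as a hedged sketch (status variable names
assumed, simplified from fs_ops.cc):

  // Report an error when *either* path is missing, not only when both are.
  if (!exists(s1) || !exists(s2))   // was: &&
    ec = make_error_code(std::errc::no_such_file_or_directory);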
PR libstdc++/113250
libstdc++-v3/ChangeLog:
* src/c++17/fs_ops.cc (fs::equivalent): Use || instead of &&.
* src/filesystem/ops.cc (fs::equivalent): Likewise.
* testsuite/27_io/filesystem/operations/equivalent.cc: Handle
error codes.
* testsuite/experimental/filesystem/operations/equivalent.cc:
Likewise.
Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
(cherry picked from commit df147e2ee7199d33d66959c6509ce9c21072077f)
Patrick Palka [Wed, 3 Jan 2024 02:31:20 +0000 (21:31 -0500)]
libstdc++: testsuite: Reduce max_size_type.cc exec time [PR113175]
The adjustment to max_size_type.cc in r14-205-g83470a5cd4c3d2
inadvertently increased the execution time of this test by over 5x due
to making the two main loops actually run in the signed_p case instead
of being dead code.
To compensate, this patch cuts the relevant loops' range [-1000,1000] by
10x as proposed in the PR. This shouldn't significantly weaken the test
since the same important edge cases are still checked in the smaller range
and/or elsewhere. On my machine this reduces the test's execution time by
roughly 10x (and 1.6x relative to before r14-205).
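In terms of the test's constants (a sketch; the exact new value of
log2_limit is an assumption):

  // was: limit = 1000, log2_limit = 10
  constexpr int limit = 100;      // loops now cover [-100, 100]
  constexpr int log2_limit = 7;   // ceil(log2(100))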
PR testsuite/113175
libstdc++-v3/ChangeLog:
* testsuite/std/ranges/iota/max_size_type.cc (test02): Reduce
'limit' to 100 from 1000 and adjust 'log2_limit' accordingly.
(test03): Likewise.
Patrick Palka [Fri, 22 Sep 2023 10:25:49 +0000 (06:25 -0400)]
c++: constraint rewriting during ttp coercion [PR111485]
In order to compare the constraints of a ttp with those of its argument,
we rewrite the ttp's constraints in terms of the argument template's
template parameters. The substitution to achieve this currently uses a
single level of template arguments, but that never does the right thing
because a ttp's template parameters always have level >= 2. This patch
fixes this by including the outer template arguments in the substitution,
which ought to match the depth of the ttp.
The second testcase demonstrates it's better to substitute the concrete
outer template arguments instead of generic ones since a ttp's constraints
could depend on outer parameters.
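A hedged illustration of the kind of code involved (simplified; the real
tests are concepts-ttp5.C and concepts-ttp6.C below):

  template<class T> concept C = true;
  template<template<C> class TT> struct A { };  // ttp with constrained parameter
  template<C U> struct B { };
  A<B> a;  // B's constraints are compared against TT's rewritten ones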
PR c++/111485
gcc/cp/ChangeLog:
* pt.c (is_compatible_template_arg): New parameter 'args'.
Add the outer template arguments 'args' to 'new_args'.
(convert_template_argument): Pass 'args' to
is_compatible_template_arg.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-ttp5.C: New test.
* g++.dg/cpp2a/concepts-ttp6.C: New test.
Jason Merrill [Fri, 23 Apr 2021 20:41:35 +0000 (16:41 -0400)]
c++: -Wdeprecated-copy and using operator= [PR92145]
For the purpose of [depr.impldec] "if the class has a user-declared copy
assignment operator", an operator= brought in from a base class with 'using'
may be a copy-assignment operator, but it isn't a copy-assignment operator
for the derived class.
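A hedged sketch of the distinction (not the exact depr-copy3.C test):

  struct B { B& operator=(const B&); };  // user-declared for B
  struct D : B {
    using B::operator=;  // not a copy-assignment operator *for D*
  };
  // Copying a D should not trigger -Wdeprecated-copy because of the
  // using-declaration alone.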
gcc/cp/ChangeLog:
PR c++/92145
* class.c (classtype_has_depr_implicit_copy): Check DECL_CONTEXT
of operator=.
gcc/testsuite/ChangeLog:
PR c++/92145
* g++.dg/cpp0x/depr-copy3.C: New test.
Jason Merrill [Sun, 4 Jun 2023 16:00:55 +0000 (12:00 -0400)]
c++: NRV and goto [PR92407]
Here our named return value optimization was breaking the required
destructor when the goto takes 'a' out of scope. A simple fix for the
release branches is to disable the optimization in the presence of backward
goto.
We could do better by disabling the optimization only if there is a backward
goto across the variable declaration, but we don't track that, and in GCC 14
we instead make the goto work with NRV.
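A hedged sketch of the problematic shape ('cond' is a stand-in):

  struct A { ~A(); };
  extern bool cond();

  A f() {
   again:
    A a;
    if (cond())
      goto again;  // backward goto takes 'a' out of scope: ~A() must run
    return a;      // NRV would have made 'a' share the return slot
  }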
PR c++/92407
gcc/cp/ChangeLog:
* cp-tree.h (struct language_function): Add backward_goto.
* decl.c (check_goto): Set it.
* typeck.c (check_return_expr): Prevent NRV if set.
Patrick Palka [Tue, 25 Apr 2023 19:59:22 +0000 (15:59 -0400)]
c++: value dependence of by-ref lambda capture [PR108975]
We are still ICEing on the generic lambda version of the testcase from
this PR, even after r13-6743-g6f90de97634d6f, due to the by-ref capture
of the constant local variable 'dim' being considered value-dependent
when regenerating the lambda (at which point processing_template_decl is
set since the lambda is generic), which prevents us from constant folding
its uses. Later during prune_lambda_captures we end up not thoroughly
walking the body of the lambda and overlook the (non-folded) uses of
'dim' within the array bound and using-decls.
We could fix this by making prune_lambda_captures walk the body of the
lambda more thoroughly so that it finds these uses of 'dim', but ideally
we should be able to constant fold all uses of 'dim' ahead of time and
prune the implicit capture after all.
To that end this patch makes value_dependent_expression_p return false
for such by-ref captures of constant local variables, allowing their
uses to get constant folded ahead of time. It seems we just need to
disable the predicate's conservative early exit for reference variables
(added by r5-5022-g51d72abe5ea04e) when DECL_HAS_VALUE_EXPR_P. This
effectively makes us treat by-value and by-ref captures more consistently
when it comes to value dependence.
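A hedged sketch of the generic-lambda shape (simplified from the PR):

  void f() {
    constexpr int dim = 2;
    auto l = [&](auto) {
      int arr[dim];  // uses of 'dim' should constant-fold ahead of time...
      (void) arr;
    };               // ...so prune_lambda_captures can drop the capture
    l(0);
  }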
PR c++/108975
gcc/cp/ChangeLog:
* pt.c (value_dependent_expression_p) <case VAR_DECL>:
Suppress conservative early exit for reference variables
when DECL_HAS_VALUE_EXPR_P.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/lambda/lambda-const11a.C: New test.
Jakub Jelinek [Tue, 19 Dec 2023 09:24:33 +0000 (10:24 +0100)]
i386: Fix mmx.md signbit expanders [PR112816]
Apparently, when looking for "signbit<mode>2" vector expanders, I only
looked at sse.md and forgot mmx.md, which has another one, so the testcase
still ICEd even after the earlier sse.md fix.
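A hedged sketch of input that reaches the mmx.md expander (options and
exact shape assumed):

  /* Something like -O2 with 64-bit vectors enabled, so the vectorizer
     uses V2SF and the signbitv2sf2 expander.  */
  int a[2];
  float b[2];

  void
  foo (void)
  {
    for (int i = 0; i < 2; i++)
      a[i] = __builtin_signbitf (b[i]);
  }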
2023-12-19 Jakub Jelinek <jakub@redhat.com>
PR target/112816
* config/i386/mmx.md (signbitv2sf2): Force operands[1] into a REG.
Jakub Jelinek [Fri, 8 Dec 2023 19:56:48 +0000 (20:56 +0100)]
c++: Unshare folded SAVE_EXPR arguments during cp_fold [PR112727]
The following testcase is miscompiled because two ubsan instrumentations
run into each other.
The first one is the shift instrumentation. Before the C++ FE calls
it, it wraps the 2 shift arguments with cp_save_expr, so that side-effects
in them aren't evaluated multiple times. And, ubsan_instrument_shift
itself uses unshare_expr on any uses of the operands to make sure further
modifications in them don't affect other copies of them (the only not
unshared ones are the one the caller then uses for the actual operation
after the instrumentation, which means there is no tree sharing).
Now, if there are side-effects in the first operand like say function
call, cp_save_expr wraps it into a SAVE_EXPR, and ubsan_instrument_shift
in this mode emits something like
  if (..., SAVE_EXPR <foo ()>, SAVE_EXPR <op1> > const)
    __ubsan_handle_shift_out_of_bounds (..., SAVE_EXPR <foo ()>, ...);
and caller adds
  SAVE_EXPR <foo ()> << SAVE_EXPR <op1>
after it in a COMPOUND_EXPR. So far so good.
If there are no side-effects and cp_save_expr doesn't create SAVE_EXPR,
everything is ok as well because of the unshare_expr.
We have
  if (..., SAVE_EXPR <op1> > const)
    __ubsan_handle_shift_out_of_bounds (..., ptr->something[i], ...);
and
  ptr->something[i] << SAVE_EXPR <op1>
where ptr->something[i] is unshared.
In the testcase from the PR, the !x->s[j] ? 1 : 0 expression is wrapped
initially into a SAVE_EXPR though, and unshare_expr doesn't unshare
SAVE_EXPRs nor anything used in them for obvious reasons, so we end up
with:
  if (..., SAVE_EXPR <!(bool) VIEW_CONVERT_EXPR<const struct S *>(x)->s[j] ? 1 : 0>, SAVE_EXPR <op1> > const)
    __ubsan_handle_shift_out_of_bounds (..., SAVE_EXPR <!(bool) VIEW_CONVERT_EXPR<const struct S *>(x)->s[j] ? 1 : 0>, ...);
and
  SAVE_EXPR <!(bool) VIEW_CONVERT_EXPR<const struct S *>(x)->s[j] ? 1 : 0> << SAVE_EXPR <op1>
So far so good as well. But later, during cp_fold of the SAVE_EXPR, we find
out that VIEW_CONVERT_EXPR<const struct S *>(x)->s[j] ? 0 : 1 is actually
invariant (has TREE_READONLY set) and so cp_fold simplifies the above to
  if (..., SAVE_EXPR <op1> > const)
    __ubsan_handle_shift_out_of_bounds (..., (bool) VIEW_CONVERT_EXPR<const struct S *>(x)->s[j] ? 0 : 1, ...);
and
  ((bool) VIEW_CONVERT_EXPR<const struct S *>(x)->s[j] ? 0 : 1) << SAVE_EXPR <op1>
with the s[j] ARRAY_REFs and other expressions shared in between the two
uses (and obviously the expression is optimized away from the COMPOUND_EXPR
in the if condition).
Then comes another ubsan instrumentation at genericization time,
this time to instrument the ARRAY_REFs with strict bounds checking,
and replaces the s[j] in there with
s[.UBSAN_BOUNDS (0B, SAVE_EXPR<j>, 8), SAVE_EXPR<j>].
As the trees are shared, it does that just once though.
And as the if body is gimplified first, the SAVE_EXPR<j> is evaluated inside
of the if body and when it is used again after the if, it uses a potentially
uninitialized value of j.1 (always uninitialized if the shift count isn't
out of bounds).
The following patch fixes that by unshare_expr unsharing the folded argument
of a SAVE_EXPR if we've folded the SAVE_EXPR into an invariant and it is
used more than once.
2023-12-08 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/112727
* cp-gimplify.c (cp_fold): If SAVE_EXPR has been previously
folded, unshare_expr what is returned.
Jakub Jelinek [Wed, 29 Nov 2023 11:26:50 +0000 (12:26 +0100)]
fold-const: Fix up multiple_of_p [PR112733]
We ICE on the following testcase when wi::multiple_of_p is called on
widest_int 1 and -128 with UNSIGNED. I still need to work on the actual
wide-int.cc issue; the latest patch attached to the PR regressed
bitint-{38,39}.c, so I will need to debug that. But there is a clear bug
on the fold-const.cc side as well: widest_int is a signed representation
by definition, and using UNSIGNED with it certainly doesn't match what was
intended, because -128 as the second operand effectively means an unsigned
131072-bit 0xfffff............ffff80 integer, not the signed char -128
that appeared in the source.
In the INTEGER_CST case a few lines above this we already use
    case INTEGER_CST:
      if (TREE_CODE (bottom) != INTEGER_CST || integer_zerop (bottom))
        return false;
      return wi::multiple_of_p (wi::to_widest (top), wi::to_widest (bottom),
                                SIGNED);
so I think using SIGNED with widest_int is best there (compared to the
other choices in the PR).
2023-11-29 Jakub Jelinek <jakub@redhat.com>
PR middle-end/112733
* fold-const.c (multiple_of_p): Pass SIGNED rather than
UNSIGNED for wi::multiple_of_p on widest_int arguments.
Jakub Jelinek [Tue, 5 Dec 2023 12:17:57 +0000 (13:17 +0100)]
i386: Fix -fcf-protection -Os ICE due to movabsq peephole2 [PR112845]
The following testcase ICEs in the movabsq $(i32 << shift), r64 peephole2
that I added a while back to emit smaller code than movabsq when possible.
If i32 is 0xfa1e0ff3 and shift is not divisible by 8, then it creates an
invalid insn (as a 0xfa1e0ff3 CONST_INT is allowed as neither
x86_64_immediate_operand nor x86_64_zext_immediate_operand); the peephole2
even triggers on it again and again (this time with shift 0) until it
gives up.
The following patch fixes that. As ix86_endbr_immediate_operand needs a
CONST_INT and it is hopefully rare, I chose to use FAIL rather than handling
it in the condition (where I'd probably need to call ctz_hwi again etc.).
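A hedged sketch of the trigger (an assumption based on the description,
not the exact testcase):

  /* Compile with -Os -fcf-protection: 0xfa1e0ff3 is the ENDBR64 immediate,
     so the peephole must not create it as a direct CONST_INT operand.  */
  unsigned long long
  f (void)
  {
    return 0xfa1e0ff3ULL << 11;  /* shift not divisible by 8 */
  }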
2023-12-05 Jakub Jelinek <jakub@redhat.com>
PR target/112845
* config/i386/i386.md (movabsq $(i32 << shift), r64 peephole2): FAIL
if the new immediate is ix86_endbr_immediate_operand.
Jakub Jelinek [Mon, 4 Dec 2023 08:01:09 +0000 (09:01 +0100)]
i386: Fix rtl checking ICE in ix86_elim_entry_set_got [PR112837]
The following testcase ICEs with RTL checking because the code tests
whether XINT (SET_SRC (set), 1) is UNSPEC_SET_GOT without checking that
SET_SRC (set) is actually an UNSPEC, so any time we see any other insn
with a PARALLEL and a SET in it whose source is not an UNSPEC, we ICE
during RTL checking or access some other union member as if it were an
rt_int.
The rest is just small cleanup.
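The added guard, as a hedged sketch (abbreviated; the surrounding loop in
ix86_elim_entry_set_got is not shown):

  /* Check that SET_SRC really is an UNSPEC before reading XINT.  */
  if (GET_CODE (SET_SRC (set)) == UNSPEC
      && XINT (SET_SRC (set), 1) == UNSPEC_SET_GOT)
    break;  /* found the set_got insn */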
2023-12-04 Jakub Jelinek <jakub@redhat.com>
PR target/112837
* config/i386/i386.c (ix86_elim_entry_set_got): Before checking
for UNSPEC_SET_GOT check that SET_SRC is UNSPEC. Use SET_SRC and
SET_DEST macros instead of XEXP, rename vec variable to set.
Jakub Jelinek [Mon, 4 Dec 2023 08:00:18 +0000 (09:00 +0100)]
i386: Fix up signbit<mode>2 expander [PR112816]
The following testcase ICEs, because the signbit<mode>2 expander uses an
explicit SUBREG in the pattern around match_operand with register_operand
predicate. If we are unlucky enough that expansion tries to expand it
with some SUBREG as operands[1], we get two nested SUBREGs in the IL,
which is not valid and causes an ICE later.
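The shape of the fix, hedged (expander preparation code, simplified):

  /* The pattern itself wraps operands[1] in a SUBREG, so avoid
     SUBREG-of-SUBREG by forcing operands[1] into a plain REG first.  */
  if (!REG_P (operands[1]))
    operands[1] = force_reg (<MODE>mode, operands[1]);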
2023-12-04 Jakub Jelinek <jakub@redhat.com>
PR target/112816
* config/i386/sse.md (signbit<mode>2): Force operands[1] into a REG.
Jakub Jelinek [Mon, 4 Dec 2023 07:59:15 +0000 (08:59 +0100)]
c++: #pragma GCC unroll C++ fixes [PR112795]
foo in the unroll-5.C testcase ICEs because cp_parser_pragma_unroll
during parsing calls maybe_constant_value unconditionally, which is
fine if !processing_template_decl, but can ICE otherwise.
While just calling fold_non_dependent_expr there instead could be enough
to fix the ICE (and I guess the right thing to do for backports if any),
I don't see a reason why we couldn't handle a dependent #pragma GCC unroll
argument as well, the unrolling isn't done in the FE and all the middle-end
cares about is that ANNOTATE_EXPR has a 1..65534 last operand when it is
annot_expr_unroll_kind.
So, the following patch changes all the unsigned short unroll arguments
to tree unroll (and thus avoids the tree -> unsigned short -> tree
conversions), does the type and value checking during parsing only if
the argument isn't dependent, and repeats it during instantiation.
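A hedged example of a dependent argument that is now handled (not
necessarily the exact unroll-5.C):

  template <int N>
  void foo (int (&a)[N])
  {
  #pragma GCC unroll N  // dependent: checked again at instantiation
    for (int i = 0; i < N; ++i)
      a[i] = 0;
  }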
2023-12-04 Jakub Jelinek <jakub@redhat.com>
PR c++/112795
gcc/cp/
* parser.c (cp_parser_pragma_unroll): Use fold_non_dependent_expr
instead of maybe_constant_value.
gcc/testsuite/
* g++.dg/ext/unroll-5.C: New test.
Jakub Jelinek [Sat, 25 Nov 2023 09:31:55 +0000 (10:31 +0100)]
i386: Fix up *jcc_bt*_mask{,_1} [PR111408]
The following testcase is miscompiled in GCC 14 because the
*jcc_bt<mode>_mask and *jcc_bt<SWI48:mode>_mask_1 patterns have just
one argument in (match_operator 0 "bt_comparison_operator" [...])
but as bt_comparison_operator is eq,ne, we need two.
The md readers don't warn about it; after all, some checks can be done in
the predicate rather than specified explicitly, and the behavior is that
anything is accepted as the second argument.
I went through all other i386.md match_operator uses and all others
looked right (extract_operator using 3 operands, all others 2).
I think we'll want to fix this at different spots in older releases
because I think the bug was introduced already in 2008, though most
likely just latent.
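A hedged sketch of code exercising the *jcc_bt*_mask patterns (an
assumption; the actual miscompiled testcase is the PR's):

  extern void g (void);

  void
  f (unsigned x, int n)
  {
    if ((x >> (n & 31)) & 1)  /* bt with a masked count feeding a branch */
      g ();
  }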
2023-11-25 Jakub Jelinek <jakub@redhat.com>
PR target/111408
* config/i386/i386.md (*jcc_bt<mode>_mask): Add (const_int 0) as
expected second operand of bt_comparison_operator.
Jakub Jelinek [Fri, 13 Oct 2023 07:09:32 +0000 (09:09 +0200)]
libstdc++: Fix tr1/8_c_compatibility/cstdio/functions.cc regression with recent glibc
The following testcase started FAILing recently after the
https://sourceware.org/git/?p=glibc.git;a=commit;h=64b1a44183a3094672ed304532bedb9acc707554
glibc change which marked vfscanf with nonnull (1) attribute.
While vfwscanf hasn't been marked similarly (strangely), the patch changes
that test too. By using va_arg one hides the value from the compiler (the
volatile keyword would do too, or making the FILE* stream a function
argument, but then it might need to be guarded by #if or something).
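A hedged sketch of the workaround's shape (simplified; the real change is
inside the tests' test01 functions):

  #include <cstdarg>
  #include <cstdio>

  void test01(int dummy, ...)
  {
    va_list ap;
    va_start(ap, dummy);
    // was: FILE* stream = 0; -- a compile-time null triggers the
    // nonnull (1) attribute diagnostics with recent glibc.
    FILE* stream = va_arg(ap, FILE*);
    va_end(ap);
    // ...vfscanf (stream, ...) calls follow in the real test...
  }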
2023-10-13 Jakub Jelinek <jakub@redhat.com>
* testsuite/tr1/8_c_compatibility/cstdio/functions.cc (test01):
Initialize stream to va_arg(ap, FILE*) rather than 0.
* testsuite/tr1/8_c_compatibility/cwchar/functions.cc (test01):
Likewise.
Jakub Jelinek [Wed, 19 Jul 2023 11:48:53 +0000 (13:48 +0200)]
wide-int: Fix up wi::divmod_internal [PR110731]
As the following testcase shows, wi::divmod_internal doesn't handle
correctly signed division with precision > 64 when the dividend (and likely
divisor as well) is the type's minimum and the precision isn't divisible
by 64.
A few lines above what the patch hunk changes is:
  /* Make the divisor and dividend positive and remember what we
     did.  */
  if (sgn == SIGNED)
    {
      if (wi::neg_p (dividend))
        {
          neg_dividend = -dividend;
          dividend = neg_dividend;
          dividend_neg = true;
        }
      if (wi::neg_p (divisor))
        {
          neg_divisor = -divisor;
          divisor = neg_divisor;
          divisor_neg = true;
        }
    }
i.e. we negate negative dividend or divisor and remember those.
But, after we do that, when unpacking those values into b_dividend and
b_divisor we need to always treat the wide_ints as UNSIGNED,
because divmod_internal_2 performs an unsigned division only.
Now, if precision <= 64, we don't reach here at all; earlier code handles
it. If dividend or divisor aren't the most negative values, the negation
clears their most significant bit, so it doesn't really matter if we
unpack SIGNED or UNSIGNED. And if the precision is a multiple of
HOST_BITS_PER_WIDE_INT, there is no difference in behavior either: while
-0x80000000000000000000000000000000 negates to
-0x80000000000000000000000000000000, unpacking it as SIGNED or UNSIGNED
works the same.
In the testcase, we have signed precision 119 and the dividend is
val = { 0, 0xffc0000000000000 }, len = 2, precision = 119
both before and after negation.
Divisor is
val = { 2 }, len = 1, precision = 119
But we really want to divide 0x400000000000000000000000000000 by 2
unsigned and then negate at the end.
For an unsigned precision 119 division of
0x400000000000000000000000000000 by 2, the dividend is
val = { 0, 0xffc0000000000000 }, len = 2, precision = 119
but as we unpack it UNSIGNED, it is unpacked into
0, 0, 0, 0x00400000
The following patch fixes it by always using UNSIGNED unpacking
because we've already negated negative values at that point if
sgn == SIGNED and so most negative constants should be treated as
positive.
2023-07-19 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/110731
* wide-int.cc (wi::divmod_internal): Always unpack dividend and
divisor as UNSIGNED regardless of sgn.
The following removes a misguided attempt to allow x + x in a reduction
path, which also allowed x * x, which isn't valid. x + x actually never
arrives this way but instead is canonicalized to 2 * x. This makes
reduction path handling consistent with how we handle the single-stmt
reduction case.
PR tree-optimization/111764
* tree-vect-loop.c (check_reduction_path): Remove the attempt
to allow x + x via special-casing of assigns.
Richard Biener [Mon, 16 Oct 2023 10:50:46 +0000 (12:50 +0200)]
middle-end/111818 - failed DECL_NOT_GIMPLE_REG_P setting of volatile
The following addresses a missed DECL_NOT_GIMPLE_REG_P setting of
a volatile declared parameter which causes inlining to substitute
a constant parameter into a context where its address is required.
The main issue is in update_address_taken which clears
DECL_NOT_GIMPLE_REG_P from the parameter but fails to rewrite it
because is_gimple_reg returns false for volatiles. The following
changes maybe_optimize_var to make the 1:1 correspondence between
clearing DECL_NOT_GIMPLE_REG_P of a register typed decl and
actually rewriting it to SSA.
PR middle-end/111818
* tree-ssa.c (maybe_optimize_var): When clearing
DECL_NOT_GIMPLE_REG_P always rewrite into SSA.
Richard Biener [Mon, 19 Jun 2023 07:52:45 +0000 (09:52 +0200)]
tree-optimization/110298 - CFG cleanup and stale nb_iterations
When unrolling we eventually kill nb_iterations info since it may
refer to removed SSA names. But we do this only after cleaning
up the CFG which in turn can end up accessing it. Fixed by
swapping the two.
PR tree-optimization/110298
* tree-ssa-loop-ivcanon.c (tree_unroll_loops_completely):
Clear number of iterations info before cleaning up the CFG.
Richard Biener [Mon, 19 Jun 2023 07:23:16 +0000 (09:23 +0200)]
debug/110295 - mixed up early/late debug for member DIEs
When we process a scope typedef during early debug creation, and we have
already created a DIE for the type when the decl is TYPE_DECL_IS_STUB and
this DIE is still in limbo, we end up just re-parenting that type DIE
instead of properly creating a DIE for the decl, eventually picking up the
now completed type and creating DIEs for the members. Instead, this is
currently deferred to the second time we come here, when we annotate the
DIEs with locations late, at which point the type DIE is no longer in
limbo and we fall through, doing the job for the decl.
The following makes sure we perform the necessary early tasks
for this by continuing with the decl DIE creation after setting
a parent for the limbo type DIE.
PR debug/110295
* dwarf2out.c (process_scope_var): Continue processing
the decl after setting a parent in case the existing DIE
was in limbo.
Richard Biener [Fri, 9 Jun 2023 07:29:09 +0000 (09:29 +0200)]
middle-end/110182 - TYPE_PRECISION on VECTOR_TYPE causes wrong-code
When folding two conversions in a row we use TYPE_PRECISION but
that's invalid for VECTOR_TYPE. The following fixes this by
using element_precision instead.
PR middle-end/110182
* match.pd (two conversions in a row): Use element_precision
to DTRT for VECTOR_TYPE.
liuhongt [Thu, 7 Dec 2023 01:17:27 +0000 (09:17 +0800)]
Don't assume it's AVX_U128_CLEAN after a call_insn whose abi.mode_clobber(V4DImode) doesn't contain all SSE_REGS.
If the function doesn't clobber any SSE registers, or clobbers only their
128-bit parts, then a vzeroupper isn't issued before the function exits;
the status is not CLEAN but ANY after the function.
Also, for a sibling call it's safe to issue a vzeroupper, and one could
otherwise be missing since there's no mode_exit for sibling_call_p.
gcc/ChangeLog:
PR target/112891
* config/i386/i386.c (ix86_avx_u128_mode_after): Return
AVX_U128_ANY if callee_abi doesn't clobber all_sse_regs to
align with ix86_avx_u128_mode_needed.
(ix86_avx_u128_mode_needed): Return AVX_U128_CLEAN for
sibling_call.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr112891.c: New test.
* gcc.target/i386/pr112891-2.c: New test.