Martin Jambor [Fri, 9 Feb 2024 17:58:43 +0000 (18:58 +0100)]
sra: Disqualify bases of operands of asm gotos
PR 110422 shows that SRA can ICE because it assumes there is a single edge
outgoing from a block terminated with an asm goto. We need a single
outgoing edge for BB-terminating statements so that any adjustments they
make to the aggregates can be copied over to their replacements. Because
we can't have that after asm gotos, we need to punt.
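A minimal sketch of the kind of input involved (an assumption, not
necessarily the exact pr110422.c): an asm goto whose output lives in an
aggregate that SRA would otherwise scalarize.

  struct S { int i; };

  int
  foo (void)
  {
    struct S s;
    /* Output of an asm goto: the block has two successors, so SRA cannot
       insert the copy-back after the statement and must disqualify 's'.  */
    asm goto ("" : "=r" (s.i) : : : lab);
    return s.i;
   lab:
    return 0;
  }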
gcc/ChangeLog:
2024-01-17 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/110422
* tree-sra.c (scan_function): Disqualify bases of operands of asm
gotos.
gcc/testsuite/ChangeLog:
2024-01-17 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/110422
* gcc.dg/torture/pr110422.c: New test.
Xi Ruoyao [Fri, 2 Feb 2024 19:35:07 +0000 (03:35 +0800)]
MIPS: Fix wrong MSA FP vector negation
We expanded (neg x) to (minus const0 x) for MSA FP vectors, but this is
wrong because -0.0 is not 0 - 0.0. This causes some Python tests to fail
when Python is built with MSA enabled.
Use the bnegi.df instructions to simply reverse the sign bit instead.
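For illustration, a hedged sketch of the intended effect (compiler options
and exact assembly assumed, not verified output):

  /* FP vector negation must flip only the sign bit, so that -0.0 is
     handled correctly.  */
  typedef float v4f32 __attribute__ ((vector_size (16)));

  v4f32
  negate (v4f32 x)
  {
    return -x;  /* expected: bnegi.w $wd,$ws,31, not a subtraction from zero */
  }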
gcc/ChangeLog:
* config/mips/mips-msa.md (elmsgnbit): New define_mode_attr.
(neg<mode>2): Change the mode iterator from MSA to IMSA because
in FP arithmetic we cannot use (0 - x) for -x.
(neg<mode>2): New define_insn to implement FP vector negation,
using a bnegi instruction to negate the sign bit.
The first alternative stores the floating-point status register
in the destination. It should store zero. We need to copy %fr0
to another floating-point register to initialize it to zero.
2024-02-01 John David Anglin <danglin@gcc.gnu.org>
gcc/ChangeLog:
* config/pa/pa.md (atomic_storedi_1): Fix bug in
alternative 1.
Lewis Hyatt [Tue, 5 Dec 2023 16:33:39 +0000 (11:33 -0500)]
c-family: Fix ICE with large column number after restoring a PCH [PR105608]
Users are allowed to define macros prior to restoring a precompiled header
file, as long as those macros are not defined (or are defined identically)
in the PCH. However, the PCH restoration process destroys all the macro
definitions, so libcpp has to record them before restoring the PCH and then
redefine them afterward.
This process does not currently assign great locations to the macros after
redefining them. Some work is needed to also remember the original locations
and get the line_maps instance in the right state (since, like all other
data structures, the line_maps instance is also reset after restoring a PCH).
This patch addresses a more pressing issue, which is that we ICE in some
cases since GCC 11, hitting an assert in line-maps.cc. It happens if the
first line encountered after the PCH restore requires an LC_RENAME map, such
as will happen if the line is sufficiently long. This is much easier to
fix, since we just need to call linemap_line_start before asking libcpp to
redefine the stored macros, instead of afterward, to avoid the unexpected
need for an LC_RENAME before an LC_ENTER has been seen.
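A sketch of the failure mode (hypothetical file names; the real tests are
the line-map-*.C/.Hs pairs below):

  #define FOO 1            /* defined prior to restoring the PCH */
  #include "pch-header.H"  /* the restore resets the line_maps instance */
  /* ...a line long enough to require an LC_RENAME map used to trip the
     assert in line-maps.cc before an LC_ENTER had been seen...  */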
gcc/c-family/ChangeLog:
PR preprocessor/105608
* c-pch.c (c_common_read_pch): Start a new line map before asking
libcpp to restore macros defined prior to reading the PCH, instead
of afterward.
gcc/testsuite/ChangeLog:
PR preprocessor/105608
* g++.dg/pch/line-map-1.C: New test.
* g++.dg/pch/line-map-1.Hs: New test.
* g++.dg/pch/line-map-2.C: New test.
* g++.dg/pch/line-map-2.Hs: New test.
Jason Merrill [Tue, 19 Dec 2023 21:12:02 +0000 (16:12 -0500)]
c++: xvalue array subscript [PR103185]
Normally we handle xvalue array subscripting with ARRAY_REF, but in this
case we weren't doing that because the operands were reversed. Handle that
case better.
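A hedged illustration of the reversed-operand form (simplified, not
necessarily the PR's exact test):

  // 0[E] is the same subscript as E[0], so the xvalue array ends up as
  // the second operand.
  int arr[3];
  int &&r = 0[static_cast<int(&&)[3]>(arr)];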
Jan Hubicka [Thu, 22 Dec 2022 09:55:46 +0000 (10:55 +0100)]
Zen4 tuning part 2
Adds tunes needed for the zen4 microarchitecture. I added two new knobs.
TARGET_AVX512_SPLIT_REGS, which is used to specify that 512-bit vectors are
internally split into 256-bit vectors. This affects vectorization costs and
reassociation width. It probably should also affect RTX costs; however, I
doubt that is very useful since RTL optimizers are usually not judging
between 256-bit and 512-bit vectors.
I also added X86_TUNE_AVOID_256FMA_CHAINS. Since FMA has improved in zen4,
this flag may not be a win except for very specific benchmarks. I am still
doing some more detailed testing here.
Otherwise I disabled gathers on zen4 for 2 parts and 4 parts. We can open
code them, and since the latencies have only increased since zen3, open
coding is better than the actual instruction. This shows in 4 tsvc
benchmarks.
I ended up setting AVX256_OPTIMAL. This is a compromise. There are some
tsvc benchmarks that improve noticeably (up to 250%); however, there are
also a few regressions. Most of these can be solved by increasing the
vec_perm cost in the vectorizer. However, that does not cure the roughly
14% regression on x264, which is quite important. Here we produce
vectorized loops for AVX512 that would probably be faster if the loops in
question had a high enough iteration count. We hit this problem with
AVX256 too: since the loop iterates few times, only prologues/epilogues
are used. Adding another round of prologue/epilogue code does not make it
better.
Finally, I enabled AVX stores for constant-sized memcpy and memset. I am
not sure why this is an opt-in feature; I think for most hardware this is
a win.
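As a hedged sketch of the new knobs' shape in x86-tune.def (abbreviated;
names per the ChangeLog below, exact selectors assumed):

  /* Internally split 512-bit vectors into two 256-bit halves.  */
  DEF_TUNE (X86_TUNE_AVX512_SPLIT_REGS, "avx512_split_regs", m_ZNVER4)

  /* Avoid long chains of 512-bit FMAs.  */
  DEF_TUNE (X86_TUNE_AVOID_512FMA_CHAINS, "avoid_fma512_chains", m_ZNVER4)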
gcc/ChangeLog:
2022-12-22 Jan Hubicka <hubicka@ucw.cz>
* config/i386/i386-expand.c (ix86_expand_set_or_cpymem): Add
TARGET_AVX512_SPLIT_REGS.
* config/i386/i386-options.c (ix86_option_override_internal):
Honor X86_TUNE_AVOID_256FMA_CHAINS.
* config/i386/i386.c (ix86_vec_cost): Honor TARGET_AVX512_SPLIT_REGS.
(ix86_reassociation_width): Likewise.
* config/i386/i386.h (TARGET_AVX512_SPLIT_REGS): New tune.
* config/i386/x86-tune.def (X86_TUNE_USE_GATHER_2PARTS): Disable
for znver4.
(X86_TUNE_USE_GATHER_4PARTS): Likewise.
(X86_TUNE_AVOID_256FMA_CHAINS): Set for znver4.
(X86_TUNE_AVOID_512FMA_CHAINS): New tune; set for znver4.
(X86_TUNE_AVX256_OPTIMAL): Add znver4.
(X86_TUNE_AVX512_SPLIT_REGS): New tune.
(X86_TUNE_AVX256_MOVE_BY_PIECES): Add znver1-3.
(X86_TUNE_AVX256_STORE_BY_PIECES): Add znver1-3.
(X86_TUNE_AVX512_MOVE_BY_PIECES): Add znver4.
(X86_TUNE_AVX512_STORE_BY_PIECES): Add znver4.
Jan Hubicka [Thu, 22 Dec 2022 01:16:24 +0000 (02:16 +0100)]
Update znver4 costs
Update the costs of znver4, mostly based on data measured by Agner Fog.
Compared to previous generations, x87 became a bit slower, which is
probably not a big deal (and we have minimal benchmarking coverage for it).
One interesting improvement is the reduction of FMA cost. I also updated
the costs of AVX256 loads/stores based on latencies (not throughput, which
is twice that of AVX256).
Overall, AVX512 vectorization seems to improve some TSVC benchmarks
noticeably, but since 512-bit vectors are internally split into 256-bit
vectors it is somewhat risky and does not win in SPEC scores (mostly by
regressing benchmarks with loops that have a small trip count, like x264
and exchange), so for now I am going to set the AVX256_OPTIMAL tune, but I
am still playing with it. We have improved since ZNVER1 on choosing the
vectorization size and also have vectorized prologues/epilogues, so it may
be possible to make AVX512 a small win overall.
2022-12-22 Jan Hubicka <hubicka@ucw.cz>
* config/i386/x86-tune-costs.h (znver4_cost): Update costs of FP and SSE
moves, division, multiplication, gathers, L2 cache size, and more
complex FP instructions.
Ken Matsui [Thu, 11 Jan 2024 06:08:07 +0000 (22:08 -0800)]
libstdc++: Fix error handling in filesystem::equivalent [PR113250]
This patch makes std::filesystem::equivalent correctly throw an exception
when either path does not exist, as per [fs.op.equivalent]/4.
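The substance of the fix, as a hedged sketch (status variable names
assumed, simplified from fs_ops.cc):

  // Report an error when *either* path is missing, not only when both are.
  if (!exists(s1) || !exists(s2))   // was: &&
    ec = make_error_code(std::errc::no_such_file_or_directory);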
PR libstdc++/113250
libstdc++-v3/ChangeLog:
* src/c++17/fs_ops.cc (fs::equivalent): Use || instead of &&.
* src/filesystem/ops.cc (fs::equivalent): Likewise.
* testsuite/27_io/filesystem/operations/equivalent.cc: Handle
error codes.
* testsuite/experimental/filesystem/operations/equivalent.cc:
Likewise.
Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
(cherry picked from commit df147e2ee7199d33d66959c6509ce9c21072077f)
Patrick Palka [Wed, 3 Jan 2024 02:31:20 +0000 (21:31 -0500)]
libstdc++: testsuite: Reduce max_size_type.cc exec time [PR113175]
The adjustment to max_size_type.cc in r14-205-g83470a5cd4c3d2
inadvertently increased the execution time of this test by over 5x due
to making the two main loops actually run in the signed_p case instead
of being dead code.
To compensate, this patch cuts the relevant loops' range [-1000,1000] by
10x as proposed in the PR. This shouldn't significantly weaken the test
since the same important edge cases are still checked in the smaller range
and/or elsewhere. On my machine this reduces the test's execution time by
roughly 10x (and 1.6x relative to before r14-205).
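In terms of the test's constants (a sketch; the exact new value of
log2_limit is an assumption):

  // was: limit = 1000, log2_limit = 10
  constexpr int limit = 100;      // loops now cover [-100, 100]
  constexpr int log2_limit = 7;   // ceil(log2(100))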
PR testsuite/113175
libstdc++-v3/ChangeLog:
* testsuite/std/ranges/iota/max_size_type.cc (test02): Reduce
'limit' to 100 from 1000 and adjust 'log2_limit' accordingly.
(test03): Likewise.
Patrick Palka [Fri, 22 Sep 2023 10:25:49 +0000 (06:25 -0400)]
c++: constraint rewriting during ttp coercion [PR111485]
In order to compare the constraints of a ttp with those of its argument,
we rewrite the ttp's constraints in terms of the argument template's
template parameters. The substitution to achieve this currently uses a
single level of template arguments, but that never does the right thing
because a ttp's template parameters always have level >= 2. This patch
fixes this by including the outer template arguments in the substitution,
which ought to match the depth of the ttp.
The second testcase demonstrates it's better to substitute the concrete
outer template arguments instead of generic ones since a ttp's constraints
could depend on outer parameters.
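A hedged illustration of the kind of code involved (simplified; the real
tests are concepts-ttp5.C and concepts-ttp6.C below):

  template<class T> concept C = true;
  template<template<C> class TT> struct A { };  // ttp with constrained parameter
  template<C U> struct B { };
  A<B> a;  // B's constraints are compared against TT's rewritten ones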
PR c++/111485
gcc/cp/ChangeLog:
* pt.c (is_compatible_template_arg): New parameter 'args'.
Add the outer template arguments 'args' to 'new_args'.
(convert_template_argument): Pass 'args' to
is_compatible_template_arg.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-ttp5.C: New test.
* g++.dg/cpp2a/concepts-ttp6.C: New test.
Jason Merrill [Fri, 23 Apr 2021 20:41:35 +0000 (16:41 -0400)]
c++: -Wdeprecated-copy and using operator= [PR92145]
For the purpose of [depr.impldec] "if the class has a user-declared copy
assignment operator", an operator= brought in from a base class with 'using'
may be a copy-assignment operator, but it isn't a copy-assignment operator
for the derived class.
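A hedged sketch of the distinction (not the exact depr-copy3.C test):

  struct B { B& operator=(const B&); };  // user-declared for B
  struct D : B {
    using B::operator=;  // not a copy-assignment operator *for D*
  };
  // Copying a D should not trigger -Wdeprecated-copy because of the
  // using-declaration alone.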
gcc/cp/ChangeLog:
PR c++/92145
* class.c (classtype_has_depr_implicit_copy): Check DECL_CONTEXT
of operator=.
gcc/testsuite/ChangeLog:
PR c++/92145
* g++.dg/cpp0x/depr-copy3.C: New test.
Jason Merrill [Sun, 4 Jun 2023 16:00:55 +0000 (12:00 -0400)]
c++: NRV and goto [PR92407]
Here our named return value optimization was breaking the required
destructor when the goto takes 'a' out of scope. A simple fix for the
release branches is to disable the optimization in the presence of backward
goto.
We could do better by disabling the optimization only if there is a backward
goto across the variable declaration, but we don't track that, and in GCC 14
we instead make the goto work with NRV.
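A hedged sketch of the problematic shape ('cond' is a stand-in):

  struct A { ~A(); };
  extern bool cond();

  A f() {
   again:
    A a;
    if (cond())
      goto again;  // backward goto takes 'a' out of scope: ~A() must run
    return a;      // NRV would have made 'a' share the return slot
  }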
PR c++/92407
gcc/cp/ChangeLog:
* cp-tree.h (struct language_function): Add backward_goto.
* decl.c (check_goto): Set it.
* typeck.c (check_return_expr): Prevent NRV if set.
Patrick Palka [Tue, 25 Apr 2023 19:59:22 +0000 (15:59 -0400)]
c++: value dependence of by-ref lambda capture [PR108975]
We are still ICEing on the generic lambda version of the testcase from
this PR, even after r13-6743-g6f90de97634d6f, due to the by-ref capture
of the constant local variable 'dim' being considered value-dependent
when regenerating the lambda (at which point processing_template_decl is
set since the lambda is generic), which prevents us from constant folding
its uses. Later during prune_lambda_captures we end up not thoroughly
walking the body of the lambda and overlook the (non-folded) uses of
'dim' within the array bound and using-decls.
We could fix this by making prune_lambda_captures walk the body of the
lambda more thoroughly so that it finds these uses of 'dim', but ideally
we should be able to constant fold all uses of 'dim' ahead of time and
prune the implicit capture after all.
To that end this patch makes value_dependent_expression_p return false
for such by-ref captures of constant local variables, allowing their
uses to get constant folded ahead of time. It seems we just need to
disable the predicate's conservative early exit for reference variables
(added by r5-5022-g51d72abe5ea04e) when DECL_HAS_VALUE_EXPR_P. This
effectively makes us treat by-value and by-ref captures more consistently
when it comes to value dependence.
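A hedged sketch of the generic-lambda shape (simplified from the PR):

  void f() {
    constexpr int dim = 2;
    auto l = [&](auto) {
      int arr[dim];  // uses of 'dim' should constant-fold ahead of time...
      (void) arr;
    };               // ...so prune_lambda_captures can drop the capture
    l(0);
  }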
PR c++/108975
gcc/cp/ChangeLog:
* pt.c (value_dependent_expression_p) <case VAR_DECL>:
Suppress conservative early exit for reference variables
when DECL_HAS_VALUE_EXPR_P.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/lambda/lambda-const11a.C: New test.
Jakub Jelinek [Tue, 19 Dec 2023 09:24:33 +0000 (10:24 +0100)]
i386: Fix mmx.md signbit expanders [PR112816]
Apparently, when looking for "signbit<mode>2" vector expanders, I only
looked at sse.md and forgot mmx.md, which has another one, so the testcase
still ICEd even after the earlier sse.md fix.
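A hedged sketch of input that reaches the mmx.md expander (options and
exact shape assumed):

  /* Something like -O2 with 64-bit vectors enabled, so the vectorizer
     uses V2SF and the signbitv2sf2 expander.  */
  int a[2];
  float b[2];

  void
  foo (void)
  {
    for (int i = 0; i < 2; i++)
      a[i] = __builtin_signbitf (b[i]);
  }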
2023-12-19 Jakub Jelinek <jakub@redhat.com>
PR target/112816
* config/i386/mmx.md (signbitv2sf2): Force operands[1] into a REG.
Jakub Jelinek [Fri, 8 Dec 2023 19:56:48 +0000 (20:56 +0100)]
c++: Unshare folded SAVE_EXPR arguments during cp_fold [PR112727]
The following testcase is miscompiled because two ubsan instrumentations
run into each other.
The first one is the shift instrumentation. Before the C++ FE calls
it, it wraps the 2 shift arguments with cp_save_expr, so that side-effects
in them aren't evaluated multiple times. And, ubsan_instrument_shift
itself uses unshare_expr on any uses of the operands to make sure further
modifications in them don't affect other copies of them (the only not
unshared ones are the one the caller then uses for the actual operation
after the instrumentation, which means there is no tree sharing).
Now, if there are side-effects in the first operand like say function
call, cp_save_expr wraps it into a SAVE_EXPR, and ubsan_instrument_shift
in this mode emits something like
  if (..., SAVE_EXPR <foo ()>, SAVE_EXPR <op1> > const)
    __ubsan_handle_shift_out_of_bounds (..., SAVE_EXPR <foo ()>, ...);
and caller adds
  SAVE_EXPR <foo ()> << SAVE_EXPR <op1>
after it in a COMPOUND_EXPR. So far so good.
If there are no side-effects and cp_save_expr doesn't create SAVE_EXPR,
everything is ok as well because of the unshare_expr.
We have
  if (..., SAVE_EXPR <op1> > const)
    __ubsan_handle_shift_out_of_bounds (..., ptr->something[i], ...);
and
  ptr->something[i] << SAVE_EXPR <op1>
where ptr->something[i] is unshared.
In the testcase from the PR, the !x->s[j] ? 1 : 0 expression is wrapped
initially into a SAVE_EXPR though, and unshare_expr doesn't unshare
SAVE_EXPRs nor anything used in them for obvious reasons, so we end up
with:
  if (..., SAVE_EXPR <!(bool) VIEW_CONVERT_EXPR<const struct S *>(x)->s[j] ? 1 : 0>, SAVE_EXPR <op1> > const)
    __ubsan_handle_shift_out_of_bounds (..., SAVE_EXPR <!(bool) VIEW_CONVERT_EXPR<const struct S *>(x)->s[j] ? 1 : 0>, ...);
and
  SAVE_EXPR <!(bool) VIEW_CONVERT_EXPR<const struct S *>(x)->s[j] ? 1 : 0> << SAVE_EXPR <op1>
So far so good as well. But later, during cp_fold of the SAVE_EXPR, we find
out that VIEW_CONVERT_EXPR<const struct S *>(x)->s[j] ? 0 : 1 is actually
invariant (has TREE_READONLY set) and so cp_fold simplifies the above to
  if (..., SAVE_EXPR <op1> > const)
    __ubsan_handle_shift_out_of_bounds (..., (bool) VIEW_CONVERT_EXPR<const struct S *>(x)->s[j] ? 0 : 1, ...);
and
  ((bool) VIEW_CONVERT_EXPR<const struct S *>(x)->s[j] ? 0 : 1) << SAVE_EXPR <op1>
with the s[j] ARRAY_REFs and other expressions shared in between the two
uses (and obviously the expression is optimized away from the COMPOUND_EXPR
in the if condition).
Then comes another ubsan instrumentation at genericization time,
this time to instrument the ARRAY_REFs with strict bounds checking,
and replaces the s[j] in there with
s[.UBSAN_BOUNDS (0B, SAVE_EXPR<j>, 8), SAVE_EXPR<j>].
As the trees are shared, it does that just once though.
And as the if body is gimplified first, the SAVE_EXPR<j> is evaluated inside
of the if body and when it is used again after the if, it uses a potentially
uninitialized value of j.1 (always uninitialized if the shift count isn't
out of bounds).
The following patch fixes that by unshare_expr unsharing the folded argument
of a SAVE_EXPR if we've folded the SAVE_EXPR into an invariant and it is
used more than once.
2023-12-08 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/112727
* cp-gimplify.c (cp_fold): If SAVE_EXPR has been previously
folded, unshare_expr what is returned.
Jakub Jelinek [Wed, 29 Nov 2023 11:26:50 +0000 (12:26 +0100)]
fold-const: Fix up multiple_of_p [PR112733]
We ICE on the following testcase when wi::multiple_of_p is called on
widest_int 1 and -128 with UNSIGNED. I still need to work on the actual
wide-int.cc issue; the latest patch attached to the PR regressed
bitint-{38,39}.c, so I will need to debug that. But there is a clear bug
on the fold-const.cc side as well: widest_int is a signed representation
by definition, and using UNSIGNED with it certainly doesn't match what was
intended, because -128 as the second operand effectively means an unsigned
131072-bit 0xfffff............ffff80 integer, not the signed char -128
that appeared in the source.
In the INTEGER_CST case a few lines above this we already use
    case INTEGER_CST:
      if (TREE_CODE (bottom) != INTEGER_CST || integer_zerop (bottom))
        return false;
      return wi::multiple_of_p (wi::to_widest (top), wi::to_widest (bottom),
                                SIGNED);
so I think using SIGNED with widest_int is best there (compared to the
other choices in the PR).
2023-11-29 Jakub Jelinek <jakub@redhat.com>
PR middle-end/112733
* fold-const.c (multiple_of_p): Pass SIGNED rather than
UNSIGNED for wi::multiple_of_p on widest_int arguments.
Jakub Jelinek [Tue, 5 Dec 2023 12:17:57 +0000 (13:17 +0100)]
i386: Fix -fcf-protection -Os ICE due to movabsq peephole2 [PR112845]
The following testcase ICEs in the movabsq $(i32 << shift), r64 peephole2
that I added a while back to emit smaller code than movabsq when possible.
If i32 is 0xfa1e0ff3 and shift is not divisible by 8, then it creates an
invalid insn (as a 0xfa1e0ff3 CONST_INT is allowed as neither
x86_64_immediate_operand nor x86_64_zext_immediate_operand); the peephole2
even triggers on it again and again (this time with shift 0) until it
gives up.
The following patch fixes that. As ix86_endbr_immediate_operand needs a
CONST_INT and it is hopefully rare, I chose to use FAIL rather than handling
it in the condition (where I'd probably need to call ctz_hwi again etc.).
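A hedged sketch of the trigger (an assumption based on the description,
not the exact testcase):

  /* Compile with -Os -fcf-protection: 0xfa1e0ff3 is the ENDBR64 immediate,
     so the peephole must not create it as a direct CONST_INT operand.  */
  unsigned long long
  f (void)
  {
    return 0xfa1e0ff3ULL << 11;  /* shift not divisible by 8 */
  }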
2023-12-05 Jakub Jelinek <jakub@redhat.com>
PR target/112845
* config/i386/i386.md (movabsq $(i32 << shift), r64 peephole2): FAIL
if the new immediate is ix86_endbr_immediate_operand.
Jakub Jelinek [Mon, 4 Dec 2023 08:01:09 +0000 (09:01 +0100)]
i386: Fix rtl checking ICE in ix86_elim_entry_set_got [PR112837]
The following testcase ICEs with RTL checking because the code tests
whether XINT (SET_SRC (set), 1) is UNSPEC_SET_GOT without checking that
SET_SRC (set) is actually an UNSPEC, so any time we see any other insn
with a PARALLEL and a SET in it whose source is not an UNSPEC, we ICE
during RTL checking or access some other union member as if it were an
rt_int.
The rest is just small cleanup.
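The added guard, as a hedged sketch (abbreviated; the surrounding loop in
ix86_elim_entry_set_got is not shown):

  /* Check that SET_SRC really is an UNSPEC before reading XINT.  */
  if (GET_CODE (SET_SRC (set)) == UNSPEC
      && XINT (SET_SRC (set), 1) == UNSPEC_SET_GOT)
    break;  /* found the set_got insn */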
2023-12-04 Jakub Jelinek <jakub@redhat.com>
PR target/112837
* config/i386/i386.c (ix86_elim_entry_set_got): Before checking
for UNSPEC_SET_GOT check that SET_SRC is UNSPEC. Use SET_SRC and
SET_DEST macros instead of XEXP, rename vec variable to set.
Jakub Jelinek [Mon, 4 Dec 2023 08:00:18 +0000 (09:00 +0100)]
i386: Fix up signbit<mode>2 expander [PR112816]
The following testcase ICEs, because the signbit<mode>2 expander uses an
explicit SUBREG in the pattern around match_operand with register_operand
predicate. If we are unlucky enough that expansion tries to expand it
with some SUBREG as operands[1], we get two nested SUBREGs in the IL,
which is not valid and causes an ICE later.
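The shape of the fix, hedged (expander preparation code, simplified):

  /* The pattern itself wraps operands[1] in a SUBREG, so avoid
     SUBREG-of-SUBREG by forcing operands[1] into a plain REG first.  */
  if (!REG_P (operands[1]))
    operands[1] = force_reg (<MODE>mode, operands[1]);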
2023-12-04 Jakub Jelinek <jakub@redhat.com>
PR target/112816
* config/i386/sse.md (signbit<mode>2): Force operands[1] into a REG.
Jakub Jelinek [Mon, 4 Dec 2023 07:59:15 +0000 (08:59 +0100)]
c++: #pragma GCC unroll C++ fixes [PR112795]
foo in the unroll-5.C testcase ICEs because cp_parser_pragma_unroll
during parsing calls maybe_constant_value unconditionally, which is
fine if !processing_template_decl, but can ICE otherwise.
While just calling fold_non_dependent_expr there instead could be enough
to fix the ICE (and I guess the right thing to do for backports if any),
I don't see a reason why we couldn't handle a dependent #pragma GCC unroll
argument as well, the unrolling isn't done in the FE and all the middle-end
cares about is that ANNOTATE_EXPR has a 1..65534 last operand when it is
annot_expr_unroll_kind.
So, the following patch changes all the unsigned short unroll arguments
to tree unroll (and thus avoids the tree -> unsigned short -> tree
conversions), does the type and value checking during parsing only if
the argument isn't dependent, and repeats it during instantiation.
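A hedged example of a dependent argument that is now handled (not
necessarily the exact unroll-5.C):

  template <int N>
  void foo (int (&a)[N])
  {
  #pragma GCC unroll N  // dependent: checked again at instantiation
    for (int i = 0; i < N; ++i)
      a[i] = 0;
  }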
2023-12-04 Jakub Jelinek <jakub@redhat.com>
PR c++/112795
gcc/cp/
* parser.c (cp_parser_pragma_unroll): Use fold_non_dependent_expr
instead of maybe_constant_value.
gcc/testsuite/
* g++.dg/ext/unroll-5.C: New test.
Jakub Jelinek [Sat, 25 Nov 2023 09:31:55 +0000 (10:31 +0100)]
i386: Fix up *jcc_bt*_mask{,_1} [PR111408]
The following testcase is miscompiled in GCC 14 because the
*jcc_bt<mode>_mask and *jcc_bt<SWI48:mode>_mask_1 patterns have just
one argument in (match_operator 0 "bt_comparison_operator" [...])
but as bt_comparison_operator is eq,ne, we need two.
The md readers don't warn about it; after all, some checks can be done in
the predicate rather than specified explicitly, and the behavior is that
anything is accepted as the second argument.
I went through all other i386.md match_operator uses and all others
looked right (extract_operator using 3 operands, all others 2).
I think we'll want to fix this at different spots in older releases
because I think the bug was introduced already in 2008, though most
likely just latent.
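A hedged sketch of code exercising the *jcc_bt*_mask patterns (an
assumption; the actual miscompiled testcase is the PR's):

  extern void g (void);

  void
  f (unsigned x, int n)
  {
    if ((x >> (n & 31)) & 1)  /* bt with a masked count feeding a branch */
      g ();
  }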
2023-11-25 Jakub Jelinek <jakub@redhat.com>
PR target/111408
* config/i386/i386.md (*jcc_bt<mode>_mask): Add (const_int 0) as
expected second operand of bt_comparison_operator.
Jakub Jelinek [Fri, 13 Oct 2023 07:09:32 +0000 (09:09 +0200)]
libstdc++: Fix tr1/8_c_compatibility/cstdio/functions.cc regression with recent glibc
The following testcase started FAILing recently after the
https://sourceware.org/git/?p=glibc.git;a=commit;h=64b1a44183a3094672ed304532bedb9acc707554
glibc change which marked vfscanf with nonnull (1) attribute.
While vfwscanf hasn't been marked similarly (strangely), the patch changes
that test too. By using va_arg one hides the value from the compiler (the
volatile keyword would do too, or making the FILE* stream a function
argument, but then it might need to be guarded by #if or something).
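A hedged sketch of the workaround's shape (simplified; the real change is
inside the tests' test01 functions):

  #include <cstdarg>
  #include <cstdio>

  void test01(int dummy, ...)
  {
    va_list ap;
    va_start(ap, dummy);
    // was: FILE* stream = 0; -- a compile-time null triggers the
    // nonnull (1) attribute diagnostics with recent glibc.
    FILE* stream = va_arg(ap, FILE*);
    va_end(ap);
    // ...vfscanf (stream, ...) calls follow in the real test...
  }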
2023-10-13 Jakub Jelinek <jakub@redhat.com>
* testsuite/tr1/8_c_compatibility/cstdio/functions.cc (test01):
Initialize stream to va_arg(ap, FILE*) rather than 0.
* testsuite/tr1/8_c_compatibility/cwchar/functions.cc (test01):
Likewise.
Jakub Jelinek [Wed, 19 Jul 2023 11:48:53 +0000 (13:48 +0200)]
wide-int: Fix up wi::divmod_internal [PR110731]
As the following testcase shows, wi::divmod_internal doesn't handle
correctly signed division with precision > 64 when the dividend (and likely
divisor as well) is the type's minimum and the precision isn't divisible
by 64.
A few lines above what the patch hunk changes is:
  /* Make the divisor and dividend positive and remember what we
     did.  */
  if (sgn == SIGNED)
    {
      if (wi::neg_p (dividend))
        {
          neg_dividend = -dividend;
          dividend = neg_dividend;
          dividend_neg = true;
        }
      if (wi::neg_p (divisor))
        {
          neg_divisor = -divisor;
          divisor = neg_divisor;
          divisor_neg = true;
        }
    }
i.e. we negate negative dividend or divisor and remember those.
But, after we do that, when unpacking those values into b_dividend and
b_divisor we need to always treat the wide_ints as UNSIGNED,
because divmod_internal_2 performs an unsigned division only.
Now, if precision <= 64, we don't reach here at all; earlier code handles
it. If dividend or divisor aren't the most negative values, the negation
clears their most significant bit, so it doesn't really matter if we
unpack SIGNED or UNSIGNED. And if the precision is a multiple of
HOST_BITS_PER_WIDE_INT, there is no difference in behavior either: while
-0x80000000000000000000000000000000 negates to
-0x80000000000000000000000000000000, unpacking it as SIGNED or UNSIGNED
works the same.
In the testcase, we have signed precision 119 and the dividend is
val = { 0, 0xffc0000000000000 }, len = 2, precision = 119
both before and after negation.
Divisor is
val = { 2 }, len = 1, precision = 119
But we really want to divide 0x400000000000000000000000000000 by 2
unsigned and then negate at the end.
For an unsigned precision 119 division of
0x400000000000000000000000000000 by 2, the dividend is
val = { 0, 0xffc0000000000000 }, len = 2, precision = 119
but as we unpack it UNSIGNED, it is unpacked into
0, 0, 0, 0x00400000
The following patch fixes it by always using UNSIGNED unpacking
because we've already negated negative values at that point if
sgn == SIGNED and so most negative constants should be treated as
positive.
2023-07-19 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/110731
* wide-int.cc (wi::divmod_internal): Always unpack dividend and
divisor as UNSIGNED regardless of sgn.
The following removes a misguided attempt to allow x + x in a reduction
path, which also allowed x * x, which isn't valid. x + x actually never
arrives this way but instead is canonicalized to 2 * x. This makes
reduction path handling consistent with how we handle the single-stmt
reduction case.
PR tree-optimization/111764
* tree-vect-loop.c (check_reduction_path): Remove the attempt
to allow x + x via special-casing of assigns.
Richard Biener [Mon, 16 Oct 2023 10:50:46 +0000 (12:50 +0200)]
middle-end/111818 - failed DECL_NOT_GIMPLE_REG_P setting of volatile
The following addresses a missed DECL_NOT_GIMPLE_REG_P setting of
a volatile declared parameter which causes inlining to substitute
a constant parameter into a context where its address is required.
The main issue is in update_address_taken which clears
DECL_NOT_GIMPLE_REG_P from the parameter but fails to rewrite it
because is_gimple_reg returns false for volatiles. The following
changes maybe_optimize_var to make the 1:1 correspondence between
clearing DECL_NOT_GIMPLE_REG_P of a register typed decl and
actually rewriting it to SSA.
PR middle-end/111818
* tree-ssa.c (maybe_optimize_var): When clearing
DECL_NOT_GIMPLE_REG_P always rewrite into SSA.
Richard Biener [Mon, 19 Jun 2023 07:52:45 +0000 (09:52 +0200)]
tree-optimization/110298 - CFG cleanup and stale nb_iterations
When unrolling we eventually kill nb_iterations info since it may
refer to removed SSA names. But we do this only after cleaning
up the CFG which in turn can end up accessing it. Fixed by
swapping the two.
PR tree-optimization/110298
* tree-ssa-loop-ivcanon.c (tree_unroll_loops_completely):
Clear number of iterations info before cleaning up the CFG.
Richard Biener [Mon, 19 Jun 2023 07:23:16 +0000 (09:23 +0200)]
debug/110295 - mixed up early/late debug for member DIEs
When we process a scope typedef during early debug creation, and we have
already created a DIE for the type when the decl is TYPE_DECL_IS_STUB and
this DIE is still in limbo, we end up just re-parenting that type DIE
instead of properly creating a DIE for the decl, eventually picking up the
now completed type and creating DIEs for the members. Instead, this is
currently deferred to the second time we come here, when we annotate the
DIEs with locations late, at which point the type DIE is no longer in
limbo and we fall through, doing the job for the decl.
The following makes sure we perform the necessary early tasks
for this by continuing with the decl DIE creation after setting
a parent for the limbo type DIE.
PR debug/110295
* dwarf2out.c (process_scope_var): Continue processing
the decl after setting a parent in case the existing DIE
was in limbo.
Richard Biener [Fri, 9 Jun 2023 07:29:09 +0000 (09:29 +0200)]
middle-end/110182 - TYPE_PRECISION on VECTOR_TYPE causes wrong-code
When folding two conversions in a row we use TYPE_PRECISION but
that's invalid for VECTOR_TYPE. The following fixes this by
using element_precision instead.
PR middle-end/110182
* match.pd (two conversions in a row): Use element_precision
to DTRT for VECTOR_TYPE.
liuhongt [Thu, 7 Dec 2023 01:17:27 +0000 (09:17 +0800)]
Don't assume it's AVX_U128_CLEAN after a call_insn whose abi.mode_clobber(V4DImode) doesn't contain all SSE_REGS.
If the function doesn't clobber any SSE registers, or clobbers only their
128-bit parts, then a vzeroupper isn't issued before the function exits;
the status is not CLEAN but ANY after the function.
Also, for a sibling call it's safe to issue a vzeroupper, and one could
otherwise be missing since there's no mode_exit for sibling_call_p.
gcc/ChangeLog:
PR target/112891
* config/i386/i386.c (ix86_avx_u128_mode_after): Return
AVX_U128_ANY if callee_abi doesn't clobber all_sse_regs to
align with ix86_avx_u128_mode_needed.
(ix86_avx_u128_mode_needed): Return AVX_U128_CLEAN for
sibling_call.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr112891.c: New test.
* gcc.target/i386/pr112891-2.c: New test.