git.ipfire.org Git - thirdparty/gcc.git/log

borrowck: Avoid overloading issues on 32bit architectures

On architectures where `size_t` is `unsigned int`, such as 32bit x86,
we encounter an issue with `PlaceId` and `FreeRegion` being aliases to
the same types. This poses an issue for overloading functions for these
two types, such as `push_subset` in that case. This commit renames one
of these `push_subset` functions to avoid the issue, but this should be
fixed with a newtype pattern for these two types.

gcc/rust/ChangeLog:

* checks/errors/borrowck/rust-bir-fact-collector.h (points): Rename
`push_subset(PlaceId, PlaceId)` to `push_subset_place(PlaceId, PlaceId)`

ifcvt: Handle multiple rewired regs and refactor noce_convert_multiple_sets

The existing implementation of need_cmov_or_rewire and
noce_convert_multiple_sets_1 assumes that sets are either REG or SUBREG.
This commit enchances them so they can handle/rewire arbitrary set statements.

To do that a new helper struct noce_multiple_sets_info is introduced which is
used by noce_convert_multiple_sets and its helper functions. This results in
cleaner function signatures, improved efficientcy (a number of vecs and hash
set/map are replaced with a single vec of struct) and simplicity.

gcc/ChangeLog:

* ifcvt.cc (need_cmov_or_rewire): Renamed init_noce_multiple_sets_info.
(init_noce_multiple_sets_info): Initialize noce_multiple_sets_info.
(noce_convert_multiple_sets_1): Use noce_multiple_sets_info and handle
rewiring of multiple registers.
(noce_convert_multiple_sets): Updated to use noce_multiple_sets_info.
* ifcvt.h (struct noce_multiple_sets_info): Introduce new struct
noce_multiple_sets_info to store info for noce_convert_multiple_sets.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/ifcvt_multiple_sets_rewire.c: New test.

ifcvt: Allow more operations in multiple set if conversion

Currently the operations allowed for if conversion of a basic block
with multiple sets are few, namely REG, SUBREG and CONST_INT (as
controlled by bb_ok_for_noce_convert_multiple_sets).

This commit allows more operations (arithmetic, compare, etc) to
participate in if conversion. The target's profitability hook and
ifcvt's costing is expected to reject sequences that are unprofitable.

This is especially useful for targets which provide a rich selection
of conditional instructions (like aarch64 which has cinc, csneg,
csinv, ccmp, ...) which are currently not used in basic blocks with
more than a single set.

For targets that have a rich selection of conditional instructions,
like aarch64, we have seen an ~5x increase of profitable if
conversions for multiple set blocks in SPEC CPU 2017 benchmarks.

gcc/ChangeLog:

* ifcvt.cc (try_emit_cmove_seq): Modify comments.
(noce_convert_multiple_sets_1): Modify comments.
(bb_ok_for_noce_convert_multiple_sets): Allow more operations.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/ifcvt_multiple_sets_arithm.c: New test.

ifcvt: handle sequences that clobber flags in noce_convert_multiple_sets

This is an extension of what was done in PR106590.

Currently if a sequence generated in noce_convert_multiple_sets clobbers the
condition rtx (cc_cmp or rev_cc_cmp) then only seq1 is used afterwards
(sequences that emit the comparison itself). Since this applies only from the
next iteration it assumes that the sequences generated (in particular seq2)
doesn't clobber the condition rtx itself before using it in the if_then_else,
which is only true in specific cases (currently only register/subregister moves
are allowed).

This patch changes this so it also tests if seq2 clobbers cc_cmp/rev_cc_cmp in
the current iteration. It also checks whether the resulting sequence clobbers
the condition attached to the jump. This makes it possible to include arithmetic
operations in noce_convert_multiple_sets.

It also makes the code that checks whether the condition is used outside of the
if_then_else emitted more robust.

gcc/ChangeLog:

* ifcvt.cc (check_for_cc_cmp_clobbers): Use modified_in_p instead.
(noce_convert_multiple_sets_1): Don't use seq2 if it clobbers cc_cmp.
Punt if seq clobbers cond. Refactor the code that sets read_comparison.

AVR: target/85624 - Fix non-matching alignment in clrmem* insns.

The clrmem* patterns don't use the provided alignment information,
hence the setmemhi expander can just pass down 0 as alignment to
the clrmem* insns.

PR target/85624
gcc/
* config/avr/avr.md (setmemhi): Set alignment to 0.

gcc/testsuite/
* gcc.target/avr/torture/pr85624.c: New test.

16-bit testsuite fixes - excessive code size

gcc/testsuite/
* gcc.c-torture/execute/20021120-1.c: Skip if not size20plus or -Os.
* gcc.dg/fixed-point/convert-float-4.c: Require size20plus.
* gcc.dg/torture/pr112282.c: Skip if -O0 unless size20plus.
* g++.dg/lookup/pr21802.C: Require size20plus.

This fixes problems with tests that exceed a data type or the maximum stack frame size on 16 bit targets.

Note: GCC has a limitation that a stack frame cannot exceed half the address space.

For two tests the decision to modify or skip them seems not so clear-cut;
I choose to modify gcc.dg/pr47893.c to use types that fit the numbers, as
that seemed to have little impact on the test, and skip gcc.dg/pr115646.c
for 16 bit, as layout of structs with bitfields members can have quite
subtle rules.

gcc/testsuite/
* gcc.dg/pr107523.c: Make sure variables can fit numbers.
* gcc.dg/pr47893.c: Add dg-require-effective-target size20plus clause.
* c-c++-common/torture/builtin-clear-padding-2.c:
dg-require-effective-target size20plus.
* gcc.dg/pr115646.c: dg-require-effective-target int32plus.
* c-c++-common/analyzer/coreutils-sum-pr108666.c:
For c++, expect a warning about exceeding maximum object size
if not size20plus.
* gcc.dg/torture/inline-mem-cpy-1.c:
Like the included file, dg-require-effective-target ptr32plus.
* gcc.dg/torture/inline-mem-cmp-1.c: Likewise.

Avoid cfg corruption when using sjlj exceptions where loops are present in the assign_params emitted code.

2024-08-06 Joern Rennecke <joern.rennecke@riscy-ip.com>

gcc/
* except.cc (sjlj_emit_function_enter):
Set fn_begin_outside_block again if encountering a jump instruction.

Use splay-tree-utils.h in tree-ssa-sccvn [PR30920]

This patch is an attempt to gauge opinion on one way of fixing PR30920.

The PR points out that the libiberty splay tree implementation does
not implement the algorithm described by Sleator and Tarjan and has
unclear complexity bounds.  (It's also somewhat dangerous in that
splay_tree_min and splay_tree_max walk the tree without splaying,
meaning that they are fully linear in the worst case, rather than
amortised logarithmic.)  These properties have been carried over
to typed-splay-tree.h.

We could fix those problems directly in the existing implementations,
and probably should for libiberty.  But when I added rtl-ssa, I also
added a third(!) splay tree implementation: splay-tree-utils.h.
In response to Jeff's understandable unease about having three
implementations, I was supposed to go back during the next stage 1
and reduce it to no more than two.  I never did that. :-(

splay-tree-utils.h is so called because rtl-ssa uses splay trees
in structures that are relatively small and very size-sensitive.
I therefore wanted to be able to embed the splay tree links directly
in the structures, rather than pay the penalty of using separate
nodes with one-way or two-way links between them.  There were also
operations for which it was convenient to treat the splay tree root
as an explicitly managed cursor, rather than treating the tree as
a pure ADT.  The interface is therefore a bit more low-level than
for the other implementations.

I wondered whether the same trade-offs might apply to users of
the libiberty splay trees.  The first one I looked at in detail
was SCC value numbering, which seemed like it would benefit from
using splay-tree-utils.h directly.

The patch does that.  It also adds a couple of new helper routines
to splay-tree-utils.h.

I don't expect this approach to be the right one for every use
of splay trees.  E.g. splay tree used for omp gimplification would
certainly need separate nodes.

gcc/
PR other/30920
* splay-tree-utils.h (rooted_splay_tree::insert_relative)
(rooted_splay_tree::lookup_le): New functions.
(rooted_splay_tree::remove_root_and_splay_next): Likewise.
* splay-tree-utils.tcc (rooted_splay_tree::insert_relative): New
function, extracted from...
(rooted_splay_tree::insert): ...here.
(rooted_splay_tree::lookup_le): New function.
(rooted_splay_tree::remove_root_and_splay_next): Likewise.
* tree-ssa-sccvn.cc (pd_range::m_children): New member variable.
(vn_walk_cb_data::vn_walk_cb_data): Initialize first_range.
(vn_walk_cb_data::known_ranges): Use a default_splay_tree.
(vn_walk_cb_data::~vn_walk_cb_data): Remove freeing of known_ranges.
(pd_range_compare, pd_range_alloc, pd_range_dealloc): Delete.
(vn_walk_cb_data::push_partial_def): Rewrite splay tree operations
to use splay-tree-utils.h.
* rtl-ssa/accesses.cc (function_info::add_use): Use insert_relative.

aarch64: Emit ADD X, Y, Y instead of SHL X, Y, #1 for Advanced SIMD

On many cores, including Neoverse V2 the throughput of vector ADD
instructions is higher than vector shifts like SHL.  We can lean on that
to emit code like:
  add     v0.4s, v0.4s, v0.4s
instead of:
  shl     v0.4s, v0.4s, 1

LLVM already does this trick.
In RTL the code gets canonincalised from (plus x x) to (ashift x 1) so I
opted to instead do this at the final assembly printing stage, similar
to how we emit CMLT instead of SSHR elsewhere in the backend.

I'd like to also do this for SVE shifts, but those will have to be
separate patches.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/ChangeLog:

* config/aarch64/aarch64-simd.md
(aarch64_simd_imm_shl<mode><vczle><vczbe>): Rewrite to new
syntax.  Add =w,w,vs1 alternative.
* config/aarch64/constraints.md (vs1): New constraint.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/advsimd_shl_add.c: New test.

Fortran: Fix coarray in associate not linking [PR85510]

PR fortran/85510

gcc/fortran/ChangeLog:

* resolve.cc (resolve_variable): Mark the variable as host
associated only, when it is not in an associate block.
* trans-decl.cc (generate_coarray_init): Remove incorrect unused
flag on parameter.

gcc/testsuite/ChangeLog:

* gfortran.dg/coarray/pr85510.f90: New test.

Initial support for AVX10.2

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_available_features): Handle
avx10.2.
* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_AVX10_2_256_SET): New.
(OPTION_MASK_ISA2_AVX10_2_512_SET): Ditto.
(OPTION_MASK_ISA2_AVX10_1_256_UNSET):
Add OPTION_MASK_ISA2_AVX10_2_256_UNSET.
(OPTION_MASK_ISA2_AVX10_1_512_UNSET):
Add OPTION_MASK_ISA2_AVX10_2_512_UNSET.
(OPTION_MASK_ISA2_AVX10_2_256_UNSET): New.
(OPTION_MASK_ISA2_AVX10_2_512_UNSET): Ditto.
(ix86_handle_option): Handle avx10.2-256 and avx10.2-512.
* common/config/i386/i386-cpuinfo.h (enum processor_features):
Add FEATURE_AVX10_2_256 and FEATURE_AVX10_2_512.
* common/config/i386/i386-isas.h: Add ISA_NAMES_TABLE_ENTRY for
avx10.2-256 and avx10.2-512.
* config/i386/i386-c.cc (ix86_target_macros_internal): Define
__AVX10_2_256__ and __AVX10_2_512__.
* config/i386/i386-isa.def (AVX10_2): Add DEF_PTA(AVX10_2_256)
and DEF_PTA(AVX10_2_512).
* config/i386/i386-options.cc (isa2_opts): Add -mavx10.2-256 and
-mavx10.2-512.
(ix86_valid_target_attribute_inner_p): Handle avx10.2-256 and
avx10.2-512.
* config/i386/i386.opt: Add option -mavx10.2, -mavx10.2-256 and
-mavx10.2-512.
* config/i386/i386.opt.urls: Regenerated.
* doc/extend.texi: Document avx10.2, avx10.2-256 and avx10.2-512.
* doc/invoke.texi: Document -mavx10.2, -mavx10.2-256 and
-mavx10.2-512.
* doc/sourcebuild.texi: Document target avx10.2, avx10.2-256,
avx10.2-512.

gcc/testsuite/ChangeLog:

* g++.dg/other/i386-2.C: Ditto.
* g++.dg/other/i386-3.C: Ditto.
* gcc.target/i386/sse-12.c: Ditto.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.

PR target/116275: Handle STV of *extenddi2_doubleword_highpart on i386.

This patch resolves PR target/116275, a recent ICE-on-valid regression on
-m32 caused by my recent change to enable STV of DImode arithmeric right
shift on non-AVX512VL targets.  The oversight is that the i386 backend
contains an *extenddi2_doubleword_highpart instruction (whose pattern
is an arithmetic right shift of a left shift) that optimizes the case where
sign-extension need only update the highpart word of a DImode value when
generating 32-bit code (!TARGET_64BIT).  STV accepts this pattern as a
candidate, as there are patterns to handle this form of extension on SSE
using AVX512VL instructions (and previously ASHIFTRT was only allowed on
AVX512VL).  Now that ASHIFTRT is a candidate on non-AVX512vL targets, we
either need to check that the first operand is a register, or as done
below provide the define_insn_and_split that provides a non-AVX512VL
implementation of *extendv2di_highpart_stv.

The new testcase only ICEed with -m32, so this test could be limited to
target ia32, but there's no harm also running this test on -m64 to
provide a little extra test coverage.

2024-08-12  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
PR target/116275
* config/i386/i386.md (*extendv2di2_highpart_stv_noavx512vl): New
define_insn_and_split to handle the STV conversion of the DImode
pattern *extendsi2_doubleword_highpart.

gcc/testsuite/ChangeLog
PR target/116275
* g++.target/i386/pr116275.C: New test case.

LoongArch: Provide ashr lshr and ashl RTL pattern for vectors.

We support vashr vlshr and vashl. However, in r15-1638 support optimize
x < 0 ? -1 : 0 into (signed) x >> 31 and x < 0 ? 1 : 0 into (unsigned) x >> 31.
To support this optimization, vector ashr lshr and ashl need to be implemented.

gcc/ChangeLog:

* config/loongarch/loongarch.md (insn): Added rotatert rotr pairs.
* config/loongarch/simd.md (rotr<mode>3): Remove to ...
(<optab><mode>3): This.

gcc/testsuite/ChangeLog:

* g++.target/loongarch/vect-ashr-lshr.C: New test.

LoongArch: Drop vcond{,u} expanders.

Optabs vcond{,u} will be removed for GCC 15. Since regtest shows no
fallout, dropping the expanders, now.

gcc/ChangeLog:

PR target/114189
* config/loongarch/lasx.md (vcondu<LASX:mode><ILASX:mode>): Delete.
(vcond<LASX:mode><LASX_2:mode>): Likewise.
* config/loongarch/lsx.md (vcondu<LSX:mode><ILSX:mode>): Likewise.
(vcond<LSX:mode><LSX_2:mode>): Likewise.

LoongArch: Use iorn and andn standard pattern names.

R15-1890 introduced new optabs iorc and andc, and its corresponding
internal functions BIT_{ANDC,IORC}, and if targets defines such optabs
for vector modes. And in r15-2258 the iorc and andc were renamed to
iorn and andn.
So we changed the andn and iorn implementation templates to the standard
template names.

gcc/ChangeLog:

* config/loongarch/lasx.md (xvandn<mode>3): Rename to ...
(andn<mode>3): This.
(xvorn<mode>3): Rename to ...
(iorn<mode>3): This.
* config/loongarch/loongarch-builtins.cc
(CODE_FOR_lsx_vandn_v): Defined as the modified name.
(CODE_FOR_lsx_vorn_v): Likewise.
(CODE_FOR_lasx_xvandn_v): Likewise.
(CODE_FOR_lasx_xvorn_v): Likewise.
(loongarch_expand_builtin_insn): When the builtin function to be
called is __builtin_lasx_xvandn or __builtin_lsx_vandn, swap the
two operands.
* config/loongarch/loongarch.md (<optab>n<mode>): Rename to ...
(<optab>n<mode>3): This.
* config/loongarch/lsx.md (vandn<mode>3): Rename to ...
(andn<mode>3): This.
(vorn<mode>3): Rename to ...
(iorn<mode>3): This.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/lasx-andn-iorn.c: New test.
* gcc.target/loongarch/lsx-andn-iorn.c: New test.

PR modula2/116181 fix ODR warnings for C/m2 interface library modules

This patch fixes many ODR warnings which appear when compiling the
interface files found in gcc/m2/*-ch/ and gcc/m2/{pge,mc}-boot
directories.

gcc/m2/ChangeLog:

PR modula2/116181
* gm2-compiler/ppg.mod (FindStr): Initialize j.
* gm2-libs-ch/UnixArgs.cc (_M2_UnixArgs_ctor): Replace
M2RTS_RegisterModule with M2RTS_RegisterModule_Cstr.
* gm2-libs-ch/dtoa.cc (_M2_dtoa_ctor): Ditto.
* gm2-libs-ch/ldtoa.cc (ldtoa_strtold): Cast parameter s
for strtod.
(_M2_ldtoa_ctor): Replace M2RTS_RegisterModule with
M2RTS_RegisterModule_Cstr.
* gm2-libs-ch/m2rts.h (M2RTS_RegisterModule_Cstr): New
define.
(M2RTS_RegisterModule): Remove const.
* mc-boot-ch/GSelective.c (Selective_FdIsSet): Return bool
rather than int.
* mc-boot-ch/Gldtoa.cc (ldtoa_strtold): Change const char to
void.
Cast s before passing as a parameter to strtod.
* mc-boot-ch/Glibc.c (tracedb_open): Replace const char with const
void.
(libc_perror): Replace char with const char.
(libc_printf): Replace char with void.
(libc_snprintf): Replace char with void.
Add const_cast for parameter to index.
Add reinterpret_cast for parameter to vsnprintf.
(libc_open): Replace first paramter type char with void.
Add vararg for the third parameter.
* mc-boot-ch/Gm2rtsdummy.cc (M2RTS_RequestDependant): Remove #if 0 code.
(m2pim_M2RTS_RegisterModule): Change const char parameters to void
(M2RTS_RegisterModule): Ditto.
(_M2_M2RTS_init): Remove #if 0 code.
(M2RTS_ConstructModules): Ditto.
(M2RTS_Terminate): Ditto.
(M2RTS_DeconstructModules): Ditto.
(M2RTS_Halt): Ditto.
* mc-boot-ch/Gtermios.cc (SetFlag): Return bool.
* mc-boot-ch/m2rts.h (M2RTS_RegisterModule_Cstr): New define.
(M2RTS_RegisterModule): Change const char parameters to void.
* mc-boot/Gdecl.cc: Regenerate.
* mc/decl.mod (getNextConstExp): Reimplement.
* pge-boot/GDynamicStrings.cc: Regenerate.
* pge-boot/GDynamicStrings.h: Ditto.
* pge-boot/GM2RTS.h (M2RTS_RegisterModule_Cstr): New function.
(M2RTS_RegisterModule): Reformat.
* pge-boot/GSymbolKey.cc: Regenerate.
* pge-boot/GSysExceptions.cc (_M2_SysExceptions_init): Add correct parameters.
(_M2_SysExceptions_fini): Ditto.
* pge-boot/GUnixArgs.cc (_M2_UnixArgs_ctor::_M2_UnixArgs_ctor):
Replace call to M2RTS_RegisterModule with M2RTS_RegisterModuleCstr.
* pge-boot/Gerrno.cc (_M2_errno_init): Add correct parameters.
(_M2_errno_fini): Ditto.
* pge-boot/Gldtoa.cc (ldtoa_strtold): Replace const char with
void.
Use reinterpret_cast when passing s to strtod.
Replace true with TRUE.
* pge-boot/Gldtoa.h (ldtoa_strtold): Tidy up.
* pge-boot/Glibc.cc (libc_read): Use size_t as the return type.
(libc_write): Ditto.
(libc_strlen): Ditto.
(libc_perror): Replace char with const char.
(libc_printf): Replace char to const char.
Cast parameter to index using const_cast.
(libc_snprintf): Replace char with void.
Cast parameter to index using const_cast.
(libc_malloc): Replace parameter type with size_t.
(libc_memcpy): Replace third parameter type with size_t.
(libc_open): Use varargs.
* pge-boot/Glibc.h (libc_perror): Add _string_high parameter.
* pge-boot/Gpge.cc: Regenerate.
* pge-boot/Gtermios.cc (SetFlag): Replace return type with bool.
(_M2_termios_init): Add correct parameters.
(_M2_termios_fini): Ditto.
* pge-boot/m2rts.h (M2RTS_RegisterModule_Cstr): New define.
(M2RTS_RegisterModule): Replace const char with void.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

Daily bump.

Fortran: silence Wmaybe-uninitialized warnings for LTO build [PR116221]

PR fortran/116221

gcc/fortran/ChangeLog:

* intrinsic.cc (gfc_get_intrinsic_sub_symbol): Initialize variable.
* symbol.cc (gfc_get_ha_symbol): Likewise.

AVR: -mlra is not documeted in TEXI.

gcc/
* config/avr/avr.opt (mlra): Set Undocumented flag.

AVR: Add function avr.cc::ra_in_progress().

It returns lra_in_progress resp. reload_in_progress depending on avr_lra_p.
Currently, direct use of ra_in_progress() is only made with -mlog=.

gcc/
* config/avr/avr.cc (ra_in_progress): New static function.
(avr_legitimate_address_p, avr_addr_space_legitimate_address_p)
(extra_constraint_Q): Use it with -mlog=.

Daily bump.

i386: testsuite: Adapt fentryname3.c for r14-811 change [PR70150]

After r14-811 "call *nop@GOTPCREL(%rip)" is only generated with
-mno-direct-extern-access even if --enable-default-pie. So the r13-1614
change to this file is not valid anymore.

gcc/testsuite/ChangeLog:

PR testsuite/70150
* gcc.target/i386/fentryname3.c (dg-final): Revert r13-1614
change.

i386: testsuite: Add -no-pie for pr113689-1.c [PR70150]

For a --enable-default-pie build, using -fno-pic (for compiler) but
not -no-pie (for linker) triggers some linker warnings counted as
excess errors:

    /usr/bin/ld: /tmp/cc8MgxiR.o: warning: relocation in read-only
    section `.text.startup'
    /usr/bin/ld: warning: creating DT_TEXTREL in a PIE

gcc/testsuite/ChangeLog:

PR testsuite/70150
* gcc.target/i386/pr113689-1.c (dg-options): Add -no-pie.

Fix reference to the dom walker function in the documentation

It is using a class now with a different name.

gcc/ChangeLog:

* doc/cfg.texi: Fix references to dom_walker.

gm2: add missing debug output guard

The Close() procedure in MemStream is missing a guard to prevent it from
printing in non-debug mode.

gcc/gm2:
* gm2-libs-iso/MemStream.mod: Guard debug output.

Signed-off-by: Wilken Gottwalt <wilken.gottwalt@posteo.net>

testsuite: Fix up sse3-addsubps.c

The testcase uses sizeof (vals) / sizeof (vals) as the number of vals to
handle (though, handles 8 vals at a time). That is an obvious typo,
all similar testcases use sizeof (vals) / sizeof (vals[0]) properly.

2024-08-10 Jakub Jelinek <jakub@redhat.com>

* gcc.target/powerpc/sse3-addsubps.c (TEST): Divide by
sizeof (vals[0]) rather than sizeof (vals).

AVR: ad target/113934 - Add option -mlra to enable LRA.

PR target/113934
gcc/
* config/avr/avr.opt (-mlra): New target option.
* config/avr/avr.cc (avr_use_lra_p): New function.
(TARGET_LRA_P): Use it.
(avr_hard_regno_mode_ok) [lra]: Don't disallow 4-byte modes for X.

c++: inherited CTAD fixes [PR116276]

This implements the overlooked inherited vs non-inherited guide
tiebreaker from P2582R1. This requires tracking inherited-ness of a
guide, for which it seems natural to reuse the lang_decl_fn::context
field which for a constructor tracks its inherited-ness.

This patch also works around CLASSTYPE_CONSTRUCTORS not reliably
returning all inherited constructors (due to some using-decl handling
quirks in in push_class_level_binding) by iterating over TYPE_FIELDS
instead.

This patch also makes us recognize another written form of inherited
constructor, 'using Base<T>::Base::Base' whose USING_DECL_SCOPE is a
TYPENAME_TYPE.

PR c++/116276

gcc/cp/ChangeLog:

* call.cc (joust): Implement P2582R1 inherited vs non-inherited
guide tiebreaker.
* cp-tree.h (lang_decl_fn::context): Document usage in
deduction_guide_p FUNCTION_DECLs.
(inherited_guide_p): Declare.
* pt.cc (inherited_guide_p): Define.
(set_inherited_guide_context): Define.
(alias_ctad_tweaks): Use set_inherited_guide_context.
(inherited_ctad_tweaks): Recognize some inherited constructors
whose scope is a TYPENAME_TYPE.
(ctor_deduction_guides_for): For C++23 inherited CTAD, iterate
over TYPE_FIELDS instead of CLASSTYPE_CONSTRUCTORS to recognize
all inherited constructors.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/class-deduction-inherited4.C: Remove an xfail.
* g++.dg/cpp23/class-deduction-inherited5.C: New test.
* g++.dg/cpp23/class-deduction-inherited6.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

c++: DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P tweaks

DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P templates can only appear as part
of a template friend declaration, and in turn get partially instantiated
only from tsubst_friend_function or tsubst_friend_class. So rather than
having tsubst_template_decl clear the flag, let's leave it up to the
tsubst friend routines to clear it so that template friend handling stays
localized (note that tsubst_friend_function was already clearing it).

Also the template depth comparison test within tsubst_friend_function is
equivalent to DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P since such templates
belong to the class context (and so always have more levels than the
context), and conversely and it isn't possible to directly refer to an
existing template that has more levels than the class context.

gcc/cp/ChangeLog:

* pt.cc (tsubst_friend_class): Simplify depth comparison test
in the redeclaration code path to
DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P. Clear the flag after
partial instantiation here ...
(tsubst_template_decl): ... instead of here.

Reviewed-by: Jason Merrill <jason@redhat.com>

c++: clean up cp_identifier_kind checks

The predicates for checking an IDENTIFIER node's cp_identifier_kind
currently directly test the three flag bits that encode the kind. This
patch instead makes the checks first reconstruct the cp_identifier_kind
in its entirety and then compare that.

gcc/cp/ChangeLog:

* cp-tree.h (get_identifier_kind): Define.
(IDENTIFIER_KEYWORD_P): Redefine in terms of get_identifier_kind.
(IDENTIFIER_CDTOR_P): Likewise.
(IDENTIFIER_CTOR_P): Likewise.
(IDENTIFIER_DTOR_P): Likewise.
(IDENTIFIER_ANY_OP_P): Likewise.
(IDENTIFIER_OVL_OP_P): Likewise.
(IDENTIFIER_ASSIGN_OP_P): Likewise.
(IDENTIFIER_CONV_OP_P): Likewise.
(IDENTIFIER_TRAIT_P): Likewise.
* parser.cc (cp_lexer_peek_trait): Mark IDENTIFIER_TRAIT_P
check UNLIKELY.

Reviewed-by: Jason Merrill <jason@redhat.com>

Daily bump.

[RISC-V][PR target/116283] Fix split code for recent Zbs improvements with masked bit positions

So Patrick's fuzzer found an interesting little buglet in the Zbs improvements
I added a couple months back.

Specifically when we have masked bit position for a Zbs instruction.  If the
mask has extraneous bits set we'll generate an unrecognizable insn due to an
invalid constant.

More concretely, let's take this pattern:

> (define_insn_and_split ""
>   [(set (match_operand:DI 0 "register_operand" "=r")
>         (any_extend:DI
>          (ashift:SI (const_int 1)
>                     (subreg:QI                        (and:DI (match_operand:DI 1 "register_operand" "r")
>                               (match_operand 2 "const_int_operand")) 0))))]
What we need to know to transform this into bset for rv64.

After masking the shift count we want to know the low 5 bits aren't 0x1f.  If
they were 0x1f, then the constant generated would be 0x80000000 which would
then need sign extension out to 64bits, which the bset instruction will not do
for us.

We can ignore anything outside the low 5 bits.  The mode of the shift is SI, so
shifting by 32+ bits is undefined behavior.

It's also worth explicitly mentioning that the hardware is going to mask the
count against 0x3f.

The net is if (operands[2] & 0x1f) != 0x1f, then this transformation is safe.
So onto the generated split code...

>   [(set (match_dup 0) (and:DI (match_dup 1) (match_dup 2)))
>    (set (match_dup 0) (zero_extend:DI (ashift:SI
>                                      (const_int 1)
>                                      (subreg:QI (match_dup 0) 0))))]

Which would seemingly do exactly what we want.   The problem is the first split
insn.  If the constant does not fit into a simm12, that insn won't be
recognized resulting in the ICE.

The fix is simple, we just need to mask the constant before generating RTL.  We
can just mask it against 0x1f since we only care about the low 5 bits.

This affects multiple patterns.  I've added the appropriate fix to all of them.

Tested in my tester.  Waiting for the pre-commit bits to run before pushing.

PR target/116283
gcc/
* config/riscv/bitmanip.md (Zbs combiner patterns/splitters): Mask the
bit position in the split code appropriately.

gcc/testsuite/

* gcc.target/riscv/pr116283.c: New test

Revert "lra: emit caller-save register spills before call insn [PR116028]"

This reverts commit 3c67a0fa1dd39a3378deb854a7fef0ff7fe38004.

Adjust rangers recomputation depth based on the number of BBs.

As the number of block increase, recomputations can become more
expensive. Adjust the depth limit to avoid excessive compile time.

PR tree-optimization/114855
* gimple-range-gori.cc (gori_compute::gori_compute): Adjust
ranger_recompute_depth limit based on the number of BBs.
(gori_compute::may_recompute_p): Use previosuly calculated value.
* gimple-range-gori.h (gori_compute::m_recompute_depth): New.

Limit equivalency processing in rangers cache.

When the number of block exceed VRP's sparse threshold, do not query all
equivalencies during cache filling. This can be expensive for unknown
benefit.

PR tree-optimization/114855
* gimple-range-cache.cc (ranger_cache::fill_block_cache): Do not
process equivalencies if the number of blocks is too high.

btf: Protect BTF_KIND_INFO against invalid kind

If the user provides a kind value that is more than 5 bits, the
BTF_KIND_INFO macro would emit incorrect values for info (by clobbering
values of the kind flag).

Tested on x86_64-redhat-linux.

include/ChangeLog:

* btf.h (BTF_TYPE_INFO): Protect against user providing invalid
kind.

Signed-off-by: Will Hawkins <hawkinsw@obs.cr>

c++: Don't accept multiple enum definitions within template class [PR115806]

We have been accepting the following invalid code since revision 557831a91df

=== cut here ===
template <typename T> struct S {
enum E { a };
enum E { b };
};
S<int> s;
=== cut here ===

The problem is that start_enum will set OPAQUE_ENUM_P to true even if it
retrieves an existing definition for the enum, which causes the redefinition
check in cp_parser_enum_specifier to be bypassed.

This patch only sets OPAQUE_ENUM_P and ENUM_FIXED_UNDERLYING_TYPE_P when
actually pushing a new tag for the enum.

PR c++/115806

gcc/cp/ChangeLog:

* decl.cc (start_enum): Only set OPAQUE_ENUM_P and
ENUM_FIXED_UNDERLYING_TYPE_P when pushing a new tag.

gcc/testsuite/ChangeLog:

* g++.dg/parse/enum15.C: New test.

RISC-V: Enable stack clash in alloca

Add the TARGET_STACK_CLASH_PROTECTION_ALLOCA_PROBE_RANGE to riscv in
order to enable stack clash protection when using alloca.
The code and tests are the same used by aarch64.

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_compute_frame_info): Update
outgoing args size.
(riscv_stack_clash_protection_alloca_probe_range): New.
(TARGET_STACK_CLASH_PROTECTION_ALLOCA_PROBE_RANGE): New.
* config/riscv/riscv.h
(STACK_CLASH_MIN_BYTES_OUTGOING_ARGS): New.
(STACK_DYNAMIC_OFFSET): New.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/stack-check-14.c: New test.
* gcc.target/riscv/stack-check-15.c: New test.
* gcc.target/riscv/stack-check-alloca-1.c: New test.
* gcc.target/riscv/stack-check-alloca-2.c: New test.
* gcc.target/riscv/stack-check-alloca-3.c: New test.
* gcc.target/riscv/stack-check-alloca-4.c: New test.
* gcc.target/riscv/stack-check-alloca-5.c: New test.
* gcc.target/riscv/stack-check-alloca-6.c: New test.
* gcc.target/riscv/stack-check-alloca-7.c: New test.
* gcc.target/riscv/stack-check-alloca-8.c: New test.
* gcc.target/riscv/stack-check-alloca-9.c: New test.
* gcc.target/riscv/stack-check-alloca-10.c: New test.
* gcc.target/riscv/stack-check-alloca.h: New.

RISC-V: Add support to vector stack-clash protection

Adds basic support to vector stack-clash protection using a loop to do
the probing and stack adjustments.

gcc/ChangeLog:
* config/riscv/riscv.cc
(riscv_allocate_and_probe_stack_loop): New function.
(riscv_v_adjust_scalable_frame): Add stack-clash protection
support.
(riscv_allocate_and_probe_stack_space): Move the probe loop
implementation to riscv_allocate_and_probe_stack_loop.
* config/riscv/riscv.h: Define RISCV_STACK_CLASH_VECTOR_CFA_REGNUM.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/stack-check-cfa-3.c: New test.
* gcc.target/riscv/stack-check-prologue-16.c: New test.
* gcc.target/riscv/struct_vect_24.c: New test.

RISC-V: Stack-clash protection implemention

This implements stack-clash protection for riscv, with
riscv_allocate_and_probe_stack_space being based of
aarch64_allocate_and_probe_stack_space from aarch64's implementation.
We enforce the probing interval and the guard size to always be equal, their
default value is 4Kb which is riscv page size.

We also probe up by 1024 bytes in the general case when a probe is required.

gcc/ChangeLog:
* config/riscv/riscv.cc
(riscv_option_override): Enforce that interval is the same size as
guard size.
(riscv_allocate_and_probe_stack_space): New function.
(riscv_expand_prologue): Call riscv_allocate_and_probe_stack_space
to the final allocation of the stack and add stack-clash dump
information.
* config/riscv/riscv.h: Define STACK_CLASH_CALLER_GUARD and
STACK_CLASH_MAX_UNROLL_PAGES.

gcc/testsuite/ChangeLog:
* gcc.dg/params/blocksort-part.c: Skip riscv for
stack-clash protection intervals.
* gcc.dg/pr82788.c: Skip riscv.
* gcc.dg/stack-check-6.c: Skip residual check for riscv.
* gcc.dg/stack-check-6a.c: Skip riscv.
* gcc.target/riscv/stack-check-12.c: New test.
* gcc.target/riscv/stack-check-13.c: New test.
* gcc.target/riscv/stack-check-cfa-1.c: New test.
* gcc.target/riscv/stack-check-cfa-2.c: New test.
* gcc.target/riscv/stack-check-prologue-1.c: New test.
* gcc.target/riscv/stack-check-prologue-10.c: New test.
* gcc.target/riscv/stack-check-prologue-11.c: New test.
* gcc.target/riscv/stack-check-prologue-12.c: New test.
* gcc.target/riscv/stack-check-prologue-13.c: New test.
* gcc.target/riscv/stack-check-prologue-14.c: New test.
* gcc.target/riscv/stack-check-prologue-15.c: New test.
* gcc.target/riscv/stack-check-prologue-2.c: New test.
* gcc.target/riscv/stack-check-prologue-3.c: New test.
* gcc.target/riscv/stack-check-prologue-4.c: New test.
* gcc.target/riscv/stack-check-prologue-5.c: New test.
* gcc.target/riscv/stack-check-prologue-6.c: New test.
* gcc.target/riscv/stack-check-prologue-7.c: New test.
* gcc.target/riscv/stack-check-prologue-8.c: New test.
* gcc.target/riscv/stack-check-prologue-9.c: New test.
* gcc.target/riscv/stack-check-prologue.h: New file.
* lib/target-supports.exp
(check_effective_target_supports_stack_clash_protection):
Add riscv.
(check_effective_target_caller_implicit_probes): Likewise.

RISC-V: Move riscv_v_adjust_scalable_frame

Move riscv_v_adjust_scalable_frame () in preparation for the stack clash
protection support.

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_v_adjust_scalable_frame): Move
closer to riscv_expand_prologue.

RISC-V: Small stack tie changes

Enable the register used by riscv_emit_stack_tie () to be passed as
an argument so we can tie the stack with other registers besides
hard_frame_pointer_rtx.
Also don't allow operand 1 of stack_tie<mode> to be optimized to sp
in preparation for the stack clash protection support.

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_emit_stack_tie): Pass the
register to be tied to the stack pointer as argument.
* config/riscv/riscv.md (stack_tie<mode>): Don't match equal
operands.

c-family: regenerate c.opt.urls

The addition of -Wtemplate-body in r15-2774-g596d1ed9d40b10 means
we need to regenerate c.opt.urls.

gcc/c-family/ChangeLog:

* c.opt.urls: Regenerate.

c++: add fixed testcase [PR116289]

Fully fixed since r14-6724-gfced59166f95e9.

PR c++/116289
PR c++/113063

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/spaceship-synth16a.C: New test.

i386: Fix up __builtin_ia32_b{extr{,i}_u{32,64},zhi_{s,d}i} folding [PR116287]

The GENERIC folding of these builtins have cases where it folds to a
constant regardless of the value of the first operand.  If so, we need
to use omit_one_operand to avoid throwing away side-effects in the first
operand if any.  The cases which verify the first argument is INTEGER_CST
don't need that, INTEGER_CST doesn't have side-effects.

2024-08-09  Jakub Jelinek  <jakub@redhat.com>

PR target/116287
* config/i386/i386.cc (ix86_fold_builtin) <case IX86_BUILTIN_BEXTR32>:
When folding into zero without checking whether first argument is
constant, use omit_one_operand.
(ix86_fold_builtin) <case IX86_BUILTIN_BZHI32>: Likewise.

* gcc.target/i386/bmi-pr116287.c: New test.
* gcc.target/i386/bmi2-pr116287.c: New test.
* gcc.target/i386/tbm-pr116287.c: New test.

amdgcn: Add padding to trampoline

This avoids a -Wpadded warning (testcase gcc.dg/20050607-1.c).

gcc/ChangeLog:

* config/gcn/gcn.cc (gcn_asm_trampoline_template): Add .align.
* config/gcn/gcn.h (TRAMPOLINE_SIZE): Increase to 40.

OpenMP: Constructors and destructors for "declare target" static aggregates: Fix effective-target keyword in test cases

(Most of) the tests added in commit f1bfba3a9b3f31e3e06bfd1911c9f223869ea03f
"OpenMP: Constructors and destructors for "declare target" static aggregates"
had a mismatch between dump file production and its scanning; the former needs
to use 'offload_target_nvptx' (like 'offload_target_amdgcn'), not
'offload_device_nvptx'.

libgomp/
* testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C:
Fix effective-target keyword.
* testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C:
Likewise.
* testsuite/libgomp.c-c++-common/target-is-initial-host-2.c:
Likewise.
* testsuite/libgomp.c-c++-common/target-is-initial-host.c:
Likewise.
* testsuite/libgomp.fortran/target-is-initial-host-2.f90:
Likewise.
* testsuite/libgomp.fortran/target-is-initial-host.f: Likewise.
* testsuite/libgomp.fortran/target-is-initial-host.f90: Likewise.

AVR: Tidy up code for __[x]load insns.

gcc/
* config/avr/avr.md (*load_<mode>_libgcc, *xload_<mode>_libgcc):
Tidy up code.

c-family: Add some more ARRAY_SIZE uses

These two spots were just non-standard, because they divided
sizeof (omp_pragmas_simd) by sizeof (*omp_pragmas) and not
the expected sizeof (*omp_pragmas_simd) and so weren't converted
into ARRAY_SIZE. Both of the latter sizes are the same though,
as both arrays have the same type, so this patch doesn't change
anything but readability.

2024-08-09 Jakub Jelinek <jakub@redhat.com>

* c-pragma.cc (c_pp_lookup_pragma): Use ARRAY_SIZE in
n_omp_pragmas_simd initializer.
(init_pragmas): Likewise.

aarch64: Check CONSTM1_RTX in definition of Dm constraint

The constraint Dm is intended to match vectors of minus 1, but actually
checks for CONST1_RTX. This doesn't have a bad effect in practice as its
only use in the aarch64_wrffr pattern for the setffr instruction which
is a VNx16BI operation and -1 and 1 are the same there. That pattern
can only be currently generated through intrinsics anyway that create it
with a CONSTM1_RTX constant.

Fix the constraint definition so that it doesn't become a footgun if its
used in some other pattern.

Bootstrapped and tested on aarch64-none-linux-gnu.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/ChangeLog:

* config/aarch64/constraints.md (Dm): Match CONSTM1_RTX rather
CONST1_RTX.

Daily bump.

aarch64/testsuite: Fix if-compare_2.c for removing vcond{,u,eq} patterns [PR116041]

For bar1 and bar2, we currently is expecting to use the bsl instruction but
with slightly different register allocation inside the loop (which happens after
the removal of the vcond{,u,eq} patterns), we get the bit instruction. The pattern that
outputs bsl instruction will output bit and bif too depending register allocation.

So let's check for bsl, bit or bif instructions instead of just bsl instruction.

Tested on aarch64 both with an unmodified compiler and one which has the patch to disable
these optabs.

gcc/testsuite/ChangeLog:

PR testsuite/116041
* gcc.target/aarch64/if-compare_2.c: Support bit and bif for
both bar1 and bar2; add comment on why too.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

AArch64: Fix signbit mask creation after late combine [PR116229]

The optimization to generate a Di signbit constant by using fneg was relying
on nothing being able to push the constant into the negate. It's run quite
late for this reason.

However late combine now runs after it and triggers RTL simplification based on
the neg. When -fno-signed-zeros this ends up dropping the - from the -0.0 and
thus producing incorrect code.

This change adds a new unspec FNEG on DI mode which prevents this simplication.

gcc/ChangeLog:

PR target/116229
* config/aarch64/aarch64-simd.md (aarch64_fnegv2di2<vczle><vczbe>): New.
* config/aarch64/aarch64.cc (aarch64_maybe_generate_simd_constant):
Update call to gen_aarch64_fnegv2di2.
* config/aarch64/iterators.md: New UNSPEC_FNEG.

gcc/testsuite/ChangeLog:

PR target/116229
* gcc.target/aarch64/pr116229.c: New test.

AVR: target/116295 - Fix unrecognizable insn with __flash read.

Some loads from non-generic address-spaces are performed by
libgcc calls, and they don't have a POST_INC form. Don't consider
such insns when running -mfuse-add.

PR target/116295
gcc/
* config/avr/avr.cc (Mem_Insn::Mem_Insn): Don't consider MEMs
that are avr_mem_memx_p or avr_load_libgcc_p.

gcc/testsuite/
* gcc.target/avr/torture/pr116295.c: New test.

AVR: Fix a typo in __builtin_avr_mask1 documentation.

gcc/
* doc/extend.texi (AVR Built-in Functions) <mask1>: Fix a typo.

AVR: Improve POST_INC output in some rare cases.

gcc/
* config/avr/avr.cc (avr_insn_has_reg_unused_note_p): New function.
(_reg_unused_after): Use it to recognize more cases.
(avr_out_lpm_no_lpmx) [POST_INC]: Use reg_unused_after.

amdgcn: Fix VGPR max count

The metadata for RDNA3 kernels allocates VGPRs in blocks of 12, which means the
maximum usable number of registers is 252. This patch prevents the compiler
from exceeding this artifical limit.

gcc/ChangeLog:

* config/gcn/gcn.cc (gcn_conditional_register_usage): Fix registers
remaining after maximum allocation using TARGET_VGPR_GRANULARITY.

libgomp.texi: Update implementation status table for OpenMP TR13

libgomp/ChangeLog:

* libgomp.texi (OpenMP Technical Report 13): Renamed from
'OpenMP Technical Report 12'; updated for TR13 changes.

ada: Missing legality check when type completed

An access discriminant is allowed to have a default value only if the
discriminated type is immutably limited. In the case of a discriminated
limited private type declaration, this rule needs to be checked when
the completion of the type is seen.

gcc/ada/

* sem_ch6.adb (Check_Discriminant_Conformance): Perform check for
illegal access discriminant default values when the completion of
a limited private type is analyzed.
* sem_aux.adb (Is_Immutably_Limited): If passed the
not-yet-analyzed entity for the full view of a record type, test
the Limited_Present flag
(which is set by the parser).

ada: Etype missing for raise expression

If the primitive equality operator of the component type of an array type is
abstract, then a call to that abstract function raises Program_Error (when
such a call is legal). The FE generates a raise expression to implement this.
That raise expression is an expression so it should have a valid Etype.

gcc/ada/

* exp_ch4.adb (Build_Eq_Call): In the abstract callee case, copy
the Etype of the callee onto the Make_Raise_Program_Error result.

ada: Run-time error with GNAT-LLVM on container aggregate with finalization

When unnesting is enabled, the compiler was failing to copy the At_End_Proc
field from a block statement to the procedure created to replace it when
unnesting of top-level blocks is done. At run time this could lead to
exceptions due to missing finalization calls.

gcc/ada/

* exp_ch7.adb (Unnest_Block): Copy the At_End_Proc from the block
statement to the newly created subprogram body.

ada: Futher refinements to mutably tagged types

This patch further enhances the mutably tagged type implementation by fixing
several oversights relating to generic instantiations, attributes, and
type conversions.

gcc/ada/

* exp_put_image.adb (Append_Component_Attr): Obtain the mutably
tagged type for the component type.
* mutably_tagged.adb (Make_Mutably_Tagged_Conversion): Add more
cases to avoid conversion generation.
* sem_attr.adb (Check_Put_Image_Attribute): Add mutably tagged
type conversion.
* sem_ch12.adb (Analyze_One_Association): Add rewrite for formal
type declarations which are mutably tagged type to their
equivalent type.
(Instantiate_Type): Add condition to obtain class wide equivalent
types.
(Validate_Private_Type_Instance): Add check for class wide
equivalent types which are considered "definite".
* sem_util.adb (Is_Variable): Add condition to handle selected
components of view conversions. Add missing check for selected
components.
(Is_View_Conversion): Add condition to handle class wide
equivalent types.

ada: Spurious maximum nesting level warnings

This patch fixes an issue in the compiler whereby disabling style checks via
pragma Style_Checks ("-L") resulted in the minimum nesting level being zero
but the style still being enabled - leading to spurious maximum nesting level
exceeded warnings.

gcc/ada/

* stylesw.adb (Set_Style_Check_Options): Disable max nesting level
when unspecified

ada: Finalization_Size raises Constraint_Error

When the attribute Finalization_Size is applied to an interface type
object, the compiler-generated code fails at runtime, raising a
Constraint_Error exception.

gcc/ada/

* exp_attr.adb (Expand_N_Attribute_Reference) <Finalization_Size>:
If the prefix is an interface type, generate code to obtain its
address and displace it to reference the base of the object.

RISC-V: rv32/DF: Prevent 2 SImode loads using XTheadMemIdx

When enabling XTheadFmv/Zfa and XThead(F)MemIdx, we might end up
with the following insn (registers are examples, but of correct class):

(set (reg:DF a4)
     (mem:DF (plus:SI (mult:SI (reg:SI a0)
       (const_int 8))
      (reg:SI a5))))

This is a result of an attempt to load the DF register via two SI
register loads followed by XTheadFmv/Zfa instructions to move the
contents of the two SI registers into the DF register.

The two loads are generated in riscv_split_doubleword_move(),
where the second load adds an offset of 4 to load address.
While this works fine for RVI loads, this can't be handled
for XTheadMemIdx addresses.  Coming back to the example above,
we would end up with the following insn, which can't be simplified
or matched:

(set (reg:SI a4)
     (mem:SI (plus:SI (plus:SI (mult:SI (reg:SI a0)
(const_int 8))
       (reg:SI a5))
      (const_int 4))))

This triggered an ICE in the past, which was resolved in b79cd204c780,
which also added the test xtheadfmemidx-medany.c, where the examples
are from.  The patch postponed the optimization insn_and_split pattern
for XThead(F)MemIdx, so that the situation could effectively be avoided.

Since we don't want to rely on these optimization pattern in the future,
we need a different solution.  Therefore, this patch restricts the
movdf_hardfloat_rv32 insn to not match for split-double-word-moves
with XThead(F)MemIdx operands.  This ensures we don't need to split
them up later.

When looking at the code generation of the test file, we can see that
we have less GP<->FP conversions, but cannot use the indexed loads.
The new sequence is identical to rv32gc_xtheadfmv (similar to rv32gc_zfa).

Old:
[...]
lla     a5,.LANCHOR0
th.flrd fa5,a5,a0,3
fmv.x.w a4,fa5
th.fmv.x.hw     a5,fa5
.L1:
fmv.w.x fa0,a4
th.fmv.hw.x     fa0,a5
ret
[...]

New:
[...]
lla     a5,.LANCHOR0
slli    a4,a0,3
add     a4,a4,a5
lw      a5,4(a4)
lw      a4,0(a4)
.L1:
fmv.w.x fa0,a4
th.fmv.hw.x     fa0,a5
ret
[...]

This was tested (together with the patch that eliminates the
XTheadMemIdx optimization patterns) with SPEC CPU 2017 intrate
on QEMU (RV64/lp64d).

gcc/ChangeLog:

* config/riscv/constraints.md (th_m_noi): New constraint.
* config/riscv/riscv.md: Adjust movdf_hardfloat_rv32 for
XTheadMemIdx.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadfmemidx-xtheadfmv-medany.c: Adjust.
* gcc.target/riscv/xtheadfmemidx-zfa-medany.c: Likewise.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

RISC-V: xthead(f)memidx: Eliminate optimization patterns

We have a huge amount of optimization patterns (insn_and_split) for
XTheadMemIdx and XTheadFMemIdx that attempt to do something, that can be
done more efficient by generic GCC passes, if we have proper support code.

A key function in eliminating the optimization patterns is
th_memidx_classify_address_index(), which needs to identify each possible
memory expression that can be lowered into a XTheadMemIdx/XTheadFMemIdx
instruction. This patch adds all memory expressions that were
previously only recognized by the optimization patterns.

Now, that the address classification is complete, we can finally remove
all optimization patterns with the side-effect or getting rid of the
non-canonical memory expression they produced: (plus (reg) (ashift (reg) (imm))).

A positive side-effect of this change is, that we address an RV32 ICE,
that was caused by the th_memidx_I_c pattern, which did not properly
handle SUBREGs (more details are in PR116131).

A temporary negative side-effect of this change is, that we cause a
regression of the xtheadfmemidx + xtheadfmv/zfa tests (initially
introduced as part of b79cd204c780 to address an ICE).
As this issue cannot be addressed in the code parts that are
adjusted in this patch, we just accept the regression for now.

PR target/116131

gcc/ChangeLog:

* config/riscv/thead.cc (th_memidx_classify_address_index):
Recognize all possible XTheadMemIdx memory operand structures.
(th_fmemidx_output_index): Do strict classification.
* config/riscv/thead.md (*th_memidx_operand): Remove.
(TARGET_XTHEADMEMIDX): Likewise.
(TARGET_HARD_FLOAT && TARGET_XTHEADFMEMIDX): Likewise.
(!TARGET_64BIT && TARGET_XTHEADMEMIDX): Likewise.
(*th_memidx_I_a): Likewise.
(*th_memidx_I_b): Likewise.
(*th_memidx_I_c): Likewise.
(*th_memidx_US_a): Likewise.
(*th_memidx_US_b): Likewise.
(*th_memidx_US_c): Likewise.
(*th_memidx_UZ_a): Likewise.
(*th_memidx_UZ_b): Likewise.
(*th_memidx_UZ_c): Likewise.
(*th_fmemidx_movsf_hardfloat): Likewise.
(*th_fmemidx_movdf_hardfloat_rv64): Likewise.
(*th_fmemidx_I_a): Likewise.
(*th_fmemidx_I_c): Likewise.
(*th_fmemidx_US_a): Likewise.
(*th_fmemidx_US_c): Likewise.
(*th_fmemidx_UZ_a): Likewise.
(*th_fmemidx_UZ_c): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr116131.c: New test.

Reported-by: Patrick O'Neill <patrick@rivosinc.com>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

RISC-V: testsuite: xtheadfmemidx: Rename test and add similar Zfa test

Test file xtheadfmemidx-medany.c has been added in b79cd204c780 as a
test case that provoked an ICE when loading DFmode registers via two
SImode register loads followed by a SI->DF[63:32] move from XTheadFmv.
Since Zfa is affected in the same way as XTheadFmv, even if both
have slightly different instructions, let's add a test for Zfa as well
and give the tests proper names.

Let's also add a test into the test files that counts the SI->DF moves
from XTheadFmv/Zfa.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadfmemidx-medany.c: Move to...
* gcc.target/riscv/xtheadfmemidx-xtheadfmv-medany.c: ...here.
* gcc.target/riscv/xtheadfmemidx-zfa-medany.c: New test.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

vect: Small C++11-ification of vect_vect_recog_func_ptrs

This is a small C++11-ificiation for the use of vect_vect_recog_func_ptrs.
Changes the loop into a range based loop which then we can remove the variable
definition of NUM_PATTERNS. Also uses const reference instead of a pointer.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-vect-patterns.cc (NUM_PATTERNS): Delete.
(vect_pattern_recog_1): Constify and change
recog_func to a reference.
(vect_pattern_recog): Use range-based loop over
vect_vect_recog_func_ptrs.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

RISC-V: Delete duplicate '#define RISCV_DWARF_VLENB'

gcc/ChangeLog:

* config/riscv/riscv.h (RISCV_DWARF_VLENB): Delete.

amdgcn: Re-enable trampolines

The stacks are executable since the reverse-offload features were added, so
trampolines actually do work.

gcc/ChangeLog:

* config/gcn/gcn.cc (gcn_trampoline_init): Re-enable trampolines.

[RISC-V][PR target/116240] Ensure object is a comparison before extracting arguments

This was supposed to go out the door yesterday, but I kept getting interrupted.

The target bits for rtx costing can't assume the rtl they're given actually
matches a target pattern.   It's just kind of inherent in how the costing
routines get called in various places.

In this particular case we're trying to cost a conditional move:

(set (dest) (if_then_else (cond) (true) (false))

On the RISC-V port the backend only allows actual conditionals for COND.  So
something like (eq (reg) (const_int 0)).  In the costing code for if-then-else
we did something like

(XEXP (XEXP (cond, 0), 0)))

Which fails miserably if COND is a terminal node like (reg) rather than (ne
(reg) (const_int 0)

So this patch tightens up the RTL scanning to ensure that we have a comparison
before we start looking at the comparison's arguments.

Run through my tester without incident, but I'll wait for the pre-commit tester
to run through a cycle before pushing to the trunk.

Jeff

ps.   We probably could support a naked REG for the condition and internally convert it to (ne (reg) (const_int 0)), but I don't think it likely happens with any regularity.

PR target/116240
gcc/
* config/riscv/riscv.cc (riscv_rtx_costs): Ensure object is a
comparison before looking at its arguments.

gcc/testsuite
* gcc.target/riscv/pr116240.c: New test.

Rearrange SLP nodes with duplicate statements [PR98138]

This change checks when a two_operators SLP node has multiple occurrences of
the same statement (e.g. {A, B, A, B, ...}) and tries to rearrange the operands
so that there are no duplicates. Two vec_perm expressions are then introduced
to recreate the original ordering. These duplicates can appear due to how
two_operators nodes are handled, and they prevent vectorization in some cases.

This targets the vectorization of the SPEC2017 x264 pixel_satd functions.
In some processors a larger than 10% improvement on x264 has been observed.

PR tree-optimization/98138

gcc/ChangeLog:

* tree-vect-slp.cc: Avoid duplicates in two_operators nodes.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vect-slp-two-operator.c: New test.

c++: Propagate TREE_ADDRESSABLE in fixup_type_variants [PR115062]

This has caused issues with modules when an import fills in the
definition of a type already created with a typedef.

PR c++/115062

gcc/cp/ChangeLog:

* class.cc (fixup_type_variants): Propagate TREE_ADDRESSABLE.
(finish_struct_bits): Cleanup now that TREE_ADDRESSABLE is
propagated by fixup_type_variants.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr115062_a.H: New test.
* g++.dg/modules/pr115062_b.H: New test.
* g++.dg/modules/pr115062_c.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

c++/modules: Assume header bindings are global module

While stepping through some code I noticed that we do some extra work
(finding the originating module decl, stripping the template, and
inspecting the attached-ness) for every declaration taken from a header
unit. This doesn't seem necessary though since no declaration in a
header unit can be attached to anything but the global module, so we can
just assume that global_p will be true.

This was the original behaviour before I removed this assumption while
refactoring for r15-2807-gc592310d5275e0.

gcc/cp/ChangeLog:

* module.cc (module_state::read_cluster): Assume header module
declarations will require GM merging.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

libgomp/libgomp.texi: Mention -fno-builtin-omp_is_initial_device

libgomp/ChangeLog:

* libgomp.texi (omp_is_initial_device): Mention
-fno-builtin-omp_is_initial_device and folding by default.

i386: Tweak ix86_mode_can_transfer_bits to restore bootstrap on RHEL.

This minor patch, very similar to one posted and approved previously at
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657229.html is
required to restore builds on systems using gcc 4.8 as a host compiler.
Using the enumeration constants E_SFmode and E_DFmode avoids issues with
SFmode and DFmode being "non-literal types in constant expressions".

2024-08-08 Roger Sayle <roger@nextmovesoftware.com>

gcc/ChangeLog
* config/i386/i386.cc (ix86_mode_can_transfer_bits): Use E_?Fmode
enumeration constants in switch statement.

c++, libstdc++: Implement C++26 P2747R2 - constexpr placement new [PR115744]

With the PR115754 fix in, constexpr placement new mostly just works,
so this patch just adds constexpr keyword to the placement new operators
in <new>, adds FTMs and testsuite coverage.

There is one accepts-invalid though, the
new (p + 1) int[]{2, 3};      // error (in this paper)
case from the paper.  Can we handle that incrementally?
The problem with that is I think calling operator new now that it is
constexpr should be fine even in that case in constant expressions, so
int *p = std::allocator<int>{}.allocate(3);
int *q = operator new[] (sizeof (int) * 2, p + 1);
should be ok, so it can't be easily the placement new operator call
itself on whose constexpr evaluation we try something special, it should
be on the new expression, but constexpr.cc actually sees only
<<< Unknown tree: expr_stmt
  (void) (TARGET_EXPR <D.2640, (void *) TARGET_EXPR <D.2641, VIEW_CONVERT_EXPR<int *>(b) + 4>>, TARGET_EXPR <D.2642, operator new [] (8, NON_LVALUE_EXPR <D.2640>)>,   int * D.2643;
  <<< Unknown tree: expr_stmt
    (void) (D.2643 = (int *) D.2642) >>>;
and that is just fine by the preexisting constexpr evaluation rules.

Should build_new_1 emit some extra cast for the array cases with placement
new in maybe_constexpr_fn (current_function_decl) that the existing P2738
code would catch?

2024-08-08  Jakub Jelinek  <jakub@redhat.com>

PR c++/115744
gcc/c-family/
* c-cppbuiltin.cc (c_cpp_builtins): Change __cpp_constexpr
from 202306L to 202406L for C++26.
gcc/testsuite/
* g++.dg/cpp2a/construct_at.h (operator new, operator new[]):
Use constexpr instead of inline if __cpp_constexpr >= 202406L.
* g++.dg/cpp26/constexpr-new1.C: New test.
* g++.dg/cpp26/constexpr-new2.C: New test.
* g++.dg/cpp26/constexpr-new3.C: New test.
* g++.dg/cpp26/feat-cxx26.C (__cpp_constexpr): Adjust expected
value.
libstdc++-v3/
* libsupc++/new (__glibcxx_want_constexpr_new): Define before
including bits/version.h.
(_GLIBCXX_PLACEMENT_CONSTEXPR): Define.
(operator new, operator new[]): Use it for placement new instead
of inline.
* include/bits/version.def (constexpr_new): New FTM.
* include/bits/version.h: Regenerate.

libgomp.c++/static-aggr-constructor-destructor-{1,2}.C: Fix scan-tree-dump

In principle, the optimized dump should be the same on the host, but as
'nohost' is not handled, is is present. However when ENABLE_OFFLOADING is
false, it is handled early enough to remove the function.

libgomp/ChangeLog:

* testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C: Split
scan-tree-dump into with and without target offload_target_any.
* testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C:
Likewise.

Ada, libgnarl: Fix s-taprop__posix.adb compilation.

Bootstrap on Darwin, and likely any other targets using the posix
implementation of s-taprop was broken by commits between r15-2743
and r15-2747:
s-taprop.adb:297:15: error: "size_t" is not visible
s-taprop.adb:297:15: error: multiple use clauses cause hiding
s-taprop.adb:297:15: error: hidden declaration at s-osinte.ads:58
s-taprop.adb:297:15: error: hidden declaration at i-c.ads:9

This seems to be caused by an omitted change to use Interfaces.C.size_t
instead of just size_t. Fixed thus.

gcc/ada/ChangeLog:

* libgnarl/s-taprop__posix.adb (Stack_Guard): Use Interfaces.C.size_t
for the type of Page_Size.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

ada: Fix s-taprop__solaris.adb compilation

Solaris Ada bootstrap is broken as of 2024-08-06 with

s-taprop.adb:1971:23: error: "int" is not visible
s-taprop.adb:1971:23: error: multiple use clauses cause hiding
s-taprop.adb:1971:23: error: hidden declaration at s-osinte.ads:51
s-taprop.adb:1971:23: error: hidden declaration at i-c.ads:62

because one instance of int isn't qualified. This patch fixes this.

Bootstrapped without regressions on i386-pc-solaris2.11 and
sparc-sun-solaris2.11.

2024-08-07 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>

gcc/ada:
* libgnarl/s-taprop__solaris.adb (Set_Task_Affinity): Fully
quality int.

tree-optimization/116258 - fix i386 testcase

With -march=cascadelake we use vpermilps instead of shufps.

PR tree-optimization/116258
* gcc.target/i386/pr116258.c: Also allow vpermilps.

lra: emit caller-save register spills before call insn [PR116028]

LRA emits insns to save caller-save registers in the
inheritance/splitting pass. In this pass, LRA builds EBBs (Extended
Basic Block) and traverses the insns in the EBBs in reverse order from
the last insn to the first insn. When LRA sees a write to a pseudo (that
has been assigned a caller-save register), and there is a read following
the write, with an intervening call insn between the write and read,
then LRA generates a spill immediately after the write and a restore
immediately before the read. The spill is needed because the call insn
will clobber the caller-save register.

If there is a write insn and a call insn in two separate BBs but
belonging to the same EBB, the spill insn gets generated in the BB
containing the write insn. If the write insn is in the entry BB, then
the spill insn that is generated in the entry BB prevents shrink wrap
from happening. This is because the spill insn references the stack
pointer and hence the prolog gets generated in the entry BB itself.

This patch ensures the the spill insn is generated before the call insn
instead of after the write. This also ensures that the spill occurs
only in the path containing the call.

2024-08-01 Surya Kumari Jangala <jskumari@linux.ibm.com>

gcc:
PR rtl-optimization/116028
* lra-constraints.cc (split_reg): Spill register before call
insn.
(latest_call_insn): New variable.
(inherit_in_ebb): Track the latest call insn.

gcc/testsuite:
PR rtl-optimization/116028
* gcc.dg/ira-shrinkwrap-prep-1.c: Remove xfail for powerpc.
* gcc.dg/pr10474.c: Remove xfail for powerpc.

RISC-V: Minimal support for Zimop extension.

This patch support Zimop and Zcmop extension[1].To enable GCC to recognize
and process Zimop and Zcmop extension correctly at compile time.

https://github.com/riscv/riscv-isa-manual/blob/main/src/zimop.adoc

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: New extension.
* config/riscv/riscv.opt: New mask.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-42.c: New test.
* gcc.target/riscv/arch-43.c: New test.

c++/modules: Handle instantiating already tsubsted template friend classes [PR115801]

With modules it may be the case that a template friend class provided
with a qualified name is not found by name lookup at instantiation time,
due to the class not being exported from its module.  This causes issues
in tsubst_friend_class which did not handle this case.

This is caused by the named friend class not actually requiring
tsubsting.  This was already worked around for the "found by name
lookup" case (g++.dg/template/friend5.C), but it looks like there's no
need to do name lookup at all for this particular case to work.

We do need to be careful to continue to do name lookup to handle
templates from an outer current instantiation though; this patch adds a
new testcase for this as well.  This should not impact modules (because
exportingness will only affect namespace lookup).

PR c++/115801

gcc/cp/ChangeLog:

* pt.cc (tsubst_friend_class): Return the type immediately when
no tsubsting or name lookup is required.

gcc/testsuite/ChangeLog:

* g++.dg/modules/tpl-friend-16_a.C: New test.
* g++.dg/modules/tpl-friend-16_b.C: New test.
* g++.dg/template/friend82.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jason Merrill <jason@redhat.com>

c++/modules: Fix merging of GM entities in partitions [PR114950]

Currently name lookup generally seems to assume that all entities
declared within a named module (partition) are attached to said module,
which is not true for GM entities (e.g. via extern "C++"), and causes
issues with deduplication.

This patch fixes the issue by ensuring that module attachment of a
declaration is consistently used to handling merging. Handling this
exposes some issues with deduplicating temploid friends; to resolve this
we always create the BINDING_SLOT_PARTITION slot so that we have
somewhere to place attached names (from any module).

This doesn't yet completely handle issues with allowing otherwise
conflicting temploid friends from different modules to co-exist in the
same module if neither are reachable from the other via name lookup.

PR c++/114950

gcc/cp/ChangeLog:

* module.cc (trees_out::decl_value): Stream bit indicating
imported temploid friends early.
(trees_in::decl_value): Use this bit with key_mergeable.
(trees_in::key_mergeable): Allow merging attached declarations
if they're imported temploid friends (which must be namespace
scope).
(module_state::read_cluster): Check for GM entities that may
require merging even when importing from partitions.
* name-lookup.cc (enum binding_slots): Adjust comment.
(get_fixed_binding_slot): Always create partition slot.
(name_lookup::search_namespace_only): Support binding vectors
with both partition and GM entities to dedup.
(walk_module_binding): Likewise.
(name_lookup::adl_namespace_fns): Likewise.
(set_module_binding): Likewise.
(check_module_override): Use attachment of the decl when
checking overrides rather than named_module_p.
(lookup_imported_hidden_friend): Use partition slot for finding
mergeable template bindings.
* name-lookup.h (set_module_binding): Split mod_glob_flag
parameter into separate global_p and partition_p params.

gcc/testsuite/ChangeLog:

* g++.dg/modules/tpl-friend-13_e.C: Adjust error message.
* g++.dg/modules/ambig-2_a.C: New test.
* g++.dg/modules/ambig-2_b.C: New test.
* g++.dg/modules/part-9_a.C: New test.
* g++.dg/modules/part-9_b.C: New test.
* g++.dg/modules/part-9_c.C: New test.
* g++.dg/modules/tpl-friend-15.h: New test.
* g++.dg/modules/tpl-friend-15_a.C: New test.
* g++.dg/modules/tpl-friend-15_b.C: New test.
* g++.dg/modules/tpl-friend-15_c.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>

c++/modules: Clarify error message in read_enum_def

This error message reads to me the wrong way around, particularly in the
context of other errors. Updated so that the ellipsis connect.

gcc/cp/ChangeLog:

* module.cc (trees_in::read_enum_def): Clarify error.

gcc/testsuite/ChangeLog:

* g++.dg/modules/enum-bad-1_b.C: Update error message.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

Daily bump.

compiler: don't assume that ATTRIBUTE_UNUSED is defined

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/604075

Darwin: Recognise -weak_framework in the driver [PR116237].

XCode compilers recognise the weak_framework linker option in the driver
and forward it. This patch makes GCC adopt the same behaviour.

PR target/116237

gcc/ChangeLog:

* config/darwin.h (SUBTARGET_DRIVER_SELF_SPECS): Add a spec for
weak_framework.
* config/darwin.opt: Handle weak_framework driver option.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

c++: erroneous partial spec vs primary tmpl [PR116064]

When a partial specialization is deemed erroneous at parse time, we
currently flag the primary template as erroneous instead. Later
at instantiation time we check if the primary template is erroneous
rather than the selected partial specialization, so at least we're
consistent.

But it's better not to conflate a partial specialization with the
primary template since they're instantiated independenty. This avoids
rejecting the instantiation of A<int> in the below testcase.

PR c++/116064

gcc/cp/ChangeLog:

* error.cc (get_current_template): If the current scope is
a partial specialization, return it instead of the primary
template.
* pt.cc (instantiate_class_template): Pass the partial
specialization if any to maybe_diagnose_erroneous_template
instead of the primary template.

gcc/testsuite/ChangeLog:

* g++.dg/template/permissive-error2.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

Partially support streaming of poly_int for offloading.

When offloading is enabled, the patch streams out host
NUM_POLY_INT_COEFFS, and changes streaming in as follows:

if (host_num_poly_int_coeffs <= NUM_POLY_INT_COEFFS)
{
  for (i = 0; i < host_num_poly_int_coeffs; i++)
    poly_int.coeffs[i] = stream_in coeff;
  for (; i < NUM_POLY_INT_COEFFS; i++)
    poly_int.coeffs[i] = 0;
}
else
{
  for (i = 0; i < NUM_POLY_INT_COEFFS; i++)
    poly_int.coeffs[i] = stream_in coeff;

  /* Ensure that degree of poly_int <= accel NUM_POLY_INT_COEFFS.  */
  for (; i < host_num_poly_int_coeffs; i++)
    {
      val = stream_in coeff;
      if (val != 0)
error ();
    }
}

gcc/ChangeLog:
PR ipa/96265
PR ipa/111937
* data-streamer-in.cc (streamer_read_poly_uint64): Remove code for
streaming, and call poly_int_read_common instead.
(streamer_read_poly_int64): Likewise.
* data-streamer.cc (host_num_poly_int_coeffs): Conditionally define
new variable if ACCEL_COMPILER is defined.
* data-streamer.h (host_num_poly_int_coeffs): Declare.
(poly_int_read_common): New function template.
(bp_unpack_poly_value): Remove code for streaming and call
poly_int_read_common instead.
* lto-streamer-in.cc (lto_input_mode_table): Stream-in host
NUM_POLY_INT_COEFFS into host_num_poly_int_coeffs if ACCEL_COMPILER
is defined.
* lto-streamer-out.cc (lto_write_mode_table): Stream out
NUM_POLY_INT_COEFFS if offloading is enabled.
* poly-int.h (MAX_NUM_POLY_INT_COEFFS_BITS): New macro.
* tree-streamer-in.cc (lto_input_ts_poly_tree_pointers): Adjust
streaming-in of poly_int.

Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>

Don't call clean_symbol_name in create_tmp_var_name [PR116219]

SRA adds fancy names like offset$D94316$_M_impl$D93629$_M_start
where the numbers in there are DECL_UIDs if there are unnamed
FIELD_DECLs etc.
Because -g0 vs. -g can cause differences between the exact DECL_UID
values (add bigger gaps in between them, corresponding decls should
still be ordered the same based on DECL_UID) we make sure such
decls have DECL_NAMELESS set and depending on exact options either don't
dump such names at all or dump_fancy_name sanitizes the D123456$ parts in
there to Dxxxx$.
Unfortunately in tons of places we then use get_name to grab either user
names or these SRA created names and use that as argument to
create_tmp_var{,_name,_raw} to base other artificial temporary names based
on that.  Those are DECL_NAMELESS too, but unfortunately create_tmp_var_name
starting with
https://gcc.gnu.org/git/?p=gcc.git&a=commit;h=725494f6e4121eace43b7db1202f8ecbf52a8276
calls clean_symbol_name which replaces the $s in there with _s and thus
dump_fancy_name doesn't sanitize it anymore.

I don't see any discussion of that commit (originally to TM branch, later
merged) on the mailing list, but from
   DECL_NAME (new_decl)
     = create_tmp_var_name (IDENTIFIER_POINTER (DECL_NAME (old_decl)));
-  SET_DECL_ASSEMBLER_NAME (new_decl, NULL_TREE);
+  SET_DECL_ASSEMBLER_NAME (new_decl, DECL_NAME (new_decl));
snippet elsewhere in that commit it seems create_tmp_var_name was used at
that point also to determine function names of clones, so presumably the
clean_symbol_name at that point was to ensure the symbol could be emitted
into assembly, maybe in case DECL_NAME is something like C++ operators or
whatever could have there undesirable characters.

Anyway, we don't do that for years anymore, already GCC 4.5 uses for such
purposes clone_function_name which starts of DECL_ASSEMBLER_NAME of the old
function and appends based on supportable symbol suffix separators the
separator and some suffix and/or number, so that part doesn't go through
create_tmp_var_name.

I don't see problems with having the $ and . etc. characters in the names
intended just to make dumps more readable, after all, we already are using
those in the SRA created names.  Those names shouldn't make it into the
assembly in any way, neither debug info nor assembly labels.

There is one theoretical case, where the gimplifier promotes automatic
vars into TREE_STATIC ones and therefore those can then appear in assembly,
just in case it would be on e.g. SRA created names and regimplified later.
Because no cases of promotion of DECL_NAMELESS vars to static was observed in
{x86_64,i686,powerpc64le}-linux bootstraps/regtests, the code simply uses
C.NNN names for DECL_NAMELESS vars like it does for !DECL_NAME vars.

Richi mentioned on IRC that the non-cleaned up names might make things
harder to feed stuff back to the GIMPLE FE, but if so, I think it should be
the dumping for GIMPLE FE purposes that cleans those up (but at that point
it should also verify if some such cleaned up names don't collide with
others and somehow deal with those).

2024-08-07  Jakub Jelinek  <jakub@redhat.com>

PR c++/116219
* gimple-expr.cc (remove_suffix): Formatting fixes.
(create_tmp_var_name): Don't call clean_symbol_name.
* gimplify.cc (gimplify_init_constructor): When promoting automatic
DECL_NAMELESS vars to static, don't preserve their DECL_NAME.

OpenMP: Constructors and destructors for "declare target" static aggregates

This commit also compile-time expands (__builtin_)omp_is_initial_device for
both Fortran and C/C++ (unless, -fno-builtin-omp_is_initial_device is used).
But the main change is:

This commit adds support for running constructors and destructors for
static (file-scope) aggregates for C++ objects which are marked with
"declare target" directives on OpenMP offload targets.

Before this commit, space is allocated on the target for such aggregates,
but nothing ever constructs them properly, so they end up zero-initialised.

(See the new test static-aggr-constructor-destructor-3.C for a reason
why running constructors on the target is preferable to e.g. constructing
on the host and then copying the resulting object to the target.)

2024-08-07 Julian Brown <julian@codesourcery.com>
Tobias Burnus <tobias@baylibre.com>

gcc/ChangeLog:

* builtins.def (DEF_GOMP_BUILTIN_COMPILER): Define
DEF_GOMP_BUILTIN_COMPILER to handle the non-prefix version.
* gimple-fold.cc (gimple_fold_builtin_omp_is_initial_device): New.
(gimple_fold_builtin): Call it.
* omp-builtins.def (BUILT_IN_OMP_IS_INITIAL_DEVICE): Define.
* tree.cc (get_file_function_name): Support names for on-target
constructor/destructor functions.

gcc/cp/
* decl2.cc (tree-inline.h): Include.
(static_init_fini_fns): Bump to four entries. Update comment.
(start_objects, start_partial_init_fini_fn): Add 'omp_target'
parameter. Support "declare target" decls. Update forward declaration.
(emit_partial_init_fini_fn): Add 'host_fn' parameter. Return tree for
the created function. Support "declare target".
(OMP_SSDF_IDENTIFIER): New macro.
(partition_vars_for_init_fini): Support partitioning "declare target"
variables also.
(generate_ctor_or_dtor_function): Add 'omp_target' parameter. Support
"declare target" decls.
(c_parse_final_cleanups): Support constructors/destructors on OpenMP
offload targets.

gcc/fortran/ChangeLog:

* gfortran.h (gfc_option_t): Add disable_omp_is_initial_device.
* lang.opt (fbuiltin-): Add.
* options.cc (gfc_handle_option): Handle
-fno-builtin-omp_is_initial_device.
* f95-lang.cc (gfc_init_builtin_functions): Handle
DEF_GOMP_BUILTIN_COMPILER.
* trans-decl.cc (gfc_get_extern_function_decl): Add code to use
DEF_GOMP_BUILTIN_COMPILER for 'omp_is_initial_device'.

libgomp/ChangeLog:

* testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C: New test.
* testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C: New test.
* testsuite/libgomp.c++/static-aggr-constructor-destructor-3.C: New test.
* testsuite/libgomp.c-c++-common/target-is-initial-host.c: New test.
* testsuite/libgomp.c-c++-common/target-is-initial-host-2.c: New test.
* testsuite/libgomp.fortran/target-is-initial-host.f: New test.
* testsuite/libgomp.fortran/target-is-initial-host.f90: New test.
* testsuite/libgomp.fortran/target-is-initial-host-2.f90: New test.

Co-authored-by: Tobias Burnus <tobias@baylibre.com>

c++: Implement CWG2387 - Linkage of const-qualified variable template [PR109126]

The following patch attempts to implement DR2387 by making variable
templates including their specialization TREE_PUBLIC when at file
scope and they don't have static storage class.

2024-08-07 Jakub Jelinek <jakub@redhat.com>

PR c++/109126
* decl.cc (grokvardecl): Implement CWG 2387 - Linkage of
const-qualified variable template. Set TREE_PUBLIC on variable
templates with const qualified types unless static is present.

* g++.dg/DRs/dr2387.C: New test.
* g++.dg/DRs/dr2387-aux.cc: New file.

aarch64/testsuite: Add testcases for recently fixed PRs

The commit for PR 116258, added a x86_64 specific testcase,
I thought it would be a good idea to add an aarch64 testcase too.
And since it also fixed VLA vectors too so add a SVE testcase.

Pushed as obvious after a test for aarch64-linux-gnu.

PR middle-end/116258
PR middle-end/116259

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr116258.c: New test.
* gcc.target/aarch64/sve/pr116259-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

libgomp.c-c++-common/target-link-2.c: Fix test on multi-device systems

libgomp/ChangeLog:

* testsuite/libgomp.c-c++-common/target-link-2.c: Reset variable
value to handle multi-device tests.

rs6000, Add new overloaded vector shift builtin int128 variants

Add the signed __int128 and unsigned __int128 argument types for the
overloaded built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo,
vec_srdb, vec_srl, vec_sro. For each of the new argument types add a
testcase and update the documentation for the built-in.

gcc/ChangeLog:
* config/rs6000/altivec.md (vs<SLDB_lr>db_<mode>): Change
define_insn iterator to VEC_IC.
* config/rs6000/rs6000-builtins.def (__builtin_altivec_vsldoi_v1ti,
__builtin_vsx_xxsldwi_v1ti, __builtin_altivec_vsldb_v1ti,
__builtin_altivec_vsrdb_v1ti): New builtin definitions.
* config/rs6000/rs6000-overload.def (vec_sld, vec_sldb, vec_sldw,
vec_sll, vec_slo, vec_srdb, vec_srl, vec_sro): New overloaded
definitions.
* doc/extend.texi (vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo,
vec_srdb, vec_srl, vec_sro): Add documentation for new overloaded
built-ins.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vec-shift-double-runnable-int128.c: New test
file.

tree-optimization/116258 - do not lower PAREN_EXPR of vectors

The following avoids lowering of PAREN_EXPR of vectors as unsupported
to scalars. Instead PAREN_EXPR is like a plain move or a VIEW_CONVERT.

PR tree-optimization/116258
* tree-vect-generic.cc (expand_vector_operations_1): Do not
lower PAREN_EXPR.

* gcc.target/i386/pr116258.c: New testcase.

testsuite: Fix recent regression of g++.dg/other/sse2-pr85572-1.C

My sincere apologies for not noticing that g++.dg/other/sse2-pr85572-1.C
was FAILing with my recent ashrv2di patch.  I'm not sure how that happened.
Many thanks to Andrew Pinski for alerting me, and confirming that the
changes are harmless/beneficial.  Sorry again for the inconvenience.

2024-08-07  Roger Sayle  <roger@nextmovesoftware.com>

gcc/testsuite/ChangeLog
* g++.dg/other/sse2-pr85572-1.C: Update expected output after
my recent patch for ashrv2di3.  Now with one less instruction.