git.ipfire.org Git - thirdparty/gcc.git/log

[PR117248][LRA]: Rewriting reg notes update and fix calculation of conflict hard regs of pseudo.

  LRA updates conflict hard regs of pseudo when some hard reg dies.
Complicated PA div/mod insns reference explicit clobbered hard regs and
hard regs as operands.  This prevents some hard regs from dying although
they still conflict with pseudos living through.  Also, on such insns LRA
wrongly updates reg notes (REG_DEAD, REG_UNUSED) which are used later in
the rematerialization subpass.  The patch fixes the problems.

gcc/ChangeLog:

PR rtl-optimization/117248
* lra-lives.cc (start_living, start_dying): Remove.
(insn_regnos, out_insn_regnos, insn_regnos_live_after): New.
(sparseset_contains_pseudos_p): Remove.
(make_hard_regno_live, make_hard_regno_dead): Return true if
something in liveness is changed.
(mark_pseudo_live, mark_pseudo_dead): Ditto.
(mark_regno_live, mark_regno_dead): Ditto.
(clear_sparseset_regnos, regnos_in_sparseset_p): Use set instead
of dead_set.
(process_bb_lives): Rewrite dealing with reg notes.  Update
conflict hard regs even when clobber hard reg is not marked as
dead.
(lra_create_live_ranges_1): Add initialization/finalization of
insn_regnos, out_insn_regnos, insn_regnos_live_after.

[PR tree-optimization/117895] Fix sparc libgo build failure with CRC opts enabled

So as noted in the BZ, sparc builds of the golang libraries were failing due to
the CRC code.

Ultimately this was another mode problem in the table expansion.  Essentially
when the mode of the resultant crc was different than the mode of the input
data we could create mixed mode operations which is a no-no.  Not entirely sure
how we were getting away with it before, but it was clearly wrong.

The mode of the crc will always be at least as large as the mode of the data
for the cases we support.  So the code has been adjusted to convert the data's
mode to the crc's mode and do all the ops in the crc mode.
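
As a rough illustration of the idea (not the code in expr.cc), the sketch
below performs a table-based CRC-16 step in which the 8-bit data is first
widened to the CRC's type, so every operation happens at one width.  The
polynomial (CRC-16/XMODEM), the type aliases and all function names are
assumptions chosen for the example.

```cpp
#include <array>
#include <cstdint>
#include <string>

// Illustrative assumptions: CRC-16/XMODEM (poly 0x1021, MSB first, init 0).
using crc_t = std::uint16_t;   // stands in for the CRC's mode
using data_t = std::uint8_t;   // stands in for the narrower data mode
constexpr crc_t kPoly = 0x1021;

// Build the 256-entry lookup table for the polynomial.
inline std::array<crc_t, 256> make_table(crc_t poly) {
  std::array<crc_t, 256> table{};
  for (unsigned i = 0; i < 256; ++i) {
    crc_t r = static_cast<crc_t>(i << 8);
    for (int bit = 0; bit < 8; ++bit)
      r = static_cast<crc_t>((r & 0x8000u) ? ((r << 1) ^ poly) : (r << 1));
    table[i] = r;
  }
  return table;
}

// One table-based step: widen the data to crc_t, then do all ops in crc_t.
inline crc_t crc_step_table(const std::array<crc_t, 256>& table,
                            crc_t crc, data_t data) {
  crc_t wide = static_cast<crc_t>(data);                // data -> CRC width
  crc_t index = static_cast<crc_t>((crc >> 8) ^ wide);  // always 0..255
  return static_cast<crc_t>(static_cast<crc_t>(crc << 8) ^ table[index]);
}

// Bit-by-bit reference used only to cross-check the table version.
inline crc_t crc_step_bitwise(crc_t crc, data_t data) {
  crc = static_cast<crc_t>(crc ^ (static_cast<crc_t>(data) << 8));
  for (int bit = 0; bit < 8; ++bit)
    crc = static_cast<crc_t>((crc & 0x8000u) ? ((crc << 1) ^ kPoly)
                                             : (crc << 1));
  return crc;
}

inline crc_t crc16_table(const std::string& bytes) {
  static const std::array<crc_t, 256> table = make_table(kPoly);
  crc_t crc = 0;
  for (unsigned char b : bytes) crc = crc_step_table(table, crc, b);
  return crc;
}

inline crc_t crc16_bitwise(const std::string& bytes) {
  crc_t crc = 0;
  for (unsigned char b : bytes) crc = crc_step_bitwise(crc, b);
  return crc;
}
```

Both functions should agree on any input; the point is only that no
operation mixes an 8-bit and a 16-bit operand.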

That fixes the libgo build problem on sparc and I've verified that there aren't
any regressions on x86_64 as well as all the embedded targets in my tester.

PR tree-optimization/117895
gcc/
* expr.cc (calculate_table_based_CRC): Drop CRC_MODE argument.
Convert DATA to CRC's mode, then do calculations in CRC's mode.
(expand_crc_table_based): Corresponding changes.
(expand_reversed_crc_table_based): Corresponding changes.

c++: use diagnostic nesting [PR116253]

This patch uses the nested diagnostics capabilities added in the earlier
patch in the C++ frontend.

With this, and enabling the non-standard text formatting via:
  -fdiagnostics-set-output=text:experimental-nesting=yes
and using:
  -std=c++20 -fconcepts-diagnostics-depth=2
then the output for the example in SG15's P3358R0 ("SARIF for Structured
Diagnostics") is:

P3358R0.C: In function ‘int main()’:
P3358R0.C:26:6: error: no matching function for call to ‘pet(lizard)’
   26 |   pet(lizard{});
      |   ~~~^~~~~~~~~~
  • note: candidate: ‘template<class auto:1>  requires  pettable<auto:1> void pet(auto:1)’
    P3358R0.C:21:6:
       21 | void pet(pettable auto t);
          |      ^~~
    • note: template argument deduction/substitution failed:
      • note: constraints not satisfied
        • P3358R0.C: In substitution of ‘template<class auto:1>  requires  pettable<auto:1> void pet(auto:1) [with auto:1 = lizard]’:
        • required from here
          P3358R0.C:26:6:
             26 |   pet(lizard{});
                |   ~~~^~~~~~~~~~
        • required for the satisfaction of ‘pettable<auto:1>’ [with auto:1 = lizard]
          P3358R0.C:19:9:
             19 | concept pettable = has_member_pet<T> or has_default_pet<T>;
                |         ^~~~~~~~
        • note: no operand of the disjunction is satisfied
          P3358R0.C:19:38:
             19 | concept pettable = has_member_pet<T> or has_default_pet<T>;
                |                    ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
          • note: the operand ‘has_member_pet<T>’ is unsatisfied because
            P3358R0.C:19:20:
               19 | concept pettable = has_member_pet<T> or has_default_pet<T>;
                  |                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            • required for the satisfaction of ‘has_member_pet<T>’ [with T = lizard]
              P3358R0.C:13:9:
                 13 | concept has_member_pet = requires(T t) { t.pet(); };
                    |         ^~~~~~~~~~~~~~
            • required for the satisfaction of ‘pettable<auto:1>’ [with auto:1 = lizard]
              P3358R0.C:19:9:
                 19 | concept pettable = has_member_pet<T> or has_default_pet<T>;
                    |         ^~~~~~~~
            • in requirements with ‘T t’ [with T = lizard]
              P3358R0.C:13:26:
                 13 | concept has_member_pet = requires(T t) { t.pet(); };
                    |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~
            • note: the required expression ‘t.pet()’ is invalid, because
              P3358R0.C:13:47:
                 13 | concept has_member_pet = requires(T t) { t.pet(); };
                    |                                          ~~~~~^~
              • error: ‘struct lizard’ has no member named ‘pet’
                P3358R0.C:13:44:
                   13 | concept has_member_pet = requires(T t) { t.pet(); };
                      |                                          ~~^~~
          • note: the operand ‘has_default_pet<T>’ is unsatisfied because
            P3358R0.C:19:41:
               19 | concept pettable = has_member_pet<T> or has_default_pet<T>;
                  |                    ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~
            • required for the satisfaction of ‘has_default_pet<T>’ [with T = lizard]
              P3358R0.C:16:9:
                 16 | concept has_default_pet = T::is_pettable;
                    |         ^~~~~~~~~~~~~~~
            • required for the satisfaction of ‘pettable<auto:1>’ [with auto:1 = lizard]
              P3358R0.C:19:9:
                 19 | concept pettable = has_member_pet<T> or has_default_pet<T>;
                    |         ^~~~~~~~
            • error: ‘is_pettable’ is not a member of ‘lizard’
              P3358R0.C:16:30:
                 16 | concept has_default_pet = T::is_pettable;
                    |                              ^~~~~~~~~~~
  • note: candidate: ‘void pet(dog)’
    P3358R0.C:9:6:
        9 | void pet(dog);
          |      ^~~
    • note: no known conversion for argument 1 from ‘lizard’ to ‘dog’
      P3358R0.C:9:10:
          9 | void pet(dog);
            |          ^~~
  • note: candidate: ‘void pet(cat)’
    P3358R0.C:10:6:
       10 | void pet(cat);
          |      ^~~
    • note: no known conversion for argument 1 from ‘lizard’ to ‘cat’
      P3358R0.C:10:10:
         10 | void pet(cat);
            |          ^~~

showing the hierarchical structure of the messages; ideally there
would be a UI here allowing the user to expand/collapse the messages
to drill down into the detail they are interested in.

The structure is also captured in SARIF output (via the "nestingLevel"
property).
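
The nesting is driven by RAII objects (auto_diagnostic_nesting_level in
the ChangeLog below).  The following is a minimal standalone sketch of
that pattern, not GCC's implementation: the sink and nesting_guard types
and the demo function are hypothetical names invented for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a diagnostic sink that tracks a nesting level.
struct sink {
  int nesting_level = 0;
  std::vector<std::string> lines;

  // Indent by two spaces per level and bullet every nested message.
  void emit(const std::string& msg) {
    std::string line(static_cast<std::size_t>(nesting_level) * 2, ' ');
    if (nesting_level > 0) line += "• ";
    lines.push_back(line + msg);
  }
};

// RAII guard: one level deeper for as long as the object is alive.
class nesting_guard {
 public:
  explicit nesting_guard(sink& s) : s_(s) { ++s_.nesting_level; }
  ~nesting_guard() { --s_.nesting_level; }
  nesting_guard(const nesting_guard&) = delete;
  nesting_guard& operator=(const nesting_guard&) = delete;

 private:
  sink& s_;
};

// Emits a tiny error/candidate/reason hierarchy and returns the lines.
inline std::vector<std::string> demo() {
  sink s;
  s.emit("error: no matching function for call to 'pet(lizard)'");
  {
    nesting_guard candidates(s);  // candidates nest under the error
    s.emit("note: candidate: 'void pet(dog)'");
    {
      nesting_guard reason(s);    // the reason nests under its candidate
      s.emit("note: no known conversion for argument 1");
    }
    s.emit("note: candidate: 'void pet(cat)'");
  }
  return s.lines;
}
```

Because the level is restored in the destructor, early returns inside a
nested scope cannot leave the indentation in a wrong state.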

gcc/cp/ChangeLog:
PR other/116253
* call.cc (print_conversion_rejection): Remove leading space from
diagnostic messages.
(print_conversion_rejection): Likewise.
(print_arity_information): Likewise.
(print_z_candidate): Likewise.  Add auto_diagnostic_nesting_level
before calls to fn_type_unification and diagnose_constraints.
(print_z_candidates): Add auto_diagnostic_nesting_level before
looping over candidates.
(conversion_null_warnings): Remove leading space from
diagnostic messages.
(maybe_inform_about_fndecl_for_bogus_argument_init): Likewise.
* constraint.cc (tsubst_valid_expression_requirement): Add
auto_diagnostic_nesting_level when showing why the expression is
invalid.
(satisfy_disjunction): Likewise when showing operands, and again
when replaying each branch of the disjunction.
(diagnose_constraints): Likewise when replaying satisfaction.
* error.cc (cp_diagnostic_text_starter): Set prefix.
(print_instantiation_full_context): Only show the file
if we're not showing nesting or the user has opted in to
showing location information in nested diagnostics.
(class auto_context_line): New.
(print_instantiation_partial_context_line): Replace calls to
print_location and to diagnostic_show_locus with an
auto_context_line.
(print_instantiation_partial_context): Replace calls to
print_location with an auto_context_line.
(maybe_print_constexpr_context): Likewise.
(print_constrained_decl_info): Likewise.
(print_concept_check_info): Likewise.
(print_constraint_context_head): Likewise.
(print_requires_expression_info): Likewise.

gcc/testsuite/ChangeLog:
PR other/116253
* g++.dg/concepts/nested-diagnostics-1-truncated.C: New test.
* g++.dg/concepts/nested-diagnostics-1.C: New test.
* g++.dg/concepts/nested-diagnostics-2.C: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

i386: Add missing part from my previous commit.

gcc/ChangeLog:

* config/i386/i386.cc (ix86_decompose_address):
Add missing part from my previous commit.

i386: Fix gcc.target/i386/pr101716.c (and some related cleanups)

Fix pr101716.c testcase scan-assembler failure. The combine pass will not
combine instructions that use registers in TARGET_CLASS_LIKELY_SPILLED
class, such as %eax return register in AREG class.

Change the testcase to use pseudos only and explicitly scan for
zero_extendsidi pattern name.

While looking there, also clean ix86_decompose_address a bit: eliminate
common code and use UINTVAL and HOST_WIDE_INT_UC macros in the condition
for AND wrapped address.

gcc/ChangeLog:

* config/i386/i386.cc (ix86_decompose_address): Eliminate
common code and use UINTVAL and HOST_WIDE_INT_UC macros
in the condition for AND wrapped address.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr101716.c (dg-options): Add -dp.
(dg-final): Scan for zero_extendsidi.
(sample1): Change the code to use pseudos only.

arm,testsuite: Add -mtune=cortex-m55 to dlstp-int8x16.c

Like dlstp-compile-asm-1.c, this test would fail if GCC is configured
with non-default options, such as -mtune=cortex-a9.

Force -mtune=cortex-m55 to avoid this unexpected issue.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/dlstp-int8x16.c: Add -mtune=cortex-m55

i386: Fix unwanted fwprop to 3dNOW! insn [PR117926]

The compiler is able to forward propagate a partial vector V4SF instruction
using XMM registers to a 3dNOW! V2SF instruction using MM registers. Prevent
unwanted transformation by tagging 3dNOW! V2SF instructions using generic
RTXes with "(unspec [(const_int 0)] UNSPEC_3DNOW)" tag.

PR target/117926

gcc/ChangeLog:

* config/i386/mmx.md (UNSPEC_3DNOW): New unspec.
(mmx_addv2sf3): Tag insn with UNSPEC_3DNOW tag.
(*mmx_addv2sf3): Ditto.
(mmx_subv2sf3): Ditto.
(mmx_subrv2sf3): Ditto.
(*mmx_subv2sf3): Ditto.
(mmx_mulv2sf3): Ditto.
(mmx_<smaxmin:code>v2sf3): Ditto.
(*mmx_<smaxmin:code>v2sf3): Ditto.
(mmx_ieee_<ieee_maxmin>v2sf3): Ditto.
(mmx_eqv2sf3): Ditto.
(*mmx_eqv2sf3): Ditto.
(mmx_gtv2sf3): Ditto.
(mmx_gev2sf3): Ditto.
(mmx_fix_truncv2sfv2si2): Ditto.
(mmx_floatv2siv2sf2): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr117926.c: New test.

arm: testsuite: fix some legacy C tests

These tests all lack ISO-C style function definitions. Some
deliberately so. Rather than try to adjust the code and risk changing
the nature of the test, add -std=c17 to the test options.

gcc/testsuite/ChangeLog:

* gcc.target/arm/20031108-1.c: Add -std=c17.
* gcc.target/arm/fp16-unprototyped-1.c: Likewise.
* gcc.target/arm/fp16-unprototyped-2.c: Likewise.
* gcc.target/arm/neon-thumb2-move.c: Likewise.
* gcc.target/arm/pr67756.c: Likewise.
* gcc.target/arm/pr81863.c: Likewise.

clang-format BraceWrapping.AfterCaseLabel to true

This setting seems to better match the indentation that is used in GCC.

Adds an extra level of indentation after braces in a case statement.
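
For reference, this is the kind of layout the setting is meant to
approximate, written by hand in the GNU style used in GCC sources.  The
function is a hypothetical example, and the exact clang-format output may
differ in details.

```cpp
// Hypothetical example: each case body gets its own brace pair on a new
// line, and the statements inside are indented one further level.
int
classify (int code)
{
  switch (code)
    {
    case 0:
      {
        int base = 10;
        return base + code;
      }
    default:
      {
        return -1;
      }
    }
}
```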

Only manual testing done on the switch statements in
c-common.cc:resolve_overloaded_builtin and
alias.cc:record_component_aliases.

Ok for trunk?

contrib/ChangeLog:

* clang-format: Set BraceWrapping.AfterCaseLabel.

Signed-off-by: Matthew Malcomson <mmalcomson@nvidia.com>

diagnostics: UX: add doc URLs for attributes (v2)

This is v2 of the patch; v1 was here:
  https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655541.html

Changed in v2:
* added a new TARGET_DOCUMENTATION_NAME hook for figuring out which
  documentation URL to use when there are multiple per-target docs,
  such as for __attribute__((interrupt)); implemented this for all
  targets that have target-specific attributes
* moved attribute_urlifier and its support code to a new
  gcc-attribute-urlifier.cc since it needs to use targetm for the
  above; gcc-urlifier.o is used by the driver.
* fixed extend.texi so that some attributes that failed to appear in
  attr-urls.def now do so (affected nvptx "kernel" and "shared" attrs)
* regenerated attr-urls.def for the above fix, and bringing in
  attributes added since v1 of the patch

In r14-5118-gc5db4d8ba5f3de I added a mechanism to automatically add
documentation URLs to quoted strings in diagnostics.
In r14-6920-g9e49746da303b8 I added a mechanism to generate URLs for
mentions of command-line options in quoted strings in diagnostics.

This patch does a similar thing for attributes.  It adds a new Python 3
script to scrape the generated HTML looking for documentation of
attributes, and uses this to (re)generate a new gcc/attr-urls.def file.

Running "make regenerate-attr-urls" after rebuilding the HTML docs will
regenerate gcc/attr-urls.def in the source directory.

The patch uses this to optionally add doc URLs for attributes in any
diagnostic emitted during the lifetime of an auto_urlify_attributes
instance, and adds such instances everywhere that a diagnostic refers
to an attribute within quotes (based on grepping the source tree
for references to attributes in strings and in code).

For example, given:

$ ./xgcc -B. -S ../../src/gcc/testsuite/gcc.dg/attr-access-2.c
../../src/gcc/testsuite/gcc.dg/attr-access-2.c:14:16: warning:
attribute ‘access(read_write, 2, 3)’ positional argument 2 conflicts
with previous designation by argument 1 [-Wattributes]

with this patch the quoted text `access(read_write, 2, 3)'
automatically gains the URL for our docs for "access":
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-access-function-attribute
in a sufficiently modern terminal.

Like r14-6920-g9e49746da303b8 this avoids the Makefile target
depending on the generated HTML, since a missing URL is a minor
problem, whereas requiring all users to build HTML docs seems more
involved.  Doing so also avoids Python 3 as a build requirement for
everyone, but instead just for developers adding attributes.
Like the options, we could add a CI test for this.

The patch gathers both general and target-specific attributes.
For example, the function attribute "interrupt" has 19 URLs within our
docs: one common, and 18 target-specific ones.
The patch adds a new target hook used when selecting the most
appropriate one.
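
The selection step can be pictured roughly as follows.  This is a sketch
under assumptions, not the code in gcc-attribute-urlifier.cc: the
attr_url_entry type, pick_url, demo_entries and the URL suffixes are all
hypothetical, and the only idea taken from the patch is that a per-target
documentation name chooses among several entries for one attribute.

```cpp
#include <string>
#include <vector>

// Hypothetical table row: one documented location for an attribute.
struct attr_url_entry {
  std::string attr_name;
  std::string target_docs_name;  // empty for the common (generic) entry
  std::string url_suffix;        // placeholder value, not a real anchor
};

// Prefer the entry for the current target's docs; otherwise fall back to
// the common entry; return an empty string if the attribute is unknown.
inline std::string pick_url(const std::vector<attr_url_entry>& entries,
                            const std::string& attr,
                            const std::string& target_docs_name) {
  const attr_url_entry* common = nullptr;
  for (const auto& e : entries) {
    if (e.attr_name != attr) continue;
    if (!target_docs_name.empty() && e.target_docs_name == target_docs_name)
      return e.url_suffix;
    if (e.target_docs_name.empty()) common = &e;
  }
  return common ? common->url_suffix : std::string();
}

// Made-up data for the demo; real entries live in attr-urls.def.
inline std::vector<attr_url_entry> demo_entries() {
  return {
      {"interrupt", "", "common.html#interrupt"},
      {"interrupt", "ARM", "arm.html#interrupt"},
      {"interrupt", "RISC-V", "riscv.html#interrupt"},
      {"access", "", "common.html#access"},
  };
}
```

Falling back to the common entry matches the commit's view that a missing
URL is only a minor problem.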

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
gcc/ChangeLog:
* Makefile.in (OBJS): Add gcc-attribute-urlifier.o.
(ATTR_URLS_HTML_DEPS): New.
(regenerate-attr-urls): New.
(regenerate-attr-urls-unit-test): New.
* attr-urls.def: New file.
* attribs.cc: Include "gcc-urlifier.h".
(decl_attributes): Use auto_urlify_attributes.
* config/aarch64/aarch64.cc (TARGET_DOCUMENTATION_NAME): New.
* config/arc/arc.cc (TARGET_DOCUMENTATION_NAME): New.
* config/arm/arm.cc (TARGET_DOCUMENTATION_NAME): New.
* config/bfin/bfin.cc (TARGET_DOCUMENTATION_NAME): New.
* config/bpf/bpf.cc (TARGET_DOCUMENTATION_NAME): New.
* config/epiphany/epiphany.cc (TARGET_DOCUMENTATION_NAME): New.
* config/gcn/gcn.cc (TARGET_DOCUMENTATION_NAME): New.
* config/h8300/h8300.cc (TARGET_DOCUMENTATION_NAME): New.
* config/i386/i386.cc (TARGET_DOCUMENTATION_NAME): New.
* config/ia64/ia64.cc (TARGET_DOCUMENTATION_NAME): New.
* config/m32c/m32c.cc (TARGET_DOCUMENTATION_NAME): New.
* config/m32r/m32r.cc (TARGET_DOCUMENTATION_NAME): New.
* config/m68k/m68k.cc (TARGET_DOCUMENTATION_NAME): New.
* config/mcore/mcore.cc (TARGET_DOCUMENTATION_NAME): New.
* config/microblaze/microblaze.cc (TARGET_DOCUMENTATION_NAME):
New.
* config/mips/mips.cc (TARGET_DOCUMENTATION_NAME): New.
* config/msp430/msp430.cc (TARGET_DOCUMENTATION_NAME): New.
* config/nds32/nds32.cc (TARGET_DOCUMENTATION_NAME): New.
* config/nvptx/nvptx.cc (TARGET_DOCUMENTATION_NAME): New.
* config/riscv/riscv.cc (TARGET_DOCUMENTATION_NAME): New.
* config/rl78/rl78.cc (TARGET_DOCUMENTATION_NAME): New.
* config/rs6000/rs6000.cc (TARGET_DOCUMENTATION_NAME): New.
* config/rx/rx.cc (TARGET_DOCUMENTATION_NAME): New.
* config/s390/s390.cc (TARGET_DOCUMENTATION_NAME): New.
* config/sh/sh.cc (TARGET_DOCUMENTATION_NAME): New.
* config/stormy16/stormy16.cc (TARGET_DOCUMENTATION_NAME): New.
* config/v850/v850.cc (TARGET_DOCUMENTATION_NAME): New.
* config/visium/visium.cc (TARGET_DOCUMENTATION_NAME): New.

gcc/analyzer/ChangeLog:
* region-model.cc: Include "gcc-urlifier.h".
(reason_attr_access::emit): Use auto_urlify_attributes.
* sm-taint.cc: Include "gcc-urlifier.h".
(tainted_access_attrib_size::emit): Use auto_urlify_attributes.

gcc/c-family/ChangeLog:
* c-attribs.cc: Include "gcc-urlifier.h".
(positional_argument): Use auto_urlify_attributes.
* c-common.cc: Include "gcc-urlifier.h".
(parse_optimize_options): Use auto_urlify_attributes with
OPT_Wattributes.
(attribute_fallthrough_p): Use auto_urlify_attributes.
* c-warn.cc: Include "gcc-urlifier.h".
(diagnose_mismatched_attributes): Use auto_urlify_attributes.

gcc/c/ChangeLog:
* c-decl.cc: Include "gcc-urlifier.h".
(start_decl): Use auto_urlify_attributes with OPT_Wattributes.
(start_function): Likewise.
* c-parser.cc: Include "gcc-urlifier.h".
(c_parser_statement_after_labels): Use auto_urlify_attributes with
OPT_Wattributes.
* c-typeck.cc: Include "gcc-urlifier.h".
(maybe_warn_nodiscard): Use auto_urlify_attributes with
OPT_Wunused_result.

gcc/cp/ChangeLog:
* cp-gimplify.cc: Include "gcc-urlifier.h".
(process_stmt_hotness_attribute): Use auto_urlify_attributes with
OPT_Wattributes.
* cvt.cc: Include "gcc-urlifier.h".
(maybe_warn_nodiscard): Use auto_urlify_attributes with
OPT_Wunused_result.
* decl.cc: Include "gcc-urlifier.h".
(start_decl): Use auto_urlify_attributes.
(start_preparsed_function): Likewise.

gcc/ChangeLog:
* diagnostic.cc (diagnostic_context::override_urlifier): New.
* diagnostic.h (diagnostic_context::override_urlifier): New decl.
* doc/extend.texi (Nvidia PTX Function Attributes): Update
@cindex to specify that "kernel" is a function attribute and
"shared" is a variable attribute, so that these entries are
recognized by the regex in regenerate-attr-urls.py.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_DOCUMENTATION_NAME): New.
* gcc-attribute-urlifier.cc: New file.
* gcc-urlifier.cc: Include diagnostic.h.
(gcc_urlifier::make_doc): Convert to...
(make_doc_url): ...this.
(auto_override_urlifier::auto_override_urlifier): New.
(auto_override_urlifier::~auto_override_urlifier): New.
(selftest::gcc_urlifier_cc_tests): Split out body into...
(selftest::test_gcc_urlifier): ...this.
* gcc-urlifier.h: Include "pretty-print-urlifier.h" and "label-text.h".
(make_doc_url): New decl.
(class auto_override_urlifier): New.
(class attribute_urlifier): New.
(class auto_urlify_attributes): New.
* gimple-ssa-warn-access.cc: Include "gcc-urlifier.h".
(pass_waccess::execute): Use auto_urlify_attributes.
* gimplify.cc: Include "gcc-urlifier.h".
(expand_FALLTHROUGH): Use auto_urlify_attributes.
* internal-fn.cc: Define INCLUDE_MEMORY and include
"gcc-urlifier.h".
(expand_FALLTHROUGH): Use auto_urlify_attributes.
* ipa-pure-const.cc: Include "gcc-urlifier.h".
(suggest_attribute): Use auto_urlify_attributes.
* ipa-strub.cc: Include "gcc-urlifier.h".
(can_strub_p): Use auto_urlify_attributes.
* regenerate-attr-urls.py: New file.
* selftest-run-tests.cc (selftest::run_tests): Call
gcc_attribute_urlifier_cc_tests.
* selftest.h (selftest::gcc_attribute_urlifier_cc_tests): New
decl.
* target.def (documentation_name): New DEFHOOKPOD.
* tree-cfg.cc: Include "gcc-urlifier.h".
(do_warn_unused_result): Use auto_urlify_attributes.
* tree-ssa-uninit.cc: Include "gcc-urlifier.h".
(maybe_warn_read_write_only): Use auto_urlify_attributes.
(maybe_warn_pass_by_reference): Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

c++: handle misspelled concepts and missing #include <concepts>

gcc/cp/ChangeLog:
* name-lookup.cc (suggest_alternative_in_explicit_scope):
Gracefully handle non-namespaces, such as scoped enums.
* parser.cc (cp_parser_name_lookup_error): Provide
a name_hint for the case where we're in an explicit scope.
* std-name-hint.gperf: Add <concepts>.
* std-name-hint.h: Regenerate.

gcc/testsuite/ChangeLog:
* g++.dg/concepts/missing-header.C: New test.
* g++.dg/concepts/misspelled-concept.C: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

c++: consolidate location printing in error.cc [PR116253]

Consolidate the location-printing logic in cp/error.cc, as preliminary
work towards supporting nested diagnostics (PR other/116253).

gcc/cp/ChangeLog:
PR other/116253
* error.cc (print_location): Move to earlier in the file.
(print_instantiation_partial_context_line): Replace
location-printing logic with a call to print_location.
(print_instantiation_partial_context): Likewise, splitting up
pp_verbatim calls.
(maybe_print_constexpr_context): Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

avr.opt.urls: Rebuild.

gcc/
* config/avr/avr.opt.urls: Rebuild.

AVR: Disable generation of CRC lookup tables.

With -foptimize-crc, large lookup tables may be generated which
are placed in .rodata (RAM). This patch disables such tables.
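
For a sense of scale, the sketch below computes the size of one lookup
table.  It assumes the usual byte-at-a-time layout with 256 entries, each
as wide as the CRC; the exact tables GCC emits are not described in this
log, and crc_table_bytes is a hypothetical helper.

```cpp
#include <cstddef>

// Assumption: one table entry per possible data byte (256 entries), each
// entry as wide as the CRC itself.
constexpr std::size_t crc_table_bytes(unsigned crc_bits) {
  return 256u * (crc_bits / 8u);
}

// Table sizes in bytes for common CRC widths.
constexpr std::size_t kCrc8TableBytes = crc_table_bytes(8);
constexpr std::size_t kCrc16TableBytes = crc_table_bytes(16);
constexpr std::size_t kCrc32TableBytes = crc_table_bytes(32);
```

On a device with only a few kilobytes of RAM, a single 32-bit table of
this shape would already take 1024 bytes.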

gcc/
* common/config/avr/avr-common.cc
(avr_option_optimization_table): Default to -fno-optimize-crc.

avoid-store-forwarding: bail when an instruction may throw [PR117816]

Avoid-store-forwarding doesn't handle the case where an instruction in
the store-load sequence contains a REG_EH_REGION note, leading to the
insertion of instructions after it, while it should be the last
instruction in the basic block. This causes an ICE when compiling
using `-O -fnon-call-exceptions -favoid-store-forwarding
-fno-forward-propagate -finstrument-functions`.

This patch rejects the transformation when there are instructions in
the sequence that may throw an exception.

PR rtl-optimization/117816

gcc/ChangeLog:

* avoid-store-forwarding.cc (store_forwarding_analyzer::avoid_store_forwarding):
Reject the transformation when having instructions that may
throw exceptions in the sequence.

gcc/testsuite/ChangeLog:

* gcc.dg/pr117816.c: New test.

nvptx: Support '-march=sm_89'

gcc/
* config/nvptx/nvptx-sm.def: Add '89'.
* config/nvptx/nvptx-gen.h: Regenerate.
* config/nvptx/nvptx-gen.opt: Likewise.
* config/nvptx/nvptx.cc (first_ptx_version_supporting_sm): Adjust.
* config/nvptx/nvptx.opt (-march-map=sm_89, -march-map=sm_90)
(-march-map=sm_90a): Likewise.
* config.gcc: Likewise.
* doc/invoke.texi (Nvidia PTX Options): Document '-march=sm_89'.
* config/nvptx/gen-multilib-matches-tests: Extend.
gcc/testsuite/
* gcc.target/nvptx/march-map=sm_89.c: Adjust.
* gcc.target/nvptx/march-map=sm_90.c: Likewise.
* gcc.target/nvptx/march-map=sm_90a.c: Likewise.
* gcc.target/nvptx/march=sm_89.c: New.
libgomp/
* testsuite/libgomp.c/declare-variant-3-sm89.c: New.
* testsuite/libgomp.c/declare-variant-3.h: Adjust.

nvptx: Support '-mptx=7.8'

gcc/
* config/nvptx/nvptx-opts.h (enum ptx_version): Add
'PTX_VERSION_7_8'.
* config/nvptx/nvptx.cc (ptx_version_to_string)
(ptx_version_to_number): Adjust.
* config/nvptx/nvptx.h (TARGET_PTX_7_8): New.
* config/nvptx/nvptx.opt (Enum(ptx_version)): Add 'EnumValue'
'7.8' for 'PTX_VERSION_7_8'.
* doc/invoke.texi (Nvidia PTX Options): Document '-mptx=7.8'.
gcc/testsuite/
* gcc.target/nvptx/mptx=7.8.c: New.

nvptx: Support '-march=sm_52'

gcc/
* config/nvptx/nvptx-sm.def: Add '52'.
* config/nvptx/nvptx-gen.h: Regenerate.
* config/nvptx/nvptx-gen.opt: Likewise.
* config/nvptx/nvptx.cc (first_ptx_version_supporting_sm): Adjust.
* config/nvptx/nvptx.opt (-march-map=sm_52): Likewise.
* config.gcc: Likewise.
* doc/invoke.texi (Nvidia PTX Options): Document '-march=sm_52'.
* config/nvptx/gen-multilib-matches-tests: Extend.
gcc/testsuite/
* gcc.target/nvptx/march-map=sm_52.c: Adjust.
* gcc.target/nvptx/march=sm_52.c: New.
libgomp/
* testsuite/libgomp.c/declare-variant-3-sm52.c: New.
* testsuite/libgomp.c/declare-variant-3.h: Adjust.

nvptx: Support '-march=sm_37'

gcc/
* config/nvptx/nvptx-sm.def: Add '37'.
* config/nvptx/nvptx-gen.h: Regenerate.
* config/nvptx/nvptx-gen.opt: Likewise.
* config/nvptx/nvptx.cc (first_ptx_version_supporting_sm): Adjust.
* config/nvptx/nvptx.opt (-march-map=sm_37, -march-map=sm_50):
Likewise.
* config.gcc: Likewise.
* doc/invoke.texi (Nvidia PTX Options): Document '-march=sm_37'.
* config/nvptx/gen-multilib-matches-tests: Extend.
gcc/testsuite/
* gcc.target/nvptx/march-map=sm_37.c: Adjust.
* gcc.target/nvptx/march-map=sm_50.c: Likewise.
* gcc.target/nvptx/march-map=sm_52.c: Likewise.
* gcc.target/nvptx/march=sm_37.c: New.
libgomp/
* testsuite/libgomp.c/declare-variant-3-sm37.c: New.
* testsuite/libgomp.c/declare-variant-3.h: Adjust.

nvptx: Support '-mptx=4.1'

gcc/
* config/nvptx/nvptx-opts.h (enum ptx_version): Add
'PTX_VERSION_4_1'.
* config/nvptx/nvptx.cc (ptx_version_to_string)
(ptx_version_to_number): Adjust.
* config/nvptx/nvptx.h (TARGET_PTX_4_1): New.
* config/nvptx/nvptx.opt (Enum(ptx_version)): Add 'EnumValue'
'4.1' for 'PTX_VERSION_4_1'.
* doc/invoke.texi (Nvidia PTX Options): Document '-mptx=4.1'.
gcc/testsuite/
* gcc.target/nvptx/mptx=4.1.c: New.

nvptx: Expose '-mptx=4.2'

'PTX_VERSION_4_2' was added in commit decde11183bdccc46587d6614b75f3d56a2f2e4a
"[nvptx] Choose -mptx default based on -misa" for use for '-march=sm_53'
('first_ptx_version_supporting_sm', 'PTX_ISA_SM53'), as documented by Nvidia.
However, '-mptx=4.2' wasn't exposed to the user; there's no reason not to
expose it.

gcc/
* config/nvptx/nvptx.h (TARGET_PTX_4_2): New.
* config/nvptx/nvptx.opt (Enum(ptx_version)): Add 'EnumValue'
'4.2' for 'PTX_VERSION_4_2'.
* doc/invoke.texi (Nvidia PTX Options): Document '-mptx=4.2'.
gcc/testsuite/
* gcc.target/nvptx/mptx=4.2.c: New.

nvptx: Clarify that our baseline is PTX ISA Version 3.1

In commit decde11183bdccc46587d6614b75f3d56a2f2e4a
"[nvptx] Choose -mptx default based on -misa", 'PTX_VERSION_3_0' was added for
'first_ptx_version_supporting_sm' to return it for 'PTX_ISA_SM30' (as
documented by Nvidia). However, it's then immediately overridden to 3.1, which
in GCC/nvptx "has been the smallest version historically", and '-mptx=3.0'
isn't exposed to the user either. As we also assume elsewhere (machine
description etc.) that our baseline is PTX ISA Version 3.1, there's no real
value in maintaining 'PTX_VERSION_3_0' solely for the purposes of
'first_ptx_version_supporting_sm'.

No change in behavior intended.
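
The override described above amounts to clamping the documented minimum
to the baseline.  A minimal sketch, assuming versions are encoded as
major * 10 + minor and covering only the two SM levels named in these
commit messages; the function names are hypothetical and this is not the
code in nvptx.cc.

```cpp
#include <algorithm>

// Baseline assumed throughout GCC/nvptx: PTX ISA Version 3.1.
constexpr int kBaselinePtx = 31;

// Documented first PTX version for an SM level (only the two cases named
// in the commit messages; anything else returns 0 in this sketch).
constexpr int documented_first_ptx(int sm) {
  return sm == 30 ? 30   // sm_30: PTX 3.0
       : sm == 53 ? 42   // sm_53: PTX 4.2
       : 0;
}

// Effective minimum: never below the baseline.
constexpr int effective_first_ptx(int sm) {
  return std::max(documented_first_ptx(sm), kBaselinePtx);
}
```

Since the clamp already yields 3.1 for sm_30, returning 3.1 directly
gives the same result, which is why the enumerator can be dropped.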

gcc/
* config/nvptx/nvptx-opts.h (enum ptx_version): Remove
'PTX_VERSION_3_0'.
* config/nvptx/nvptx.cc (first_ptx_version_supporting_sm)
(default_ptx_version_option, ptx_version_to_string)
(ptx_version_to_number): Adjust.
* config/nvptx/nvptx.h: Comment.

nvptx: Support '--with-multilib-list'

No change in behavior unless specifying it.

gcc/
* config.gcc: nvptx: Support '--with-multilib-list'.
* config/nvptx/gen-multilib-matches.sh: Adjust.
* configure.ac: Likewise.
* configure: Regenerate.
* doc/install.texi: Update.
* doc/invoke.texi: Align.
* config/nvptx/gen-multilib-matches-tests: Extend.

arm,testsuite: Add -mtune=cortex-m55 to dlstp-compile-asm-1.c test.

This test would fail if GCC is configured with non-default options,
such as -mtune=cortex-a9.

This 'unexpected' scheduling makes the DLSTP optimization generate
subs    lr, #16
bhi .L4
lctp
pop     {r4, r5, pc}
.L4:
sub     ip, ip, #16
b      <loop-begin>

instead of the expected
sub     ip, ip, #16
letp lr, <loop-begin>

Although GCC still optimizes all 144 loops, only 96 use letp, 48
others use lctp.

The patch simply forces -mtune=cortex-m55 to avoid this unexpected
issue.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/dlstp-compile-asm-1.c: Add -mtune=cortex-m55

nvptx: Enhance '-march-map=[...]' test cases

This expands upon the one test case added in
commit de0ef04419e90eacf0d1ddb265552a1b08c18d4b "[nvptx] Add march-map".

gcc/testsuite/
* gcc.target/nvptx/march-map.c: Remove; expanded into...
* gcc.target/nvptx/march-map=sm_50.c: ... this.
* gcc.target/nvptx/march-map=sm_30.c: New.
* gcc.target/nvptx/march-map=sm_32.c: Likewise.
* gcc.target/nvptx/march-map=sm_35.c: Likewise.
* gcc.target/nvptx/march-map=sm_37.c: Likewise.
* gcc.target/nvptx/march-map=sm_52.c: Likewise.
* gcc.target/nvptx/march-map=sm_53.c: Likewise.
* gcc.target/nvptx/march-map=sm_60.c: Likewise.
* gcc.target/nvptx/march-map=sm_61.c: Likewise.
* gcc.target/nvptx/march-map=sm_62.c: Likewise.
* gcc.target/nvptx/march-map=sm_70.c: Likewise.
* gcc.target/nvptx/march-map=sm_72.c: Likewise.
* gcc.target/nvptx/march-map=sm_75.c: Likewise.
* gcc.target/nvptx/march-map=sm_80.c: Likewise.
* gcc.target/nvptx/march-map=sm_86.c: Likewise.
* gcc.target/nvptx/march-map=sm_87.c: Likewise.
* gcc.target/nvptx/march-map=sm_89.c: Likewise.
* gcc.target/nvptx/march-map=sm_90.c: Likewise.
* gcc.target/nvptx/march-map=sm_90a.c: Likewise.
* gcc.target/nvptx/main.c: Remove.

nvptx: Enhance '-march=[...]' test cases

This expands upon the test cases added in
commit 4706670cd3b06bb024da0683776bf86c79d55940
"[nvptx, testsuite] Add gcc.target/nvptx/sm*.c".

gcc/testsuite/
* gcc.target/nvptx/sm30.c: Remove; expanded into...
* gcc.target/nvptx/march=sm_30.c: ... this.
* gcc.target/nvptx/sm35.c: Remove; expanded into...
* gcc.target/nvptx/march=sm_35.c: ... this.
* gcc.target/nvptx/sm53.c: Remove; expanded into...
* gcc.target/nvptx/march=sm_53.c: ... this.
* gcc.target/nvptx/sm70.c: Remove; expanded into...
* gcc.target/nvptx/march=sm_70.c: ... this.
* gcc.target/nvptx/sm75.c: Remove; expanded into...
* gcc.target/nvptx/march=sm_75.c: ... this.
* gcc.target/nvptx/sm80.c: Remove; expanded into...
* gcc.target/nvptx/march=sm_80.c: ... this.
* gcc.target/nvptx/march.c: Remove.

nvptx: Enhance '-mptx=[...]' test cases

This expands upon the test cases added in
commit a2eacdbd4c4a698b3b6f27ef5e1f8dd3d836b2e5
"[nvptx] Add __PTX_ISA_VERSION_{MAJOR,MINOR}__".

gcc/testsuite/
* gcc.target/nvptx/ptx31.c: Remove; expanded into...
* gcc.target/nvptx/mptx=3.1.c: ... this.
* gcc.target/nvptx/ptx60.c: Remove; expanded into...
* gcc.target/nvptx/mptx=6.0.c: ... this.
* gcc.target/nvptx/ptx63.c: Remove; expanded into...
* gcc.target/nvptx/mptx=6.3.c: ... this.
* gcc.target/nvptx/ptx70.c: Remove; expanded into...
* gcc.target/nvptx/mptx=7.0.c: ... this.
* gcc.target/nvptx/mptx=_.c: New.

Use new RAW_DATA_{U,S}CHAR_ELT macros in the middle-end and C FE

During the patch review of the C++ #embed optimization, Jason asked for
a macro for the common
((const unsigned char *) RAW_DATA_POINTER (value))[i]
and ditto with signed char patterns which appear in a lot of places.
In the just committed patch I've added
+#define RAW_DATA_UCHAR_ELT(NODE, I) \
+  (((const unsigned char *) RAW_DATA_POINTER (NODE))[I])
+#define RAW_DATA_SCHAR_ELT(NODE, I) \
+  (((const signed char *) RAW_DATA_POINTER (NODE))[I])
macros for that in tree.h.

The following patch is just a cleanup to use those macros where appropriate.
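
To see why both an unsigned and a signed variant exist, here is a
standalone analogue.  It does not use tree.h: the raw_data struct and the
DEMO_ macros are hypothetical stand-ins that mirror the shape of the
macros quoted above.

```cpp
#include <cstddef>

// Hypothetical stand-in for a raw-data node: a byte pointer and a length.
struct raw_data {
  const char* ptr;
  std::size_t len;
};

#define DEMO_RAW_DATA_POINTER(NODE) ((NODE).ptr)
#define DEMO_RAW_DATA_UCHAR_ELT(NODE, I) \
  (((const unsigned char *) DEMO_RAW_DATA_POINTER (NODE))[I])
#define DEMO_RAW_DATA_SCHAR_ELT(NODE, I) \
  (((const signed char *) DEMO_RAW_DATA_POINTER (NODE))[I])

// Two bytes: 0xff and 0x01.
static const char demo_bytes[] = {'\xff', '\x01'};
static const raw_data demo_node = {demo_bytes, sizeof demo_bytes};

// The same byte reads as 255 through the unsigned macro and as -1 through
// the signed one (two's complement).
inline int first_as_unsigned() { return DEMO_RAW_DATA_UCHAR_ELT(demo_node, 0); }
inline int first_as_signed() { return DEMO_RAW_DATA_SCHAR_ELT(demo_node, 0); }
```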

2024-12-06  Jakub Jelinek  <jakub@redhat.com>

gcc/
* gimplify.cc (gimplify_init_ctor_eval): Use RAW_DATA_UCHAR_ELT
macro.
* gimple-fold.cc (fold_array_ctor_reference): Likewise.
* tree-pretty-print.cc (dump_generic_node): Use RAW_DATA_UCHAR_ELT
and RAW_DATA_SCHAR_ELT macros.
* fold-const.cc (fold): Use RAW_DATA_UCHAR_ELT macro.
gcc/c/
* c-parser.cc (c_parser_get_builtin_args, c_parser_expression,
c_parser_expr_list): Use RAW_DATA_UCHAR_ELT macro.
* c-typeck.cc (digest_init): Use RAW_DATA_UCHAR_ELT and
RAW_DATA_SCHAR_ELT macros.
(add_pending_init, maybe_split_raw_data): Use RAW_DATA_UCHAR_ELT
macro.

More duplicates reported by genmatch

Here are some slightly less obvious cases of duplicates, mostly of the
form (op (op:c @0 @1) (op:c @0 @1)) where it's enough to have
one :c to get all relevant cases.

* match.pd: Remove redundant :c, reported by genmatch as
duplicate patterns.

Remove some duplicates reported by genmatch

genmatch currently has difficulty deciding whether a duplicate
structural match is really duplicate as uses of captures within
predicates or in C code can be order dependent.  For example
a reported duplicate results in

{
   tree captures[4] ATTRIBUTE_UNUSED = { _p1, _p0, _q20, _q21 };
   if (gimple_simplify_112 (res_op, seq, valueize, type, captures))
     return true;
}
{
   tree captures[4] ATTRIBUTE_UNUSED = { _p1, _p0, _q21, _q20 };
   if (gimple_simplify_112 (res_op, seq, valueize, type, captures))
     return true;
}

where the difference is only in _q20 and _q21 being swapped but
that resulting in a call to bitwise_inverted_equal_p (_p1, X)
with X once _q20 and once _q21.  That is, we treat bare
captures as equal for reporting duplicates.

Due to bitwise_inverted_equal_p there are meanwhile a _lot_ of
duplicates reported that are not actual duplicates.

The following removes some that are real duplicates, though, as the
operands are only passed to types_match.

* match.pd (.SAT_ADD patterns using IFN_ADD_OVERFLOW): Remove :c that
only causes duplicate patterns.

RISC-V: Add --with-cmodel configure option

Sometimes we want to use a default cmodel other than medlow. Add a GCC
configure option for that.

gcc/ChangeLog:

* config.gcc (riscv*-*-*): Add support for --with-cmodel configure option.
(all_defaults): Add cmodel.
* config/riscv/riscv.h (TARGET_DEFAULT_CMODEL): Remove.
* doc/install.texi: Document --with-cmodel configure option.
* doc/invoke.texi (-mcmodel): Mention --with-cmodel configure option.

Co-authored-by: Kito Cheng <kito.cheng@sifive.com>

'gcc/config/nvptx/gen-multilib-matches.sh': Support '--selftest'

..., and invoke that before actual use.

gcc/
* config/nvptx/gen-multilib-matches.sh: Support '--selftest'.
* config/nvptx/t-nvptx (t-nvptx-gen-multilib-matches:): Invoke it.
* config/nvptx/gen-multilib-matches-tests: New.

'gcc/config/nvptx/gen-*.sh': Simplify interface

What we currently pass in as '$1' is simply 'dirname "$0"'.

gcc/
* config/nvptx/gen-h.sh: Don't pass in '$1'; compute it locally.
* config/nvptx/gen-multilib-matches.sh: Likewise.
* config/nvptx/gen-omp-device-properties.sh: Likewise.
* config/nvptx/gen-opt.sh: Likewise.
* config/nvptx/t-nvptx (s-nvptx-gen-h:, s-nvptx-gen-opt:)
(t-nvptx-gen-multilib-matches:): Adjust.
* config/nvptx/t-omp-device (omp-device-properties-nvptx):
Likewise.

'gcc/config/nvptx/gen-multilib-matches.sh': Encapsulate main logic

Refactoring for later extension. No change in behavior intended.

gcc/
* config/nvptx/gen-multilib-matches.sh: Encapsulate main logic.

'gcc/config/nvptx/t-nvptx': Don't use the 'shell' function of 'make'

The exit status of the command invoked in a 'Makefile' via '$(shell [...])'
effectively gets discarded (unless explicitly checking the GNU Make 4.2+
'.SHELLSTATUS' variable or jumping through other hoops). In order to be able
to catch errors in what the 'shell' function invokes, let's make things
explicit: similar to how 'gcc/config/avr/t-avr' is doing with 't-multilib-avr',
for example.

gcc/
* config/nvptx/t-nvptx (multilib_matches): Don't use the 'shell'
function of 'make'.
* config/nvptx/gen-multilib-matches.sh: Adjust.

nvptx: Tag '-misa=[...]', '-mptx=[...]' as 'Negative' of themselves [PR117916]

This issue is similar to what a year ago I resolved for GCN in PR112669
"GCN: wrong 'LIBRARY_PATH' in presence of several different '-march=[...]' flags".

Given the current standard nvptx configuration, we get:

    $ build-gcc-offload-nvptx-none/gcc/xgcc -print-multi-directory -mptx=6.3
    .
    $ build-gcc-offload-nvptx-none/gcc/xgcc -print-multi-directory -mptx=3.1
    mptx-3.1

... as expected.  The following, however, is not:

    $ build-gcc-offload-nvptx-none/gcc/xgcc -print-multi-directory -mptx=3.1 -mptx=6.3
    mptx-3.1

This should print '.'.

Or, in a '--with-arch=sm_70' configuration:

    $ build-gcc-offload-nvptx-none/gcc/xgcc -print-multi-directory -misa=sm_70
    .
    $ build-gcc-offload-nvptx-none/gcc/xgcc -print-multi-directory -misa=sm_30
    misa-sm_30

... as expected.  The following, however, are not:

    $ build-gcc-offload-nvptx-none/gcc/xgcc -print-multi-directory -misa=sm_30 -misa=sm_70
    misa-sm_30
    $ build-gcc-offload-nvptx-none/gcc/xgcc -print-multi-directory -misa=sm_30 -march=sm_70
    misa-sm_30
    $ build-gcc-offload-nvptx-none/gcc/xgcc -print-multi-directory -march=sm_30 -march=sm_70
    misa-sm_30
    $ build-gcc-offload-nvptx-none/gcc/xgcc -print-multi-directory -march=sm_30 -misa=sm_70
    misa-sm_30

These should all print '.'.

Even worse:

    $ build-gcc-offload-nvptx-none/gcc/xgcc -print-multi-directory -mgomp -mptx=3.1 -mptx=_
    .

This should print 'mgomp'.  Otherwise, for OpenMP offloading compilation
the wrong (non-'mgomp') multilib is linked in ('.'), and linking fails
due to 'unresolved symbol __nvptx_uni'.

PR target/117916
gcc/
* config/nvptx/nvptx.opt (misa=, mptx=): Tag as 'Negative' of
themselves.

Clarify libgomp nvptx 'omp_low_lat_mem_space' documentation

PTX '%dynamic_smem_size' was "Introduced in PTX ISA version 4.1", and
"Requires 'sm_20' or higher". Given that GCC/nvptx generally supports
'sm_20', only the PTX ISA version matters here, and that's all fine if
just using GCC's defaults. Follow-up to
commit e9a19ead498fcc89186b724c6e76854f7751a89b
"openmp, nvptx: low-lat memory access traits".

libgomp/
* libgomp.texi: Clarify nvptx 'omp_low_lat_mem_space'
documentation.

Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' __builtin_is_initial_device: Revert 'gimple_fold_builtin_acc_on_device' change

The motivation of the 'gimple_fold_builtin_acc_on_device' change in
commit 3269a722b7a03613e9c4e2862bc5088c4a17cc11
"Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' __builtin_is_initial_device"
is unclear, and it unnecessarily diverges GCC's (default)
'--disable-offload-targets' vs. '--enable-offload-targets=[...]'
configurations.

PR testsuite/82250
gcc/
* gimple-fold.cc (gimple_fold_builtin_acc_on_device): Revert last
change.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c: Revert
last change.

testsuite/117714 - gcc.dg/vect/slp-reduc-4.c FAILs on 32-bit SPARC

The testcase tries to ensure we can elide all permutations when
vectorizing a MAX reduction. For SPARC the issue is that the
MAX reduction isn't supported and since we're trying to fall back
to single-lane SLP the dumps contain VEC_PERM_EXPR for the
interleaving permute lowering. Before all-SLP that wouldn't
be in the dumps when doing non-SLP, but eventually we'd fail to
vectorize so no VEC_PERM_EXPRs would be in the dumps either.

The following adds vect_no_int_min_max to the set of xfails for
this particular scan as well, like the existing check for vectorizing.

PR testsuite/117714
* gcc.dg/vect/slp-reduc-4.c: Add vect_no_int_min_max to the
XFAIL for the VEC_PERM_EXPR scan.

libcpp, c++: Optimize initializers using #embed in C++

This patch adds similar optimizations to the C++ FE as have been
implemented earlier in the C FE.
The libcpp hunk enables use of CPP_EMBED token even for C++, not just
C; the preprocessor guarantees there is always a CPP_NUMBER CPP_COMMA
before CPP_EMBED and CPP_COMMA CPP_NUMBER after it which simplifies
parsing (unless #embed is more than 2GB, in that case it could be
CPP_NUMBER CPP_COMMA CPP_EMBED CPP_COMMA CPP_EMBED CPP_COMMA CPP_EMBED
CPP_COMMA CPP_NUMBER etc. with each CPP_EMBED covering at most INT_MAX
bytes).
Similarly to the C patch, this patch parses it into RAW_DATA_CST tree
in the braced initializers (and from there peels into INTEGER_CSTs unless
it is an initializer of an std::byte array or integral array with CHAR_BIT
element precision), parses CPP_EMBED in cp_parser_expression into just
the last INTEGER_CST in it because I think users don't need millions of
-Wunused-value warnings because they did useless
  int a = (
  #embed "megabyte.dat"
  );
and so most of the inner INTEGER_CSTs would be there just for the warning,
and in the rest of contexts (like template argument list, function argument
list, attribute argument list, ...) parses it into a sequence of INTEGER_CSTs
(I wrote range/iterator classes to simplify that).

My dumb
cat embed-11.c
constexpr unsigned char a[] = {
  #embed "cc1plus"
};
const unsigned char *b = a;
testcase where cc1plus is 492329008 bytes long when configured
--enable-checking=yes,rtl,extra against recent binutils with .base64 gas
support results in:
time ./xg++ -B ./ -S -O2 embed-11.c

real    0m4.350s
user    0m2.427s
sys     0m0.830s
time ./xg++ -B ./ -c -O2 embed-11.c

real    0m6.932s
user    0m6.034s
sys     0m0.888s
(compared to running out of memory or very long compilation).
On a shorter inclusion,
cat embed-12.c
constexpr unsigned char a[] = {
  #embed "xg++"
};
const unsigned char *b = a;
where xg++ is 15225904 bytes long, this takes using GCC with the #embed
patchset except for this patch:
time ~/src/gcc/obj36/gcc/xg++ -B ~/src/gcc/obj36/gcc/ -S -O2 embed-12.c

real    0m33.190s
user    0m32.327s
sys     0m0.790s
and with this patch:
time ./xg++ -B ./ -S -O2 embed-12.c

real    0m0.118s
user    0m0.090s
sys     0m0.028s

The patch doesn't change anything on what the first patch in the series
introduces even for C++, namely that #embed is expanded (actually or as if)
into a sequence of literals like
127,69,76,70,2,1,1,3,0,0,0,0,0,0,0,0,2,0,62,0,1,0,0,0,80,211,64,0,0,0,0,0,64,0,0,0,0,0,0,0,8,253
and so each element has int type.
That is how I believe it is in C23, and the different versions of the
C++ P1967 paper specified there some casts, P1967R12 in particular
"Otherwise, the integral constant expression is the value of std::fgetc’s return is cast
to unsigned char."
but please see
https://github.com/llvm/llvm-project/pull/97274#issuecomment-2230929277
comment and whether we really want the preprocessor to preprocess it for
C++ as (or as-if)
static_cast<unsigned char>(127),static_cast<unsigned char>(69),static_cast<unsigned char>(76),static_cast<unsigned char>(70),static_cast<unsigned char>(2),...
i.e. 9 tokens per byte rather than 2, or
(unsigned char)127,(unsigned char)69,...
or
((unsigned char)127),((unsigned char)69),...
etc.
Without a literal suffix for unsigned char constant literals it is horrible,
plus the incompatibility between C and C++.  Sure, we could use the magic
form more often for C++ to save the size and do the 9 or how many tokens
form only for the boundary constants and use #embed "." __gnu__::__base64__("...")
for what is in between if there are at least 2 tokens inside of it.
E.g. (unsigned char)127 vs. static_cast<unsigned char>(127) behaves
differently if there is constexpr long long p[] = { ... };
...
  #embed __FILE__
[p]
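
The parsing difference mentioned above can be seen without #embed at all. The sketch below uses a hand-written literal 3 in place of an embedded byte and a small hypothetical table `p`; the names and values are illustrative only. A C-style cast is a unary expression, so a following `[p]` binds to the literal first, while `static_cast<...>(...)` is itself a postfix expression, so `[p]` applies to the cast result.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical table; p[3] deliberately does not fit in unsigned char.
constexpr long long p[] = { 10, 20, 30, 1000 };

// Parsed as (unsigned char)(3[p]): the subscript happens first, then the
// value 1000 is converted to unsigned char (1000 % 256 == 232).
constexpr auto c_style = (unsigned char)3 [p];

// Parsed as (static_cast<unsigned char>(3))[p]: the cast happens first,
// then the subscript yields p[3] unchanged, with type long long.
constexpr auto named = static_cast<unsigned char>(3) [p];

static_assert(c_style == 232, "cast applied to the subscript result");
static_assert(named == 1000, "subscript applied to the cast result");
static_assert(std::is_same<decltype(c_style), const unsigned char>::value, "");
static_assert(std::is_same<decltype(named), const long long>::value, "");
```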

2024-12-06  Jakub Jelinek  <jakub@redhat.com>

libcpp/
* files.cc (finish_embed): Use CPP_EMBED even for C++.
gcc/
* tree.h (RAW_DATA_UCHAR_ELT, RAW_DATA_SCHAR_ELT): Define.
gcc/cp/ChangeLog:
* cp-tree.h (class raw_data_iterator): New type.
(class raw_data_range): New type.
* parser.cc (cp_parser_postfix_open_square_expression): Handle
parsing of CPP_EMBED.
(cp_parser_parenthesized_expression_list): Likewise.  Use
cp_lexer_next_token_is.
(cp_parser_expression): Handle parsing of CPP_EMBED.
(cp_parser_template_argument_list): Likewise.
(cp_parser_initializer_list): Likewise.
(cp_parser_oacc_clause_tile): Likewise.
(cp_parser_omp_tile_sizes): Likewise.
* pt.cc (tsubst_expr): Handle RAW_DATA_CST.
* constexpr.cc (reduced_constant_expression_p): Likewise.
(raw_data_cst_elt): New function.
(find_array_ctor_elt): Handle RAW_DATA_CST.
(cxx_eval_array_reference): Likewise.
* typeck2.cc (digest_init_r): Emit -Wnarrowing and/or -Wconversion
diagnostics.
(process_init_constructor_array): Handle RAW_DATA_CST.
* decl.cc (maybe_deduce_size_from_array_init): Likewise.
(is_direct_enum_init): Fail for RAW_DATA_CST.
(cp_maybe_split_raw_data): New function.
(consume_init): New function.
(reshape_init_array_1): Add VECTOR_P argument.  Handle RAW_DATA_CST.
(reshape_init_array): Adjust reshape_init_array_1 caller.
(reshape_init_vector): Likewise.
(reshape_init_class): Handle RAW_DATA_CST.
(reshape_init_r): Likewise.
gcc/testsuite/
* c-c++-common/cpp/embed-22.c: New test.
* c-c++-common/cpp/embed-23.c: New test.
* g++.dg/cpp/embed-4.C: New test.
* g++.dg/cpp/embed-5.C: New test.
* g++.dg/cpp/embed-6.C: New test.
* g++.dg/cpp/embed-7.C: New test.
* g++.dg/cpp/embed-8.C: New test.
* g++.dg/cpp/embed-9.C: New test.
* g++.dg/cpp/embed-10.C: New test.
* g++.dg/cpp/embed-11.C: New test.
* g++.dg/cpp/embed-12.C: New test.
* g++.dg/cpp/embed-13.C: New test.
* g++.dg/cpp/embed-14.C: New test.

SVE intrinsics: Fold calls with pfalse predicate.

If an SVE intrinsic has predicate pfalse, we can fold the call to
a simplified assignment statement: For _m predication, the LHS can be assigned
the operand for inactive values and for _z, we can assign a zero vector.
For _x, the returned values can be arbitrary and as suggested by
Richard Sandiford, we fold to a zero vector.

For example,
svint32_t foo (svint32_t op1, svint32_t op2)
{
return svadd_s32_m (svpfalse_b (), op1, op2);
}
can be folded to lhs = op1, such that foo is compiled to just a RET.
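
To make the _m and _z cases concrete, here is a scalar model of predicated addition. It is only a sketch under stated assumptions: the fixed lane count, the `vec`/`pred` aliases and the `add_m`/`add_z` helpers are hypothetical and are not the SVE intrinsics (real SVE vectors are length-agnostic and need arm_sve.h).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t lanes = 4;  // illustrative fixed lane count
using vec = std::array<std::int32_t, lanes>;
using pred = std::array<bool, lanes>;

// Wrapping 32-bit add, done in unsigned arithmetic to avoid signed overflow.
inline std::int32_t wrap_add(std::int32_t a, std::int32_t b) {
  return static_cast<std::int32_t>(static_cast<std::uint32_t>(a) +
                                   static_cast<std::uint32_t>(b));
}

// Model of _m (merging) predication: inactive lanes keep op1.
inline vec add_m(const pred &pg, const vec &op1, const vec &op2) {
  vec r{};
  for (std::size_t i = 0; i < lanes; ++i)
    r[i] = pg[i] ? wrap_add(op1[i], op2[i]) : op1[i];
  return r;
}

// Model of _z (zeroing) predication: inactive lanes become zero.
inline vec add_z(const pred &pg, const vec &op1, const vec &op2) {
  vec r{};
  for (std::size_t i = 0; i < lanes; ++i)
    r[i] = pg[i] ? wrap_add(op1[i], op2[i]) : 0;
  return r;
}
```

With an all-false predicate every lane is inactive, so `add_m` returns `op1` unchanged and `add_z` returns a zero vector, which are exactly the results the fold substitutes for the call.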

For implicit predication, a case distinction is necessary:
Intrinsics that read from memory can be folded to a zero vector.
Intrinsics that write to memory or prefetch can be folded to a no-op.
Other intrinsics need case-by-case implementation, which we added in
the corresponding svxxx_impl::fold.

We implemented this optimization during gimple folding by calling a new method
gimple_folder::fold_pfalse from gimple_folder::fold, which covers the generic
cases described above.

We tested the new behavior for each intrinsic with all supported predications
and data types and checked the produced assembly. There is a test file
for each shape subclass with scan-assembler-times tests that look for
the simplified instruction sequences, such as individual RET instructions
or zeroing moves. There is an additional directive counting the total number of
functions in the test, which must be the sum of counts of all other
directives. This is to check that all tested intrinsics were optimized.

A few intrinsics were not covered by this patch:
- svlasta and svlastb already have an implementation to cover a pfalse
predicate. No changes were made to them.
- svld1/2/3/4 return aggregate types and were excluded from the case
that folds calls with implicit predication to lhs = {0, ...}.
- svst1/2/3/4 already have an implementation in svstx_impl that precedes
our optimization, such that it is not triggered.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/ChangeLog:

PR target/106329
* config/aarch64/aarch64-sve-builtins-base.cc
(svac_impl::fold): Add folding if pfalse predicate.
(svadda_impl::fold): Likewise.
(class svaddv_impl): Likewise.
(class svandv_impl): Likewise.
(svclast_impl::fold): Likewise.
(svcmp_impl::fold): Likewise.
(svcmp_wide_impl::fold): Likewise.
(svcmpuo_impl::fold): Likewise.
(svcntp_impl::fold): Likewise.
(class svcompact_impl): Likewise.
(class svcvtnt_impl): Likewise.
(class sveorv_impl): Likewise.
(class svminv_impl): Likewise.
(class svmaxnmv_impl): Likewise.
(class svmaxv_impl): Likewise.
(class svminnmv_impl): Likewise.
(class svorv_impl): Likewise.
(svpfirst_svpnext_impl::fold): Likewise.
(svptest_impl::fold): Likewise.
(class svsplice_impl): Likewise.
* config/aarch64/aarch64-sve-builtins-sve2.cc
(class svcvtxnt_impl): Likewise.
(svmatch_svnmatch_impl::fold): Likewise.
* config/aarch64/aarch64-sve-builtins.cc
(is_pfalse): Return true if tree is pfalse.
(gimple_folder::fold_pfalse): Fold calls with pfalse predicate.
(gimple_folder::fold_call_to): Fold call to lhs = t for given tree t.
(gimple_folder::fold_to_stmt_vops): Helper function that folds the
call to given stmt and adjusts virtual operands.
(gimple_folder::fold): Call fold_pfalse.
* config/aarch64/aarch64-sve-builtins.h (is_pfalse): Declare is_pfalse.

gcc/testsuite/ChangeLog:

PR target/106329
* gcc.target/aarch64/pfalse-binary_0.h: New test.
* gcc.target/aarch64/pfalse-unary_0.h: New test.
* gcc.target/aarch64/sve/pfalse-binary.c: New test.
* gcc.target/aarch64/sve/pfalse-binary_int_opt_n.c: New test.
* gcc.target/aarch64/sve/pfalse-binary_opt_n.c: New test.
* gcc.target/aarch64/sve/pfalse-binary_opt_single_n.c: New test.
* gcc.target/aarch64/sve/pfalse-binary_rotate.c: New test.
* gcc.target/aarch64/sve/pfalse-binary_uint64_opt_n.c: New test.
* gcc.target/aarch64/sve/pfalse-binary_uint_opt_n.c: New test.
* gcc.target/aarch64/sve/pfalse-binaryxn.c: New test.
* gcc.target/aarch64/sve/pfalse-clast.c: New test.
* gcc.target/aarch64/sve/pfalse-compare_opt_n.c: New test.
* gcc.target/aarch64/sve/pfalse-compare_wide_opt_n.c: New test.
* gcc.target/aarch64/sve/pfalse-count_pred.c: New test.
* gcc.target/aarch64/sve/pfalse-fold_left.c: New test.
* gcc.target/aarch64/sve/pfalse-load.c: New test.
* gcc.target/aarch64/sve/pfalse-load_ext.c: New test.
* gcc.target/aarch64/sve/pfalse-load_ext_gather_index.c: New test.
* gcc.target/aarch64/sve/pfalse-load_ext_gather_offset.c: New test.
* gcc.target/aarch64/sve/pfalse-load_gather_sv.c: New test.
* gcc.target/aarch64/sve/pfalse-load_gather_vs.c: New test.
* gcc.target/aarch64/sve/pfalse-load_replicate.c: New test.
* gcc.target/aarch64/sve/pfalse-prefetch.c: New test.
* gcc.target/aarch64/sve/pfalse-prefetch_gather_index.c: New test.
* gcc.target/aarch64/sve/pfalse-prefetch_gather_offset.c: New test.
* gcc.target/aarch64/sve/pfalse-ptest.c: New test.
* gcc.target/aarch64/sve/pfalse-rdffr.c: New test.
* gcc.target/aarch64/sve/pfalse-reduction.c: New test.
* gcc.target/aarch64/sve/pfalse-reduction_wide.c: New test.
* gcc.target/aarch64/sve/pfalse-shift_right_imm.c: New test.
* gcc.target/aarch64/sve/pfalse-store.c: New test.
* gcc.target/aarch64/sve/pfalse-store_scatter_index.c: New test.
* gcc.target/aarch64/sve/pfalse-store_scatter_offset.c: New test.
* gcc.target/aarch64/sve/pfalse-storexn.c: New test.
* gcc.target/aarch64/sve/pfalse-ternary_opt_n.c: New test.
* gcc.target/aarch64/sve/pfalse-ternary_rotate.c: New test.
* gcc.target/aarch64/sve/pfalse-unary.c: New test.
* gcc.target/aarch64/sve/pfalse-unary_convert_narrowt.c: New test.
* gcc.target/aarch64/sve/pfalse-unary_convertxn.c: New test.
* gcc.target/aarch64/sve/pfalse-unary_n.c: New test.
* gcc.target/aarch64/sve/pfalse-unary_pred.c: New test.
* gcc.target/aarch64/sve/pfalse-unary_to_uint.c: New test.
* gcc.target/aarch64/sve/pfalse-unaryxn.c: New test.
* gcc.target/aarch64/sve2/pfalse-binary.c: New test.
* gcc.target/aarch64/sve2/pfalse-binary_int_opt_n.c: New test.
* gcc.target/aarch64/sve2/pfalse-binary_int_opt_single_n.c: New test.
* gcc.target/aarch64/sve2/pfalse-binary_opt_n.c: New test.
* gcc.target/aarch64/sve2/pfalse-binary_opt_single_n.c: New test.
* gcc.target/aarch64/sve2/pfalse-binary_to_uint.c: New test.
* gcc.target/aarch64/sve2/pfalse-binary_uint_opt_n.c: New test.
* gcc.target/aarch64/sve2/pfalse-binary_wide.c: New test.
* gcc.target/aarch64/sve2/pfalse-compare.c: New test.
* gcc.target/aarch64/sve2/pfalse-load_ext_gather_index_restricted.c:
New test.
* gcc.target/aarch64/sve2/pfalse-load_ext_gather_offset_restricted.c:
New test.
* gcc.target/aarch64/sve2/pfalse-load_gather_sv_restricted.c: New test.
* gcc.target/aarch64/sve2/pfalse-load_gather_vs.c: New test.
* gcc.target/aarch64/sve2/pfalse-shift_left_imm_to_uint.c: New test.
* gcc.target/aarch64/sve2/pfalse-shift_right_imm.c: New test.
* gcc.target/aarch64/sve2/pfalse-store_scatter_index_restricted.c:
New test.
* gcc.target/aarch64/sve2/pfalse-store_scatter_offset_restricted.c:
New test.
* gcc.target/aarch64/sve2/pfalse-unary.c: New test.
* gcc.target/aarch64/sve2/pfalse-unary_convert.c: New test.
* gcc.target/aarch64/sve2/pfalse-unary_convert_narrowt.c: New test.
* gcc.target/aarch64/sve2/pfalse-unary_to_int.c: New test.

rtl-optimization/117922 - add timevar for fold-mem-offsets

The new fold-mem-offsets RTL pass takes a significant amount of time
and memory. Add a timevar for it.

PR rtl-optimization/117922
* timevar.def (TV_FOLD_MEM_OFFSETS): New.
* fold-mem-offsets.cc (pass_data_fold_mem): Use TV_FOLD_MEM_OFFSETS.

c++: ICE with pack indexing empty pack [PR117898]

Here we ICE with a partially-substituted pack indexing.  The pack
expanded to an empty pack, which we can't index.  It seems reasonable
to detect this case in tsubst_pack_index, even before we substitute
the index.  Other erroneous cases can wait until pack_index_element
where we have the index.

PR c++/117898

gcc/cp/ChangeLog:

* pt.cc (tsubst_pack_index): Detect indexing an empty pack.

gcc/testsuite/ChangeLog:

* g++.dg/cpp26/pack-indexing2.C: Adjust.
* g++.dg/cpp26/pack-indexing12.C: New test.

RISC-V: Refactor the testcases for bswap16-0

This patch refactors the bswap16-0 testcase now that the various
optimization options are actually passed to the testcase, so that
the asm dump check also fits big LMUL values like m8.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

It is a test-only patch and obvious up to a point; will commit it
directly if there are no comments in the next 48H.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/bswap16-0.c: Update
the vector register RE to cover v10 - v31.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Fix incorrect optimization options passing to convert and unop

Like the strided load/store, the testcases of vector convert and unop
are designed to pick up different sorts of optimization options but
actually these options are ignored according to the Execution log of
the gcc.log.

This patch corrects that in almost the same way as we fixed it for
strided load/store.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.

It is a test-only patch and obvious up to a point; will commit it
directly if there are no comments in the next 48H.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Fix the incorrect optimization
options passing to testcases.

Signed-off-by: Pan Li <pan2.li@intel.com>

Daily bump.

PR modula2/117904: cc1gm2 ICE when compiling a const built from VAL and SIZE

This patch fixes an ICE which occurs when a positive ZType constant
increment is used during a FOR loop.

gcc/m2/ChangeLog:

PR modula2/117904
* gm2-compiler/M2GenGCC.mod (PerformLastForIterator): Add call to
BuildConvert when increment is > 0.

gcc/testsuite/ChangeLog:

PR modula2/117904
* gm2/iso/pass/forloopbyconst.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

i386: Fix addcarry/subborrow issues [PR117860]

Fix several things to enable combine to handle addcarry/subborrow patterns:

- Fix wrong canonical form of addcarry<mode> insn and friends. For a
commutative operation (PLUS RTX), a binary operand (LTU) takes precedence
over a unary operand (ZERO_EXTEND).

- Swap operands of GTU comparison to canonicalize addcarry/subborrow
comparison. Again, the canonical form of the compare is PLUS RTX before
ZERO_EXTEND RTX. GTU comparison is not a carry flag comparison, so we have
to swap operands in x86_canonicalize_comparison to a non-canonical form
to use LTU comparison.

- Return correct compare mode (CCCmode) for addcarry/subborrow pattern
from ix86_cc_mode, so combine is able to emit required compare mode for
combined insn.

- Add *subborrow<mode>_1 pattern having const_scalar_int_operand predicate.
Here, canonicalization of SUB (op1, const) RTX to PLUS (op1, -const) requires
negation of constant operand when checking operands.

With the above changes, combine is able to create *addcarry_1/*subborrow_1
pattern with immediate operand for the testcase in the PR:

SomeAddFunc:
        addq    %rcx, %rsi      # 10    [c=4 l=3]  adddi3_cc_overflow_1/0
        movq    %rdi, %rax      # 33    [c=4 l=3]  *movdi_internal/3
        adcq    $5, %rdx        # 19    [c=4 l=4]  *addcarrydi_1/0
        movq    %rsi, (%rdi)    # 23    [c=4 l=3]  *movdi_internal/5
        movq    %rdx, 8(%rdi)   # 24    [c=4 l=4]  *movdi_internal/5
        setc    %dl     # 39    [c=4 l=3]  *setcc_qi
        movzbl  %dl, %edx       # 40    [c=4 l=3]  zero_extendqidi2/0
        movq    %rdx, 16(%rdi)  # 26    [c=4 l=4]  *movdi_internal/5
        ret             # 43    [c=0 l=1]  simple_return_internal

SomeSubFunc:
        subq    %rcx, %rsi      # 10    [c=4 l=3]  *subdi_3/0
        movq    %rdi, %rax      # 42    [c=4 l=3]  *movdi_internal/3
        sbbq    $17, %rdx       # 19    [c=4 l=4]  *subborrowdi_1/0
        movq    %rsi, (%rdi)    # 33    [c=4 l=3]  *movdi_internal/5
        sbbq    %rcx, %rcx      # 29    [c=8 l=3]  *x86_movdicc_0_m1_neg
        movq    %rdx, 8(%rdi)   # 34    [c=4 l=4]  *movdi_internal/5
        movq    %rcx, 16(%rdi)  # 35    [c=4 l=4]  *movdi_internal/5
        ret             # 51    [c=0 l=1]  simple_return_internal
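
For orientation, source of roughly the following shape computes the same values as the listings above (low limb, high limb plus or minus a constant with carry/borrow, and the final carry/borrow). This is a hypothetical reconstruction in portable C++, not the testcase from the PR, and the log does not say whether this particular spelling is what combine turns into adc/sbb; the function and struct names are made up for the sketch.

```cpp
#include <cassert>
#include <cstdint>

struct wide_result {
  std::uint64_t lo;     // low limb of the result
  std::uint64_t hi;     // high limb of the result
  std::uint64_t carry;  // 1 if the high limb overflowed/underflowed, else 0
};

// {a_hi:a_lo} + {5:b_lo}, reporting the carry out of the high limb.
inline wide_result some_add(std::uint64_t a_lo, std::uint64_t a_hi,
                            std::uint64_t b_lo) {
  wide_result r;
  r.lo = a_lo + b_lo;
  std::uint64_t c1 = r.lo < a_lo;  // carry out of the low limb
  std::uint64_t t = a_hi + 5;
  std::uint64_t c2 = t < a_hi;     // carry from adding the constant
  r.hi = t + c1;
  std::uint64_t c3 = r.hi < t;     // carry from adding the low-limb carry
  r.carry = c2 | c3;               // at most one of c2, c3 can be set
  return r;
}

// {a_hi:a_lo} - {17:b_lo}, reporting the borrow out of the high limb.
inline wide_result some_sub(std::uint64_t a_lo, std::uint64_t a_hi,
                            std::uint64_t b_lo) {
  wide_result r;
  r.lo = a_lo - b_lo;
  std::uint64_t b1 = a_lo < b_lo;  // borrow out of the low limb
  std::uint64_t t = a_hi - 17;
  std::uint64_t b2 = a_hi < 17;    // borrow from subtracting the constant
  r.hi = t - b1;
  std::uint64_t b3 = t < b1;       // borrow from subtracting the low borrow
  r.carry = b2 | b3;
  return r;
}
```

Note that the SomeSubFunc listing materializes the borrow with `sbbq %rcx, %rcx`, i.e. as an all-ones mask, whereas this sketch returns 0 or 1 for simplicity.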

PR target/117860

gcc/ChangeLog:

* config/i386/i386.cc (ix86_canonicalize_comparison): Swap
operands of GTU comparison to canonicalize addcarry/subborrow
comparison.
(ix86_cc_mode): Return CCCmode for the comparison of
addcarry/subborrow pattern.
* config/i386/i386.md (addcarry<mode>): Swap operands of
PLUS RTX to make it canonical.
(*addcarry<mode>_1): Ditto.
(addcarry peephole2s): Update RTXes for addcarry<mode>_1 change.
(*add<dwi>3_doubleword_cc_overflow_1): Ditto.
(*subborrow<mode>_1): New insn pattern.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr117860.c: New test.

arm: remove support for iWMMX/iWMMX2 intrinsics

The mmintrin.h header was adjusted for GCC-14 to generate a
(suppressible) warning if it was used, saying that support would be
removed in GCC-15.

Make that come true by removing the contents of this header and
emitting an error.

At this point in time I've not removed the internal support for the
intrinsics, just the wrappers that enable access to them. That can be
done at leisure from now on.

gcc/ChangeLog:

* config/arm/mmintrin.h: Raise an error if this header is used.
Remove other content.

aarch64: Mark vluti* intrinsics as QUIET

This patch fixes the vluti* definitions to say that they don't
raise FP exceptions even for floating-point modes.

gcc/
* config/aarch64/aarch64-simd-pragma-builtins.def
(ENTRY_TERNARY_VLUT8): Use FLAG_QUIET rather than FLAG_DEFAULT.
(ENTRY_TERNARY_VLUT16): Likewise.

aarch64: Reintroduce FLAG_AUTO_FP

The flag now known as FLAG_QUIET is an odd-one-out in that it
removes side-effects rather than adding them. This patch inverts
it and gives it the old name FLAG_AUTO_FP. FLAG_QUIET now means
"no flags" instead.

gcc/
* config/aarch64/aarch64-builtins.cc (FLAG_QUIET): Redefine to 0,
replacing the old flag with...
(FLAG_AUTO_FP): ...this.
(FLAG_DEFAULT): Redefine to FLAG_AUTO_FP.
(aarch64_call_properties): Update accordingly.

aarch64: Rename FLAG_NONE to FLAG_DEFAULT

This patch renames FLAG_NONE to FLAG_DEFAULT. "NONE" suggests
that the function has no side-effects, whereas it actually means
that floating-point operations are assumed to read FPCR and to
raise FP exceptions.

gcc/
* config/aarch64/aarch64-builtins.cc (FLAG_NONE): Rename to...
(FLAG_DEFAULT): ...this and update all references.
* config/aarch64/aarch64-simd-builtins.def: Update all references
here too.
* config/aarch64/aarch64-simd-pragma-builtins.def: Likewise.

aarch64: Rename FLAG_AUTO_FP to FLAG_QUIET

I'd suggested the name "FLAG_AUTO_FP" to mean "automatically derive
FLAG_FP from the mode", i.e. automatically decide whether the function
might read the FPCR or might raise FP exceptions. However, the flag
currently suppresses that behaviour instead.

This patch renames FLAG_AUTO_FP to FLAG_QUIET. That's probably not a
great name, but it's also what the SVE code means by "quiet", and is
borrowed from "quiet NaNs".

gcc/
* config/aarch64/aarch64-builtins.cc (FLAG_AUTO_FP): Rename to...
(FLAG_QUIET): ...this and update all references.
* config/aarch64/aarch64-simd-builtins.def: Update all references
here too.

Match: Refactor the unsigned SAT_TRUNC match patterns [NFC]

This patch refactors all the unsigned SAT_TRUNC patterns,
aka:
* Extract type check outside.
* Re-arrange the related match pattern forms together.

The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.

gcc/ChangeLog:

* match.pd: Refactor sorts of unsigned SAT_TRUNC match patterns.

Signed-off-by: Pan Li <pan2.li@intel.com>

middle-end/117801 - failed register coalescing due to GIMPLE schedule

For a TSVC testcase we see failed register coalescing due to a
different schedule of GIMPLE .FMA and stores fed by it.  This
can be mitigated by making direct internal functions participate
in TER - given we're using more and more of such functions to
expose target capabilities it seems to be a natural thing to not
exempt those.

Unfortunately the internal function expanding API doesn't match
what we usually have - passing in a target and returning an RTX
but instead the LHS of the call is expanded and written to.  This
makes the TER expansion of a call SSA def a bit unwieldy.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

The ccmp changes have likely not seen any coverage, the debug stmt
changes might not be optimal, we might end up losing on replaceable
calls.

PR middle-end/117801
* tree-outof-ssa.cc (ssa_is_replaceable_p): Make
direct internal function calls replaceable.
* expr.cc (get_def_for_expr): Handle replacements with calls.
(get_def_for_expr_class): Likewise.
(optimize_bitfield_assignment_op): Likewise.
(expand_expr_real_1): Likewise.  Properly expand direct
internal function defs.
* cfgexpand.cc (expand_call_stmt): Handle replacements with calls.
(avoid_deep_ter_for_debug): Likewise, always create a debug temp
for calls.
(expand_debug_expr): Likewise, give up for calls.
(expand_gimple_basic_block): Likewise.
* ccmp.cc (ccmp_candidate_p): Likewise.
(get_compare_parts): Likewise.

libstdc++: Use ADL swap for containers' function objects [PR117921]

The standard says that Compare, Pred and Hash objects should be swapped
as described in [swappable.requirements] which means calling swap
unqualified with std::swap visible to name lookup.
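
The idiom in question is the usual two-step "using std::swap; swap (a, b);". The sketch below shows why it matters for a user-provided function object; the `user::compare` type, its swap overload and the `swap_members` helper are hypothetical names for illustration and are not the libstdc++ internals touched by the patch.

```cpp
#include <cassert>
#include <utility>

namespace user {
  // A comparison object with its own swap, as a container might be given.
  struct compare {
    int tag;
    bool operator()(int a, int b) const { return a < b; }
  };

  // Counts how often the user-provided swap was chosen.
  inline int &swap_calls() {
    static int n = 0;
    return n;
  }

  // Found by argument-dependent lookup when swap is called unqualified.
  inline void swap(compare &a, compare &b) noexcept {
    ++swap_calls();
    std::swap(a.tag, b.tag);
  }
}

// Swap two objects the way [swappable.requirements] describes: an
// unqualified call with std::swap visible as the fallback.
template <typename T>
void swap_members(T &a, T &b) {
  using std::swap;
  swap(a, b);
}
```

A qualified `std::swap (a, b)` would never consider `user::swap`, whereas the unqualified call picks it for `user::compare` and still falls back to `std::swap` for types such as `int`.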

libstdc++-v3/ChangeLog:

PR libstdc++/117921
* include/bits/hashtable_policy.h (_Hash_code_base::_M_swap):
Use ADL swap for Hash members.
(_Hashtable_base::_M_swap): Use ADL swap for _Equal members.
* include/bits/stl_tree.h (_Rb_tree::swap): Use ADL swap for
_Compare members.
* testsuite/23_containers/set/modifiers/swap/adl.cc: New test.
* testsuite/23_containers/unordered_set/modifiers/swap-2.cc: New
test.

arm: Add CDE options for star-mc1 cpu

This patch adds CDE options support for -mcpu=star-mc1.
The star-mc1 is an Armv8-M Mainline CPU supporting the CDE feature.

gcc/ChangeLog:

* config/arm/arm-cpus.in (star-mc1): Add CDE options.
* doc/invoke.texi (cdecp options): Document for star-mc1.

Signed-off-by: Qingxin Zhong <arvin.zhong@armchina.com>

doloop: Fix up doloop df use [PR116799]

The following testcases are miscompiled on s390x-linux, because the
doloop_optimize
  /* Ensure that the new sequence doesn't clobber a register that
     is live at the end of the block.  */
  {
    bitmap modified = BITMAP_ALLOC (NULL);

    for (rtx_insn *i = doloop_seq; i != NULL; i = NEXT_INSN (i))
      note_stores (i, record_reg_sets, modified);

    basic_block loop_end = desc->out_edge->src;
    bool fail = bitmap_intersect_p (df_get_live_out (loop_end), modified);
check doesn't work as intended.
The problem is that it uses df, but the df analysis was only done using
  iv_analysis_loop_init (loop);
->
  df_analyze_loop (loop);
which computes df only on the bbs inside the loop.
While loop_end bb is inside of the loop, df_get_live_out computed that
way includes registers set in the loop and used at the start of the next
iteration, but doesn't include registers set in the loop (or before the
loop) and used after the loop.

The following patch fixes that by doing whole function df_analyze first,
changes the loop iteration mode from 0 to LI_ONLY_INNERMOST (on many
targets which use the can_use_doloop_if_innermost target hook and so are known
to only handle innermost loops) or LI_FROM_INNERMOST (I think only bfin
actually allows non-innermost loops) and checking not just
df_get_live_out (loop_end) (that is needed for something used by the
next iteration), but also df_get_live_in (desc->out_edge->dest),
i.e. what will be used after the loop.  df of such a bb shouldn't
be affected by the df_analyze_loop and so should be from df_analyze
of the whole function.
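
The combined check can be pictured with a toy model.  This is only a
sketch: it uses std::bitset in place of GCC's bitmap and df APIs, and
the RegSet alias, function names and parameters are hypothetical.

```cpp
#include <bitset>
#include <cassert>

// Toy stand-in for a register bitmap; 64 is an arbitrary size.
using RegSet = std::bitset<64>;

// Returns true if the doloop sequence has to be rejected because it
// modifies a register that is live at the end of the loop (used by the
// next iteration) or live on entry to the exit destination (used after
// the loop).
bool doloop_must_punt(const RegSet& modified,
                      const RegSet& live_out_loop_end,
                      const RegSet& live_in_exit_dest) {
  return (modified & live_out_loop_end).any()
         || (modified & live_in_exit_dest).any();
}

// Register 5 is set by the new sequence and only used after the loop,
// which is the case the old check missed.
bool example_punts_on_use_after_loop() {
  RegSet modified, live_out, live_in_exit;
  modified.set(5);
  live_in_exit.set(5);
  assert(!(modified & live_out).any());
  return doloop_must_punt(modified, live_out, live_in_exit);
}
```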

2024-12-05  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/113994
PR rtl-optimization/116799
* loop-doloop.cc: Include targhooks.h.
(doloop_optimize): Also punt on intersection of modified
with df_get_live_in (desc->out_edge->dest).
(doloop_optimize_loops): Call df_analyze.  Use
LI_ONLY_INNERMOST or LI_FROM_INNERMOST instead of 0 as
second loops_list argument.

* gcc.c-torture/execute/pr116799.c: New test.
* g++.dg/torture/pr113994.C: New test.

c: Diagnose unexpected va_start arguments in C23 [PR107980]

va_start macro was changed in C23 from the C17 va_start (va_list ap, parmN)
where parmN is the identifier of the last parameter into
va_start (va_list ap, ...) where arguments after ap aren't evaluated.
Late in the C23 development
"If any additional arguments expand to include unbalanced parentheses, or
a preprocessing token that does not convert to a token, the behavior is
undefined."
has been added, plus there is
"NOTE The macro allows additional arguments to be passed for va_start for
compatibility with older versions of the library only."
and
"Additional arguments beyond the first given to the va_start macro may be
expanded and used in unspecified contexts where they are unevaluated. For
example, an implementation diagnoses potentially erroneous input for an
invocation of va_start such as:"
...
va_start(vl, 1, 3.0, "12", xd); // diagnostic encouraged
...
"Simultaneously, va_start usage consistent with older revisions of this
document should not produce a diagnostic:"
...
void neigh (int last_arg, ...) {
va_list vl;
va_start(vl, last_arg); // no diagnostic

The following patch implements the recommended diagnostics.
Until now in C23 mode va_start(v, ...) was defined to
__builtin_va_start(v, 0)
and the extra arguments were silently ignored.
The following patch adds a new builtin in the form of a keyword, which
parses the first argument and then handles the rest as follows:
- __builtin_c23_va_start (ap) is accepted silently;
- for __builtin_c23_va_start (ap, identifier) it looks the identifier up
  and is silent if it is the last named parameter (except that it diagnoses
  if it has the register keyword), otherwise it diagnoses that it isn't the
  last one but something else;
- if there is just __builtin_c23_va_start (ap, ), or if
  __builtin_c23_va_start (ap, is followed by tokens other than an
  identifier followed by ), it skips over the tokens (with handling of
  balanced ()s) until ) and diagnoses the extra tokens.
All the diagnostics are warnings.

2024-12-05 Jakub Jelinek <jakub@redhat.com>

PR c/107980
gcc/
* ginclude/stdarg.h (va_start): For C23+ change parameters from
v, ... to just ... and define to __builtin_c23_va_start(__VA_ARGS__)
rather than __builtin_va_start(v, 0).
gcc/c-family/
* c-common.h (enum rid): Add RID_C23_VA_START.
* c-common.cc (c_common_reswords): Add __builtin_c23_va_start.
gcc/c/
* c-parser.cc (c_parser_postfix_expression): Handle RID_C23_VA_START.
gcc/testsuite/
* gcc.dg/c23-stdarg-4.c: Expect extra warning.
* gcc.dg/c23-stdarg-6.c: Likewise.
* gcc.dg/c23-stdarg-7.c: Likewise.
* gcc.dg/c23-stdarg-8.c: Likewise.
* gcc.dg/c23-stdarg-10.c: New test.
* gcc.dg/c23-stdarg-11.c: New test.
* gcc.dg/torture/c23-stdarg-split-1a.c: Expect extra warning.
* gcc.dg/torture/c23-stdarg-split-1b.c: Likewise.

AVR: target/107957 - Propagate zero_reg to store sources.

When -msplit-ldst is on, it may be possible to propagate __zero_reg__
to the sources of the new stores.  For example, without this patch,

unsigned long lx;

void store_lsr17 (void)
{
   lx >>= 17;
}

compiles to:

store_lsr17:
   lds r26,lx+2           ;  movqi_insn
   lds r27,lx+3           ;  movqi_insn
   movw r24,r26           ;  *movhi
   lsr r25                ;  *lshrhi3_const
   ror r24
   ldi r26,0              ;  movqi_insn
   ldi r27,0              ;  movqi_insn
   sts lx,r24             ;  movqi_insn
   sts lx+1,r25           ;  movqi_insn
   sts lx+2,r26           ;  movqi_insn
   sts lx+3,r27           ;  movqi_insn
   ret

but with this patch it becomes:

store_lsr17:
   lds r26,lx+2           ;  movqi_insn
   lds r27,lx+3           ;  movqi_insn
   movw r24,r26           ;  *movhi
   lsr r25                ;  *lshrhi3_const
   ror r24
   sts lx,r24             ;  movqi_insn
   sts lx+1,r25           ;  movqi_insn
   sts lx+2,__zero_reg__  ;  movqi_insn
   sts lx+3,__zero_reg__  ;  movqi_insn
   ret

gcc/
PR target/107957
* config/avr/avr-passes-fuse-move.h (bbinfo_t) <try_mem0_p>:
Add static property.
* config/avr/avr-passes.cc (bbinfo_t::try_mem0_p): Define it.
(optimize_data_t::try_mem0): New method.
(bbinfo_t::optimize_one_block) [bbinfo_t::try_mem0_p]: Run try_mem0.
(bbinfo_t::optimize_one_function): Set bbinfo_t::try_mem0_p.
* config/avr/avr.md (pushhi1_insn): Also allow zero as source.
(define_split) [avropt_split_ldst]: Only run avr_split_ldst()
when avr-fuse-move has been run at least once.
* doc/invoke.texi (AVR Options) <-msplit-ldst>: Document it.

AVR: target/107957 - Split multi-byte loads and stores.

This patch splits multi-byte loads and stores into single-byte
ones provided:

-  New option -msplit-ldst is on (e.g. -O2 and higher), and
-  The memory is non-volatile, and
-  The address space is generic, and
-  The split addresses are natively supported by the hardware.

gcc/
PR target/107957
* config/avr/avr.opt (-msplit-ldst, avropt_split_ldst):
New option and associated var.
* common/config/avr/avr-common.cc (avr_option_optimization_table)
[OPT_LEVELS_2_PLUS]: Turn on -msplit-ldst.
* config/avr/avr-passes.cc (splittable_address_p)
(avr_byte_maybe_mem, avr_split_ldst): New functions.
* config/avr/avr-protos.h (avr_split_ldst): New proto.
* config/avr/avr.md (define_split) [avropt_split_ldst]: Run
avr_split_ldst().

AVR: target/64242 - Copy FP to a local reg in nonlocal_goto.

In nonlocal_goto sets, change hard_frame_pointer_rtx only after
emit_stack_restore() restored SP. This is needed because SP
may be stored in some frame location.

gcc/
PR target/64242
* config/avr/avr.md (nonlocal_goto): Don't restore
hard_frame_pointer_rtx directly, but copy it to local
register, and only set hard_frame_pointer_rtx from it
after emit_stack_restore().

AVR: Rework patterns that add / subtract an (inverted) MSB.

gcc/
* config/avr/avr-protos.h (avr_out_add_msb): New proto.
* config/avr/avr.cc (avr_out_add_msb): New function.
(avr_adjust_insn_length) [ADJUST_LEN_ADD_GE0,
ADJUST_LEN_ADD_LT0]: Handle cases.
* config/avr/avr.md (adjust_len) <add_lt0, add_ge0>: New attr values.
(QISI2): New mode iterator.
(C_MSB): New mode_attr.
(*add<mode>3...msb_split, *add<mode>3.ge0, *add<mode>3.lt0)
(*sub<mode>3...msb_split, *sub<mode>3.ge0, *sub<mode>3.lt0): New
patterns replacing old ones, but with iterators and
using avr_out_add_msb() for asm out.

doc: Add store-forwarding-max-distance to invoke.texi

gcc/ChangeLog:

* doc/invoke.texi: Add store-forwarding-max-distance.

Signed-off-by: Filip Kastl <fkastl@suse.cz>

params.opt: Fix typo

Add missing '=' after -param=cycle-accurate-model.

gcc/ChangeLog:

* params.opt: Add missing '=' after -param=cycle-accurate-model.

Signed-off-by: Filip Kastl <fkastl@suse.cz>

Allow limited extended asm at toplevel [PR41045]

In the Cauldron IPA/LTO BoF we discussed toplevel asms, and that it
would be nice to tell the compiler something about what
the toplevel asm does.  Sure, I'm aware the kernel people said they
aren't willing to use something like that, but perhaps other projects
do.  And for kernel perhaps we should add some new option which allows
some dumb parsing of the toplevel asms and gather something from that
parsing.

The following patch is just a small step towards that, namely, allow
some subset of extended inline asm outside of functions.
The patch is unfinished, LTO streaming (out/in) of the ASM_EXPRs isn't
implemented (it emits a sorry diagnostic), nor any cgraph/varpool
changes to find out references etc.

The patch allows something like:

int a[2], b;
enum { E1, E2, E3, E4, E5 };
struct S { int a; char b; long long c; };
asm (".section blah; .quad %P0, %P1, %P2, %P3, %P4; .previous"
     : : "m" (a), "m" (b), "i" (42), "i" (E4), "i" (sizeof (struct S)));

Even for non-LTO, that could be useful e.g. for getting enumerators from
C/C++ as integers into the toplevel asm, or sizeof/offsetof etc.

The restrictions I've implemented are:
1) asm qualifiers still aren't allowed, so asm goto or asm inline can't be
   specified at toplevel, asm volatile has the volatile ignored for C++ with
   a warning and is an error in C like before
2) I see good use for mainly input operands, output maybe to make it clear
   that the inline asm may write some memory, I don't see a good use for
   clobbers, so the patch doesn't allow those (and of course labels because
   asm goto can't be specified)
3) the patch allows only constraints which don't allow registers, so
   typically "m" or "i" or other memory or immediate constraints; for
   memory, it requires that the operand is addressable and its address
   could be used in static var initializer (so that no code actually
   needs to be emitted for it), for others that they are constants usable
   in the static var initializers
4) the patch disallows + (there is no reload of the operands, so I don't
   see benefits of tying some operands together), nor % (who cares if
   something is commutative in this case), or & (again, no code is emitted
   around the asm), nor the 0-9 constraints

Right now there is no way to tell the compiler that the inline asm defines
some symbol; that is implemented in a later patch, as the : constraint.

Similarly, the c modifier doesn't work in all cases and the cc modifier
is implemented separately.

2024-12-05  Jakub Jelinek  <jakub@redhat.com>

PR c/41045
gcc/
* output.h (insn_noperands): Declare.
* final.cc (insn_noperands): No longer static.
* varasm.cc (assemble_asm): Handle ASM_EXPR.
* lto-streamer-out.cc (lto_output_toplevel_asms): Add sorry_at
for non-STRING_CST toplevel asm for now.
* doc/extend.texi (Basic @code{asm}, Extended @code{asm}): Document
that extended asm is now allowed outside of functions with certain
restrictions.
gcc/c/
* c-parser.cc (c_parser_asm_string_literal): Add forward declaration.
(c_parser_asm_definition): Parse also extended asm without
clobbers/labels.
* c-typeck.cc (build_asm_expr): Allow extended asm outside of
functions and check extra restrictions.
gcc/cp/
* cp-tree.h (finish_asm_stmt): Add TOPLEV_P argument.
* parser.cc (cp_parser_asm_definition): Parse also extended asm
without clobbers/labels outside of functions.
* semantics.cc (finish_asm_stmt): Add TOPLEV_P argument, if set,
check extra restrictions for extended asm outside of functions.
* pt.cc (tsubst_stmt): Adjust finish_asm_stmt caller.
gcc/testsuite/
* c-c++-common/toplevel-asm-1.c: New test.
* c-c++-common/toplevel-asm-2.c: New test.
* c-c++-common/toplevel-asm-3.c: New test.

RISC-V: Add const to function_shape::get_name [NFC]

function_shape::get_name is the function for building the intrinsic function
name; the result should not be changed by others once it is built.

So add const to the return type to make sure no one changes that by
accident.

gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-shapes.cc
(vsetvl_def::get_name): Adjust return type.
(loadstore_def::get_name): Ditto.
(indexed_loadstore_def::get_name): Ditto.
(th_loadstore_width_def::get_name): Ditto.
(th_indexed_loadstore_width_def::get_name): Ditto.
(alu_def::get_name): Ditto.
(alu_frm_def::get_name): Ditto.
(widen_alu_frm_def::get_name): Ditto.
(narrow_alu_frm_def::get_name): Ditto.
(reduc_alu_frm_def::get_name): Ditto.
(widen_alu_def::get_name): Ditto.
(no_mask_policy_def::get_name): Ditto.
(return_mask_def::get_name): Ditto.
(narrow_alu_def::get_name): Ditto.
(move_def::get_name): Ditto.
(mask_alu_def::get_name): Ditto.
(reduc_alu_def::get_name): Ditto.
(th_extract_def::get_name): Ditto.
(scalar_move_def::get_name): Ditto.
(vundefined_def::get_name): Ditto.
(misc_def::get_name): Ditto.
(vset_def::get_name): Ditto.
(vcreate_def::get_name): Ditto.
(read_vl_def::get_name): Ditto.
(fault_load_def::get_name): Ditto.
(vlenb_def::get_name): Ditto.
(seg_loadstore_def::get_name): Ditto.
(seg_indexed_loadstore_def::get_name): Ditto.
(seg_fault_load_def::get_name): Ditto.
(crypto_vv_def::get_name): Ditto.
(crypto_vi_def::get_name): Ditto.
(crypto_vv_no_op_type_def::get_name): Ditto.
(sf_vqmacc_def::get_name): Ditto.
(sf_vqmacc_def::get_name): Ditto.
(sf_vfnrclip_def::get_name): Ditto.
* config/riscv/riscv-vector-builtins.cc
(function_builder::add_unique_function): Adjust the type for the
function name holder.
(function_builder::add_overloaded_function): Ditto.
* config/riscv/riscv-vector-builtins.h (function_shape::get_name): Add
const to the return type.

Daily bump.

compiler: traverse method declarations

We were not consistently traversing method declarations, which appear
if there is a method without a body.  The gc compiler rejects that
case, but gofrontend currently permits it.  Maybe that should change,
but not today.

This avoids a compiler crash if there are method declarations
with types that require specific functions.  I didn't bother
with a test case because a program with method declarations is
almost certainly invalid anyhow.

Fixes PR go/117891

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/633495

c++: give suggestion on misspelled class name [PR116771]

gcc/cp/ChangeLog:
PR c++/116771
* parser.cc (cp_parser_name_lookup_error): Provide suggestions for
the case of complete failure where there is no scope.

gcc/testsuite/ChangeLog:
PR c++/116771
* g++.dg/spellcheck-pr116771.C: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

libgdiagnostics: documentation tweaks

gcc/ChangeLog:
* doc/libgdiagnostics/topics/execution-paths.rst: Add '§' before
references to section of SARIF spec.
* doc/libgdiagnostics/topics/fix-it-hints.rst: Likewise.
* doc/libgdiagnostics/tutorial/01-hello-world.rst: Fix typo.
* doc/libgdiagnostics/tutorial/02-physical-locations.rst: Likewise.
* doc/libgdiagnostics/tutorial/04-notes.rst: Likewise.
* doc/libgdiagnostics/tutorial/06-fix-it-hints.rst: Add link to
diagnostic_add_fix_it_hint_replace.
* doc/libgdiagnostics/tutorial/07-execution-paths.rst: Add '§'.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

sched1: debug/model: dump predecessor list and BB num [NFC]

This is broken out of predecessor promotion patch so that debugging can
proceed during stage1 restrictions.

gcc/ChangeLog:
* haifa-sched.cc (model_choose_insn): Dump unscheduled_preds.
(model_dump_pressure_summary): Dump bb->index.
(model_start_schedule): Pass bb.
* sched-rgn.cc (debug_dependencies): Dump SD_LIST_HARD_BACK deps.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>

sched1: parameterize pressure scheduling spilling aggressiveness [PR/114729]

sched1 computes ECC (Excess Change Cost) for each insn, which represents
the register pressure attributed to the insn.
Currently the pressure sensitive scheduling algorithm deliberately ignores
negative ECC values (pressure reduction), making them 0 (neutral), leading
to more spills. This happens due to the assumption that the compiler has
a reasonably accurate processor pipeline scheduling model and thus tries
to aggressively fill pipeline bubbles with spill slots.

This however might not be true, as the model might not be available for
certain uarches or even applicable, especially for modern out-of-order cores.

The existing heuristic induces spill frenzy on RISC-V, noticeably so on
SPEC2017 507.Cactu. If insn scheduling is disabled completely, the
total dynamic icounts for this workload are reduced in half from
~2.5 trillion insns to ~1.3 (w/ -fno-schedule-insns).

This patch adds --param=cycle-accurate-model={0,1} to gate the spill
behavior.

- The default (1) preserves existing spill behavior.

- targets/uarches sensitive to spilling can override the param to (0)
   to get the reverse effect. RISC-V backend does so too.
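
The gating can be summarised by the following sketch.  It is a
simplified illustration, not the code in haifa-sched.cc; the
effective_ecc name and its parameters are hypothetical.

```cpp
#include <cassert>

// With the cycle-accurate model assumed (param = 1), a negative ECC
// (pressure reduction) is clamped to 0, preserving the old behaviour.
// With param = 0, the negative value is kept so that insns which reduce
// register pressure are preferred.
int effective_ecc(int base_ecc, bool cycle_accurate_model) {
  if (cycle_accurate_model && base_ecc < 0)
    return 0;
  return base_ecc;
}

// Small self-check of the two modes on the same pressure-reducing insn.
bool modes_differ_on_negative_ecc() {
  assert(effective_ecc(-3, true) == 0);
  return effective_ecc(-3, false) == -3;
}
```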

The actual perf numbers are very promising.

(1) On RISC-V BPI-F3 in-order CPU, -Ofast -march=rv64gcv_zba_zbb_zbs:

  Before:
  ------
  Performance counter stats for './cactusBSSN_r_base.rivos spec_ref.par':

      4,917,712.97 msec task-clock:u                     #    1.000 CPUs utilized
             5,314      context-switches:u               #    1.081 /sec
                 3      cpu-migrations:u                 #    0.001 /sec
           204,784      page-faults:u                    #   41.642 /sec
7,868,291,222,513      cycles:u                         #    1.600 GHz
2,615,069,866,153      instructions:u                   #    0.33  insn per cycle
    10,799,381,890      branches:u                       #    2.196 M/sec
        15,714,572      branch-misses:u                  #    0.15% of all branches

  After:
  -----
  Performance counter stats for './cactusBSSN_r_base.rivos spec_ref.par':

      4,552,979.58 msec task-clock:u                     #    0.998 CPUs utilized
           205,020      context-switches:u               #   45.030 /sec
                 2      cpu-migrations:u                 #    0.000 /sec
           204,221      page-faults:u                    #   44.854 /sec
7,285,176,204,764      cycles:u        (7.4% faster)    #    1.600 GHz
2,145,284,345,397      instructions:u (17.96% fewer)    #    0.29  insn per cycle
    10,799,382,011      branches:u                       #    2.372 M/sec
        16,235,628      branch-misses:u                  #    0.15% of all branches

(2) Wilco reported 20% perf gains on aarch64 Neoverse V2 runs.

gcc/ChangeLog:
PR target/114729
* params.opt (--param=cycle-accurate-model=): New opt.
* doc/invoke.texi (cycle-accurate-model): Document.
* haifa-sched.cc (model_excess_group_cost): Return negative
delta if param_cycle_accurate_model is 0.
(model_excess_cost): Ceil negative baseECC to 0 only if
param_cycle_accurate_model is 1.
Dump the actual ECC value.
* config/riscv/riscv.cc (riscv_option_override): Set param
to 0.

gcc/testsuite/ChangeLog:
PR target/114729
* gcc.target/riscv/riscv.exp: Enable new tests to build.
* gcc.target/riscv/sched1-spills/spill1.cpp: Add new test.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>

AVR: ad target/84211 - Fix dumping INSN_UID for null insn.

gcc/
PR target/84211
* config/avr/avr-passes.cc (insninfo_t) <m_insn>: Preset to 0.
(run_find_plies) [hamm=0, dump_file]: Don't print INSN_UID
for a null m_insn.

contrib: Fix 2 bugs in check-params-in-docs.py

In my last patch for check-params-in-docs.py I accidentally
1. left one occurrence of the 'help_params' variable not renamed
2. converted 'help_params' from a dict to a list

These issues cause the script to error when encountering a parameter
missing in docs. This patch should fix these issues.

contrib/ChangeLog:

* check-params-in-docs.py: 'params' -> 'help_params'. Don't
convert 'help_params' to a list.

Signed-off-by: Filip Kastl <fkastl@suse.cz>

arm: use quotes when referring to command-line options [PR90160]

gcc/ChangeLog:
PR translation/90160
* config/arm/arm.cc (arm_option_check_internal): Use quotes in
messages that refer to command-line options. Tweak wording.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

gcc/configure: Properly remove -O flags from C[XX]FLAGS

PR bootstrap/117893
* configure.ac: Use shell loop to remove -O flags.
* configure: Regenerate.

c++: Don't reject pointer to virtual method during constant evaluation [PR117615]

We currently reject the following valid code:

=== cut here ===
struct Base {
    virtual void doit (int v) const {}
};
struct Derived : Base {
    void doit (int v) const {}
};
using fn_t = void (Base::*)(int) const;
struct Helper {
    fn_t mFn;
    constexpr Helper (auto && fn) : mFn(static_cast<fn_t>(fn)) {}
};
void foo () {
    constexpr Helper h (&Derived::doit);
}
=== cut here ===

The problem is that since r6-4014-gdcdbc004d531b4, &Derived::doit is
represented with an expression with type pointer to method and using an
INTEGER_CST (here 1), and that cxx_eval_constant_expression rejects any
such expression with a non-null INTEGER_CST.

This patch uses the same strategy as r12-4491-gf45610a45236e9 (fix for
PR c++/102786), and simply lets such expressions go through.

PR c++/117615

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_constant_expression): Don't reject
INTEGER_CSTs with type POINTER_TYPE to METHOD_TYPE.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/constexpr-virtual22.C: New test.

c++: Fix up erroneous template error recovery ICE [PR117826]

The testcase in the PR (which can't be easily reduced and is
way too large and has way too many errors) results in an ICE,
because the erroneous_templates hash_map holds trees of erroneous
templates across ggc_collect and some of the templates in there
could be removed, so the later lookup can crash on comparison of
already freed and reused trees.

The following patch makes the hash_map GTY((cache)) marked.
The cp-tree.h changes before the erroneous_template declaration
are needed to make gengtype happy, it didn't like using
directive nor using a template-id as a template parameter.

It is marked cache because if a decl would be solely referenced from
the erroneous_templates hash_map, then nothing would look it up.

2024-12-04 Jakub Jelinek <jakub@redhat.com>

PR c++/117826
* cp-tree.h (struct decl_location_traits): New type.
(erroneous_templates_t): Change using into typedef.
(erroneous_templates): Add GTY((cache)).
* error.cc (cp_adjust_diagnostic_info): Use
hash_map_safe_get_or_insert<true> rather than
hash_map_safe_get_or_insert<false> for erroneous_templates.

tree-optimization/116083 - SLP discovery slowness

One large constant factor of SLP discovery is figuring the vector
type for each individual lane of each node. That should be redundant
since the structural comparison of stmts should ensure they end up
the same so the following computes them only once per node rather
than for each lane.

This cuts the compile-time of the testcase in half.

PR tree-optimization/116083
* tree-vect-slp.cc (vect_build_slp_tree_1): Compute vector
type and max_nunits only once. Remove check for matching
vector type of each lane and replace it with matching check
for LHS type.

RISC-V: Add assert for insn operand out of range access [PR117878][NFC]

According to the initial analysis of PR117878, the ICE comes from
the out-of-range operand access for recog_data.operand[]. Thus, add
one assert here to expose this explicitly.

PR target/117878

gcc/ChangeLog:

* config/riscv/riscv-v.cc (vlmax_avl_type_p): Add assert for
out of range access.
(nonvlmax_avl_type_p): Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

phiopt: Reset the number of iterations information of a loop when changing an exit from the loop [PR117243]

After r12-5300-gf98f373dd822b3, phiopt could get the following bb structure:
      |
    middle-bb -----|
      |            |
      |   |----|   |
    phi<1, 2>  |   |
    cond       |   |
      |        |   |
      |--------+---|

This was considered 2 loops. The inner loop had an upper_bound estimate of 8,
due to the original `for (b = 0; b <= 7; b++)`. The outer loop was already an
infinite one.
So phiopt would come along and change the condition to be unconditionally true.
That turns the inner loop into an infinite one, but the estimate on the loop is
not reset. Cleanup cfg then comes along and changes it into one loop, but also
does not reset the estimate of the loop. Then loop unrolling uses the old estimate
and decides to add an unreachable there.
So the fix is to reset the estimates when phiopt changes an exit of a loop, similar to
how cleanupcfg does it when merging some basic blocks.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/117243
PR tree-optimization/116749

gcc/ChangeLog:

* tree-ssa-phiopt.cc (replace_phi_edge_with_variable): Reset loop
estimates if the cond_block was an exit to a loop.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr117243-1.c: New test.
* gcc.dg/torture/pr117243-2.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Fortran: Fix B64.0 formatted write output.

PR fortran/117820

libgfortran/ChangeLog:

* io/write.c (write_b): Add test for zero needed by write_boz.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr117820.f90: New test.

Daily bump.

Rectify some test cases.

PR testsuite/52641
PR testsuite/109123
PR testsuite/114661
PR testsuite/117828
PR testsuite/116481
PR testsuite/91069
gcc/testsuite/
* gcc.dg/Wuse-after-free-pr109123.c: Use size_t
instead of long unsigned int.
* gcc.dg/c23-tag-bitfields-1.c: Requires int32plus.
* gcc.dg/pr114661.c: Same.
* gcc.dg/pr117828.c: Same.
* gcc.dg/flex-array-counted-by-2.c: Use uintptr_t
instead of unsigned long.
* gcc.dg/pr116481.c: Same.
* gcc.dg/lto/tag-1_0.c: Use int32_t instead of int.
* gcc.dg/lto/tag-1_1.c: Use int16_t instead of short.
* gcc.dg/pr91069.c: Require double64.
* gcc.dg/type-convert-var.c: Require double64plus.

libstdc++: Fix parallel std::exclusive_scan [PR108236]

The standard says that std::exclusive_scan can be used to work in
place, i.e. where the output range is the same as the input range. This
means that the first sum cannot be written to the output until after
reading the first input value, otherwise we'll already have overwritten
the first input value.

While writing a new testcase I also realised that the serial version of
std::exclusive_scan uses copy construction for the accumulator variable,
but the standard only requires Cpp17MoveConstructible. We also require
move assignable, which is missing from the standard's requirements, but
we should at least use move construction not copy construction.
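
A minimal serial sketch of the ordering and move requirements described
above.  exclusive_scan_sketch and scan_in_place are hypothetical names
and this is not the libstdc++ or PSTL implementation; it only assumes
that T is move constructible, move assignable and supports operator+.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Exclusive prefix sum that is safe when result == first: each input
// element is read before the corresponding output element is written.
template <typename InIt, typename OutIt, typename T>
OutIt exclusive_scan_sketch(InIt first, InIt last, OutIt result, T init) {
  for (; first != last; ++first, ++result) {
    T next = init + *first;     // read the input first
    *result = std::move(init);  // then write the sum of everything before it
    init = std::move(next);     // move, don't copy, the accumulator
  }
  return result;
}

// Runs the scan with the output range equal to the input range.
std::vector<int> scan_in_place(std::vector<int> v, int init) {
  auto end = exclusive_scan_sketch(v.begin(), v.end(), v.begin(), init);
  assert(end == v.end());
  return v;
}
```

For the input {1, 2, 3, 4} with an initial value of 0 the in-place
result is {0, 1, 3, 6}.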

A similar problem exists for some other new C++17 numeric algos, but
I'll fix the others in a subsequent commit.

libstdc++-v3/ChangeLog:

PR libstdc++/108236
* include/pstl/glue_numeric_impl.h (exclusive_scan): Pass __init
as rvalue.
* include/pstl/numeric_impl.h (__brick_transform_scan): Do not
write through __result until after reading through __first. Move
__init into return value.
(__pattern_transform_scan): Pass __init as rvalue.
* include/std/numeric (exclusive_scan): Move construct instead
of copy constructing.
* testsuite/26_numerics/exclusive_scan/2.cc: New test.
* testsuite/26_numerics/pstl/numeric_ops/108236.cc: New test.

libstdc++: Simplify allocator propagation helpers using 'if constexpr'

Use diagnostic pragmas to allow using `if constexpr` in C++11 mode, so
that we don't need to use tag dispatching.
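
For illustration, a sketch of what such a helper looks like when written
with `if constexpr`.  It assumes C++17 rather than the pragma-guarded
C++11 form used in the library, and TaggedAlloc and alloc_on_copy_sketch
are hypothetical names, not the library's helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>

// Hypothetical minimal allocator; Propagate is std::true_type or
// std::false_type and id lets us observe whether it was assigned.
template <typename T, typename Propagate>
struct TaggedAlloc {
  using value_type = T;
  using propagate_on_container_copy_assignment = Propagate;
  int id = 0;
  T* allocate(std::size_t n) { return std::allocator<T>{}.allocate(n); }
  void deallocate(T* p, std::size_t n) { std::allocator<T>{}.deallocate(p, n); }
};

// Assign the allocator only if the trait asks for propagation.  The
// discarded branch is never instantiated, so no tag-dispatched overload
// pair is needed.
template <typename Alloc>
void alloc_on_copy_sketch(Alloc& one, const Alloc& two) {
  using traits = std::allocator_traits<Alloc>;
  if constexpr (traits::propagate_on_container_copy_assignment::value)
    one = two;
}

int id_after_copy_with_propagation() {
  TaggedAlloc<int, std::true_type> a{1}, b{2};
  alloc_on_copy_sketch(a, b);
  return a.id;
}

int id_after_copy_without_propagation() {
  TaggedAlloc<int, std::false_type> a{1}, b{2};
  alloc_on_copy_sketch(a, b);
  return a.id;
}
```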

These helpers could be removed entirely by just using `if constexpr`
directly in the container member functions, but that's a slightly larger
change that can happen later.

It also looks like we could remove the __alloc_on_copy(const Alloc&)
overload, which is unused.

libstdc++-v3/ChangeLog:

* include/bits/alloc_traits.h (__do_alloc_on_copy): Remove.
(__do_alloc_on_move, __do_alloc_on_swap): Remove.
(__alloc_on_copy, __alloc_on_move, __alloc_on_swap): Use if
constexpr.

libstdc++: Add fancy pointer support to std::forward_list [PR57272]

This takes a very similar approach to the changes for std::list.

libstdc++-v3/ChangeLog:

PR libstdc++/57272
* include/bits/forward_list.h (_GLIBCXX_USE_ALLOC_PTR_FOR_LIST):
Define.
(_Fwd_list_node_base::_M_base_ptr): New member functions.
(_Fwd_list_node::_M_node_ptr): New member function.
(_Fwd_list_iterator, _Fwd_list_const_iterator): Make internal
member functions and data member private. Declare forward_list
and _Fwd_list_base as friends.
(__fwdlist::_Node_base, __fwdlist::_Node, __fwdlist::_Iterator):
New class templates.
(__fwdlist::_Node_traits): New class template.
(_Fwd_list_base): Use _Node_traits to get types. Use _Base_ptr
instead of _Fwd_list_node_base*. Use _M_base_ptr() instead of
taking address of head node.
(forward_list): Likewise.
(_Fwd_list_base::_M_get_node): Do not define for versioned
namespace.
(_Fwd_list_base::_M_put_node): Only convert pointer if needed.
(_Fwd_list_base::_M_create_node): Use __allocate_guarded_obj.
(_Fwd_list_base::_M_destroy_node): New member function.
* include/bits/forward_list.tcc (_Fwd_list_base::_M_insert_after)
(forward_list::_M_splice_after, forward_list::insert_after): Use
const_iterator::_M_const_cast() instead of casting pointers.
(_Fwd_list_base::_M_erase_after): Use _M_destroy_node.
(forward_list::remove, forward_list::remove_if): Only do
downcasts when accessing the value.
(forward_list::sort): Likewise.
* testsuite/23_containers/forward_list/capacity/1.cc: Check
max_size for new node type.
* testsuite/23_containers/forward_list/capacity/node_sizes.cc:
New test.
* testsuite/23_containers/forward_list/requirements/explicit_instantiation/alloc_ptr.cc:
New test.
* testsuite/23_containers/forward_list/requirements/explicit_instantiation/alloc_ptr_ignored.cc:
New test.

libstdc++: Add fancy pointer support to std::list [PR57272]

Currently std::list uses raw pointers to connect its nodes, which is
non-conforming. We should use the allocator's pointer type everywhere
that a "pointer" is needed.

Because the existing types like _List_node<T> are part of the ABI now,
we can't change them. To support nodes that are connected by fancy
pointers we need a parallel hierarchy of node types. This change
introduces new class templates parameterized on the allocator's
void_pointer type, __list::_Node_base and __list::_Node_header, and new
class templates parameterized on the allocator's pointer type,
__list::_Node, __list::_Iterator. The iterator class template is used for
both iterator and const_iterator. Whether std::list<T, A> should use the
old _List_node<T> or new __list::_Node<A::pointer> type family internally
is controlled by a new __list::_Node_traits traits template.
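
A rough sketch of how node types can be parameterized on the
allocator's pointer types.  NodeBase, Node and NodeFor are hypothetical
names for illustration and are much simpler than the real __list
templates; the demo only exercises std::allocator, whose pointer type is
a raw pointer.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Link fields use a pointer type rebound from the allocator's
// void_pointer, so a fancy pointer would be used for the links as well.
template <typename VoidPtr>
struct NodeBase {
  using BasePtr =
      typename std::pointer_traits<VoidPtr>::template rebind<NodeBase>;
  BasePtr next{};
  BasePtr prev{};
};

// The node itself is parameterized on the allocator's pointer type.
template <typename ValPtr>
struct Node
    : NodeBase<typename std::pointer_traits<ValPtr>::template rebind<void>> {
  using value_type = typename std::pointer_traits<ValPtr>::element_type;
  value_type value;
};

template <typename Alloc>
using NodeFor = Node<typename std::allocator_traits<Alloc>::pointer>;

// For std::allocator the link type degenerates to a raw pointer.
static_assert(std::is_same<NodeBase<void*>::BasePtr, NodeBase<void*>*>::value,
              "raw pointers rebind to raw pointers");

// Links two nodes and reads the second value back through the link.
int value_via_next_link() {
  NodeFor<std::allocator<int>> a, b;
  a.value = 1;
  b.value = 2;
  a.next = &b;
  b.prev = &a;
  assert(b.prev == &a);
  return static_cast<NodeFor<std::allocator<int>>*>(a.next)->value;
}
```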

Because std::pointer_traits and std::__to_address are not defined for
C++98, there is no way to support fancy pointers in C++98. For C++98 the
_Node_traits traits always choose the old _List_node family.

In case anybody is currently using std::list with an allocator that has
a fancy pointer, this change would be an ABI break, because their
std::list instantiations would start to (correctly) use the fancy
pointer type. If the fancy pointer just contains a single pointer and so
has the same size, layout, and object representation as a raw pointer,
the code might still work (despite being an ODR violation). But if their
fancy pointer has a different representation, they would need to
recompile all their code using that allocator with std::list. Because
std::list will never use fancy pointers in C++98 mode, recompiling
everything to use fancy pointers isn't even possible if mixing C++98 and
C++11 code that uses std::list. To alleviate this problem, compiling
with -D_GLIBCXX_USE_ALLOC_PTR_FOR_LIST=0 will force std::list to have
the old, non-conforming behaviour and use raw pointers internally. For
testing purposes, compiling with -D_GLIBCXX_USE_ALLOC_PTR_FOR_LIST=9001
will force std::list to always use the new node types. This macro is
currently undocumented, which needs to be fixed.

The original _List_node<T> type is trivially constructible and trivially
destructible, but the new __list::_Node<Ptr> type might not be,
depending on the fancy pointer data members in _Node_base. This means
that std::list needs to explicitly construct and destroy the node
object, not just the value that it contains. This commit adds a new
__allocated_obj helper which wraps an __allocated_ptr and additionally
constructs and destroys an object in the allocated storage.
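
The idea of a guard that owns both the storage and the object constructed
in it can be sketched roughly as follows. This is an illustration only, not
the __allocated_obj implementation: AllocatedObj, Tracked and
live_after_scope are hypothetical names, and the sketch uses
std::addressof(*ptr), which is enough for raw pointers and for fancy
pointers that provide operator*.

```cpp
#include <cassert>
#include <memory>
#include <utility>

namespace sketch {  // hypothetical names, not the libstdc++ ones

// Owns one allocated slot and, once construct() has run, the object in it.
template <typename Alloc>
class AllocatedObj {
  using Traits = std::allocator_traits<Alloc>;

public:
  using pointer = typename Traits::pointer;

  explicit AllocatedObj(Alloc& a)
  : alloc_(a), ptr_(Traits::allocate(a, 1)), owns_(true), constructed_(false)
  { }

  AllocatedObj(const AllocatedObj&) = delete;
  AllocatedObj& operator=(const AllocatedObj&) = delete;

  // Construct the object in the allocated storage.
  template <typename... Args>
  void construct(Args&&... args) {
    Traits::construct(alloc_, std::addressof(*ptr_),
                      std::forward<Args>(args)...);
    constructed_ = true;
  }

  // Give up ownership; the caller must destroy and deallocate later.
  pointer release() { owns_ = false; return ptr_; }

  // Destroy the object (if any) and free the storage unless released.
  ~AllocatedObj() {
    if (!owns_)
      return;
    if (constructed_)
      Traits::destroy(alloc_, std::addressof(*ptr_));
    Traits::deallocate(alloc_, ptr_, 1);
  }

private:
  Alloc& alloc_;
  pointer ptr_;
  bool owns_;
  bool constructed_;
};

// Test type that counts live instances.
struct Tracked {
  static int live;
  int v;
  explicit Tracked(int x) : v(x) { ++live; }
  ~Tracked() { --live; }
};
int Tracked::live = 0;

// Returns the number of live objects after the guard has gone out of scope.
inline int live_after_scope() {
  std::allocator<Tracked> a;
  {
    AllocatedObj<std::allocator<Tracked>> guard(a);
    guard.construct(7);
    assert(Tracked::live == 1);
  }
  return Tracked::live;
}

} // namespace sketch
```

A container would typically call release() once the node has been linked
into the list, so that the guard only cleans up on the exceptional path.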

Pretty printers for std::list need to be updated to handle the new node
types. Potentially we just can't pretty print them, because we don't
know how to follow the fancy pointers to traverse the list.

libstdc++-v3/ChangeLog:

PR libstdc++/57272
PR libstdc++/110952
* include/bits/allocated_ptr.h (__allocated_ptr::get): Add
const.
(__allocated_ptr::operator bool, __allocated_ptr::release): New
member functions.
(__allocate_guarded): Add inline.
(__allocated_obj): New class template.
(__allocate_guarded_obj): New function template.
* include/bits/list.tcc (_List_base::_M_clear()): Replace uses
of raw pointers. Use _M_destroy_node.
(list::emplace, list::insert): Likewise.
(list::sort): Adjust check for 0 or 1 size. Use template
argument list for _Scratch_list.
* include/bits/stl_list.h (_GLIBCXX_USE_ALLOC_PTR_FOR_LIST):
Define.
(_List_node_base::_Base_ptr): New typedef.
(_List_node_base::_M_base): New member functions.
(_List_node_header::_M_base): Make public and add
using-declaration for base class overload.
(__list::_Node_traits, __list::_Node_base)
(__list::_Node_header, __list::_Node, __list::_Iterator): New
class templates.
(_Scratch_list): Turn class into class template. Use _Base_ptr
typedef instead of _List_node_base*.
(_List_node::_Node_ptr): New typedef.
(_List_node::_M_node_ptr): New member function.
(_List_base, _List_impl): Use _Node_traits to get node types.
(_List_base::_M_put_node): Convert to fancy pointer if needed.
(_List_base::_M_destroy_node): New member function.
(_List_base(_List_base&&, _Node_alloc_type&&)): Use if constexpr
to make function a no-op for fancy pointers.
(_List_base::_S_distance, _List_base::_M_distance)
(_List_base::_M_node_count): Likewise.
(list): Use _Node_traits to get iterator, node and pointer
types.
(list::_M_create_node): Use _Node_ptr typedef instead of _Node*.
Use __allocate_guarded_obj instead of _M_get_node.
(list::end, list::cend, list::empty): Use node header's
_M_base() function instead of taking its address.
(list::swap): Use _Node_traits to get node base type.
(list::_M_create_node, list::_M_insert): Use _Node_ptr instead
of _Node*.
(list::_M_erase): Likewise. Use _M_destroy_node.
(__distance): Overload for __list::_Iterator.
(_Node_base::swap, _Node_base::_M_transfer): Define non-inline
member functions of class templates.
(_Node_header::_M_reverse): Likewise.
* testsuite/23_containers/list/capacity/29134.cc: Check max_size
for allocator of new node type.
* testsuite/23_containers/list/capacity/node_sizes.cc: New test.
* testsuite/23_containers/list/requirements/explicit_instantiation/alloc_ptr.cc:
New test.
* testsuite/23_containers/list/requirements/explicit_instantiation/alloc_ptr_ignored.cc:
New test.

libstdc++: Stop using _Self typedefs in std::list iterators

We can just use the injected-class-name instead of defining a new name.
That seems simpler.
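
For illustration, inside a class template the template's own name already
refers to the current specialization, so a separate typedef adds nothing.
The sketch below uses a hypothetical Iter type, not _List_iterator:

```cpp
#include <type_traits>
#include <utility>

namespace sketch {  // hypothetical names, not the libstdc++ ones

template <typename T>
struct Iter {
  // Previously one might have written: typedef Iter<T> _Self;
  // The injected-class-name "Iter" means Iter<T> here, so it can be
  // used directly in the member declarations.
  Iter& operator++() { ++steps; return *this; }
  Iter operator++(int) { Iter tmp = *this; ++*this; return tmp; }

  int steps = 0;  // counts increments, only to make the sketch observable
};

} // namespace sketch
```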

libstdc++-v3/ChangeLog:

* include/bits/stl_list.h (_List_iterator): Remove _Self typedef
and just use injected-class-name instead.
(_List_const_iterator): Likewise.

libstdc++: Refactor std::list::size() for cxx11 ABI

Remove some preprocessor conditionals by moving the _M_size member for
the cxx11 ABI into a new base class, which is empty for the gcc4-compat
ABI.

Move some unused members that are only retained for ABI compatibility to
the end of _List_base and add an explanatory comment. Stop using
list::_M_node_count and list::_S_distance and then move them to the end
of std::list with a comment too.
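
The base class approach can be sketched as below. This is an illustration
under assumptions: the names are hypothetical, and a bool template
parameter stands in for the ABI choice, which in the real header is made by
the preprocessor rather than by a template argument.

```cpp
#include <cstddef>

namespace sketch {  // hypothetical names, not the libstdc++ ones

// Size storage: present in one configuration, an empty class in the other.
template <bool StoreSize> struct ListSize;
template <> struct ListSize<true> { std::size_t size = 0; };
template <> struct ListSize<false> { };

struct NodeBase { NodeBase* next; NodeBase* prev; };

// The header inherits the (possibly empty) size base instead of guarding
// a data member with preprocessor conditionals.
template <bool StoreSize>
struct NodeHeader : NodeBase, ListSize<StoreSize> {
  NodeHeader() { next = prev = this; }
};

// Bookkeeping hooks: only the storing variant has anything to update.
inline void note_insert(ListSize<true>& s) { ++s.size; }
inline void note_insert(ListSize<false>&) { }

// Link node n in front of the header, i.e. at the end of the list.
template <bool S>
void push_back(NodeHeader<S>& h, NodeBase& n) {
  n.next = &h;
  n.prev = h.prev;
  h.prev->next = &n;
  h.prev = &n;
  note_insert(h);
}

// O(1) when the size is stored, O(n) walk otherwise.
inline std::size_t node_count(const NodeHeader<true>& h) { return h.size; }
inline std::size_t node_count(const NodeHeader<false>& h) {
  std::size_t n = 0;
  for (const NodeBase* p = h.next; p != &h; p = p->next)
    ++n;
  return n;
}

} // namespace sketch
```

Because the empty base normally occupies no storage, the header in the
non-storing configuration is expected to stay the same size as a bare node
base, which is what keeps the two layouts distinct without conditionals
around the member itself.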

libstdc++-v3/ChangeLog:

* include/bits/stl_list.h (_List_size): New struct.
(_List_node_header): Replace _M_size member with _List_size base
class.
(_List_node_header(_List_node_header&&)): Replace explicit uses
of _M_size with initializing the base.
(_List_node_header::_M_init): Likewise.
(_List_base::_S_distance, _List_base::_M_distance)
(_List_base::_M_node_count): Move to end of class body and add
comment.
(list::_S_distance, list::_M_node_count): Likewise.
(list::size): Inline _M_node_count effects to here.
(list::splice(iterator, list&, iterator, iterator)): Use #if and
call std::distance instead of _S_distance.

AVR: Skip some test cases that don't work for it.

gcc/testsuite/
* gcc.c-torture/execute/ieee/cdivchkd.x: New file.
* gcc.c-torture/execute/ieee/cdivchkf.x: New file.
* gcc.dg/flex-array-counted-by.c: Require wchar.
* gcc.dg/fold-copysign-1.c [avr]: Add -mdouble=64.

AVR: Improve location of late diagnostics.

Some diagnostics are issued late, e.g. in avr_print_operand().
This patch uses the insn's location as a proxy for the operand
location. Without the patch, the location is usually input_location,
which points to the closing } of the function body.

gcc/
* config/avr/avr.cc (avr_insn_location): New variable.
(avr_final_prescan_insn): Set avr_insn_location.
(avr_asm_final_postscan_insn): Unset avr_insn_location after last insn.
(avr_print_operand): Pass avr_insn_location to warning_at.

gcc/testsuite/
* gcc.dg/Warray-bounds-33.c: Adjust for avr diagnostics.
* gcc.dg/pr56228.c: Same.
* gcc.dg/pr86124.c: Same.
* gcc.dg/pr94291.c: Same.
* gcc.dg/tree-ssa/pr82059.c: Same.

Move some CRC tests into the gcc.dg/torture directory

Jakub noted that these tests were using dg-skip-if directives that implied the
tests were expected to run under multiple optimization options, which means
they probably should be in gcc.dg/torture rather than in the gcc.dg directory.

This moves the relevant tests from gcc.dg to gcc.dg/torture.

gcc/testsuite
* gcc.dg/crc-linux-1.c: Moved to gcc.dg/torture.
* gcc.dg/crc-linux-2.c: Likewise.
* gcc.dg/crc-linux-4.c: Likewise.
* gcc.dg/crc-linux-5.c: Likewise.
* gcc.dg/crc-not-crc-15.c: Likewise.
* gcc.dg/crc-side-instr-1.c: Likewise.
* gcc.dg/crc-side-instr-2.c: Likewise.
* gcc.dg/crc-side-instr-3.c: Likewise.
* gcc.dg/crc-side-instr-4.c: Likewise.
* gcc.dg/crc-side-instr-5.c: Likewise.
* gcc.dg/crc-side-instr-6.c: Likewise.
* gcc.dg/crc-side-instr-7.c: Likewise.
* gcc.dg/crc-side-instr-8.c: Likewise.
* gcc.dg/crc-side-instr-9.c: Likewise.
* gcc.dg/crc-side-instr-10.c: Likewise.
* gcc.dg/crc-side-instr-11.c: Likewise.
* gcc.dg/crc-side-instr-12.c: Likewise.
* gcc.dg/crc-side-instr-13.c: Likewise.
* gcc.dg/crc-side-instr-14.c: Likewise.
* gcc.dg/crc-side-instr-15.c: Likewise.
* gcc.dg/crc-side-instr-16.c: Likewise.
* gcc.dg/crc-side-instr-17.c: Likewise.

c++/contracts: ICE with contract assert on non-empty statement [PR 117579]

Contract assert is an attribute on an empty statement. Currently we assert
that the statement is empty before emitting the assertion. This has been
changed to a conditional check that the statement is empty before the
assertion is emitted.

PR c++/117579

gcc/cp/ChangeLog:

* parser.cc (cp_parser_statement): Replace assertion with a
conditional check that the statement containing a contract assert
is empty.

gcc/testsuite/ChangeLog:

* g++.dg/contracts/pr117579.C: New test.

Signed-off-by: Nina Ranns <dinka.ranns@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>

maintainer-scripts: build the libgdiagnostics docs for the website [PR117883]

maintainer-scripts/ChangeLog:
PR web/117883
* update_web_docs_git: Introduce SPHINX_VENV to make
it easier to test the script.  Add the libgdiagnostics docs
and testsuite to the files to be preserved.  Use sphinx to build
the libgdiagnostics docs as HTML.  Copy them into $DOCSDIR.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

maintainer-scripts: fix jit docs on website

I noticed whilst working on the libgdiagnostics docs
that some errors like this were occurring in the jit docs:

/tmp/gcc-doc-update.3782849/gcc/gcc/jit/docs/cp/topics/asm.rst:63: WARNING: Include file '/tmp/gcc-doc-update.3782849/gcc/gcc/testsuite/jit.dg/test-asm.cc' not found or reading it failed

which was occurring for:
* test-asm.c and .cc
* test-switch.c
* test-accessing-union.c

and indeed https://gcc.gnu.org/onlinedocs/jit/topics/asm.html is
currently missing various code examples.

Fixed thusly; tested locally.

maintainer-scripts/ChangeLog:
* update_web_docs_git: Add the jit testsuite to the files to
be preserved, since this is used by the jit docs.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

Update gcc zh_CN.po

* zh_CN.po: Update.

RISC-V: Fix test target selector

The previous target selector was not properly gating the tests to rv32
and rv64 targets. This was triggering an excess failure on rv32 targets
where it would try to run the zbc64 tests. Fix the selector.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/crc-builtin-zbc32.c: Fix selector.
* gcc.target/riscv/crc-builtin-zbc64.c: Ditto.

Signed-off-by: Edwin Lu <ewlu@rivosinc.com>

libgdiagnostics: fix docs metadata

gcc/ChangeLog:
* doc/libgdiagnostics/conf.py: Remove "author". Change
"copyright" field to the FSF.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>