git.ipfire.org Git - thirdparty/gcc.git/log

PR modula2/118453: Subranges types do not use virtual tokens during construction

P2SymBuild.mod.BuildSubrange does not use a virtual token and therefore
any error message containing a subrange type produces poor location carots.
This patch rewrites BuildSubrange and the buildError4 procedure in
M2Check.mod (which is only called when there is a formal/actual parameter
mismatch). buildError4 now issues a sub error for the formal and actual
type declaration highlighing the type mismatch.

gcc/m2/ChangeLog:

PR modula2/118453
* gm2-compiler/M2Check.mod (buildError4): Call MetaError1
for the actual and formal parameter type.
* gm2-compiler/P2Build.bnf (SubrangeType): Construct a virtual
token containing the subrange type declaration.
(PrefixedSubrangeType): Ditto.
* gm2-compiler/P2SymBuild.def (BuildSubrange): Add tok parameter.
* gm2-compiler/P2SymBuild.mod (BuildSubrange): Use tok parameter,
rather than the token at the start of the subrange.

gcc/testsuite/ChangeLog:

PR modula2/118453
* gm2/pim/fail/badbecomes2.mod: New test.
* gm2/pim/fail/badparamset1.mod: New test.
* gm2/pim/fail/badparamset2.mod: New test.
* gm2/pim/fail/badsyntaxset1.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

[PR rtl-optimization/107455] Eliminate unnecessary constant load

This resurrects a patch from a bit over 2 years ago that I never wrapped up.
IIRC, I ended up up catching covid, then in the hospital for an unrelated issue
and it just got dropped on the floor in the insanity.

The basic idea here is to help postreload-cse eliminate more const/copies by
recording a small set of conditional equivalences (as Richi said in 2022,
"Ick").

It was originally to help eliminate an unnecessary constant load I saw in
coremark, but as seen in BZ107455 the same issues show up in real code as well.

Bootstrapped and regression tested on x86-64, also been through multiple spins
in my tester.

Changes since v2:

  - Simplified logic for blocks to examine
  - Remove redundant tests when filtering blocks to examine
  - Remove bogus check which only allowed reg->reg copies

Changes since v1:

Richard B and Richard S both had good comments last time around and their
requests are reflected in this update:

  - Use rtx_equal_p rather than pointer equality
  - Restrict to register "destinations"
  - Restrict to integer modes
  - Adjust entry block handling

My own wider scale testing resulted in a few more changes.

  - Robustify extracting the (set (pc) ... ), which then required ...
  - Handle if src/dst are clobbered by the conditional branch
  - Fix logic error causing too many equivalences to be recorded

PR rtl-optimization/107455
gcc/
* postreload.cc (reload_cse_regs_1): Take advantage of conditional
equivalences.

gcc/testsuite
* gcc.target/riscv/pr107455-1.c: New test.
* gcc.target/riscv/pr107455-2.c: New test.

[ifcombine] propagate signbit mask to XOR right-hand operand

If a single-bit bitfield takes up the sign bit of a storage unit,
comparing the corresponding bitfield between two objects loads the
storage units, XORs them, converts the result to signed char, and
compares it with zero: ((signed char)(a.<byte> ^ c.<byte>) >= 0).

fold_truth_andor_for_ifcombine recognizes the compare with zero as a
sign bit test, then it decomposes the XOR into an equality test.

The problem is that, after this decomposition, that figures out the
width of the accessed fields, we apply the sign bit mask to the
left-hand operand of the compare, but we failed to also apply it to
the right-hand operand when both were taken from the same XOR.

This patch fixes that.

for gcc/ChangeLog

PR tree-optimization/118409
* gimple-fold.cc (fold_truth_andor_for_ifcombine): Apply the
signbit mask to the right-hand XOR operand too.

for gcc/testsuite/ChangeLog

PR tree-optimization/118409
* gcc.dg/field-merge-20.c: New.

expr: Fix up the divmod cost debugging note [PR115910]

Something I've noticed during working on the crc wrong-code fix.
My first version of the patch failed because of no longer matching some
expected strings in the assembly, so I had to add TDF_DETAILS debugging
into the -fdump-rtl-expand-details dump which the crc tests can use.

For PR115910 Andrew has added similar note for the division/modulo case
if it is positive and we can choose either unsigned or signed
division.  The problem is that unlike most other TDF_DETAILS diagnostics,
this is not done before emitting the IL for the function, but during it.

Other messages there are prefixed with ;;, both details on what it is doing
and the GIMPLE IL for which it expands RTL, so the
;; Generating RTL for gimple basic block 4

;;

(code_label 13 12 14 2 (nil) [0 uses])

(note 14 13 0 NOTE_INSN_BASIC_BLOCK)
positive division: unsigned cost: 30; signed cost: 28

;; return _4;

message in between just looks weird and IMHO should be ;; prefixed.

2025-01-13  Jakub Jelinek  <jakub@redhat.com>

PR target/115910
* expr.cc (expand_expr_divmod): Prefix the TDF_DETAILS note with
";; " and add a space before (needed tie breaker).  Formatting fixes.

MAINTAINERS: Make contrib/check-MAINTAINERS.py happy

This commit makes the contrib/check-MAINTAINERS.py script happy about
our MAINTAINERS file. I hope that it knows best how things ought to
be and so am committing this as obvious.

ChangeLog:

2025-01-13 Martin Jambor <mjambor@suse.cz>

* MAINTAINERS: Fix the name order of the Write After Approval section.

ada: Update gnatdll documentation (-b option removed)

gcc/ada/ChangeLog:
* doc/gnat_ugn/platform_specific_information.rst: Update.
* gnat_ugn.texi: Regenerate.

ada: Cleanup preanalysis of static expressions (part 5)

Partially revert the fix for sem_ch13.adb as it does not comply
with RM 13.14(7.2/5).

gcc/ada/ChangeLog:

* sem_ch13.adb (Check_Aspect_At_End_Of_Declarations): Restore calls
to Preanalyze_Spec_Expression that were replaced by calls to
Preanalyze_And_Resolve. Add documentation.
(Check_Aspect_At_Freeze_Point): Ditto.

ada: Fix relocatable DLL creation with gnatdll

gcc/ada/ChangeLog:

* mdll.adb: For the created DLL to be relocatable we do not want to use
the base file name when calling gnatdll.
* gnatdll.adb: Removes option -d which is not working anymore. And
when using a truly relocatable DLL the base-address has no real
meaning. Also reword the usage string for -d as we do not want to
specify relocatable as gnatdll can be used to create both
relocatable and non relocatable DLL.

ada: Remove redundant parentheses inside unary operators (cont.)

GNAT already emits a style warning when redundant parentheses appear inside
logical and short-circuit operators. A similar warning will be soon emitted for
unary operators as well. This patch removes the redundant parentheses to avoid
build errors.

gcc/ada/ChangeLog:

* libgnat/a-strunb.ads: Remove redundant parentheses inside NOT
operators.

ada: Cleanup preanalysis of static expressions (part 4)

Fix regression in the SPARK 2014 testsuite.

gcc/ada/ChangeLog:

* sem_util.adb (Build_Actual_Subtype_Of_Component): No action
under preanalysis.
* sem_ch5.adb (Set_Assignment_Type): If the right-hand side contains
target names, expansion has been disabled to prevent expansion that
might move target names out of the context of the assignment statement.
Restore temporarily the current compilation mode so that the actual
subtype can be built.

ada: Warn about redundant parentheses inside unary operators

GNAT already emits a style warning when redundant parentheses appear inside
logical and short-circuit operators. A similar warning is now emitted for
unary operators as well.

gcc/ada/ChangeLog:

* par-ch4.adb (P_Factor): Warn when the operand of a unary operator
doesn't require parentheses.

ada: Remove redundant parentheses inside unary operators in comments

GNAT already emits a style warning when redundant parentheses appear inside
logical and short-circuit operators. A similar warning will be soon emitted for
unary operators as well. This patch removes the redundant parentheses to avoid
future build errors.

gcc/ada/ChangeLog:

* libgnat/s-genbig.adb: Remove redundant parentheses in comments.

ada: Remove redundant parentheses inside unary operators

GNAT already emits a style warning when redundant parentheses appear inside
logical and short-circuit operators. A similar warning will be soon emitted for
unary operators as well. This patch removes the redundant parentheses to avoid
future build errors.

gcc/ada/ChangeLog:

* checks.adb, exp_dist.adb, exp_imgv.adb, exp_util.adb,
libgnarl/a-reatim.adb, libgnat/a-coinve.adb, libgnat/a-nbnbre.adb,
libgnat/a-ngcoty.adb, libgnat/a-ngelfu.adb, libgnat/a-ngrear.adb,
libgnat/a-strbou.ads, libgnat/a-strfix.ads, libgnat/a-strsea.adb,
libgnat/a-strsea.ads, libgnat/a-strsup.ads,
libgnat/a-strunb__shared.ads, libgnat/g-alleve.adb,
libgnat/g-spitbo.adb, libgnat/s-aridou.adb, libgnat/s-arit32.adb,
libgnat/s-dourea.ads, libgnat/s-genbig.adb, libgnat/s-imager.adb,
libgnat/s-statxd.adb, libgnat/s-widthi.adb, sem_attr.adb, sem_ch10.adb,
sem_ch3.adb, sem_ch6.adb, sem_ch7.adb, sem_dim.adb, sem_prag.adb,
sem_res.adb, uintp.adb: Remove redundant parentheses inside NOT and ABS
operators.

ada: Fix spurious warning about redundant parentheses in range bound

Use the same logic for warning about redundant parentheses in lower and upper
bounds of a discrete range. This fixes a spurious warning that, if followed,
would render the code illegal.

gcc/ada/ChangeLog:

* par-ch3.adb (P_Discrete_Range): Detect redundant parentheses in the
lower bound like in the upper bound.

ada: Unbounded recursion on character aggregates with predicated component subtype

The compiler was recursing endlessly when analyzing an aggregate of
an array type whose component subtype has a static predicate and the
component expressions are static, repeatedly transforming the aggregate
first into a string literal and then back into an aggregate. This is fixed
by suppressing the transformation to a string literal in the case where
the component subtype has predicates.

gcc/ada/ChangeLog:

* sem_aggr.adb (Resolve_Aggregate): Add another condition to prevent rewriting
an aggregate whose type is an array of characters, testing for the presence of
predicates on the component type.

ada: Simplify expansion of negative membership operator

Code cleanup; semantics is unaffected.

gcc/ada/ChangeLog:

* exp_ch4.adb: (Expand_N_Not_In): Preserve Alternatives in expanded
membership operator just like preserving Right_Opnd (though only
one of these fields is present at a time).
* par-ch4.adb (P_Membership_Test): Remove redundant setting of fields
to their default values.

ada: Warn about redundant parentheses in upper range bounds

Fix a glitch in condition that effectively caused detection of redundant
parentheses in upper range bounds to be dead code.

gcc/ada/ChangeLog:

* par-ch3.adb (P_Discrete_Range): Replace N_Subexpr, which was catching
all subexpressions, with kinds that catch nodes that require
parentheses to become "simple expressions".

ada: Add more commentary to System.Val_Real.Large_Powfive

gcc/ada/ChangeLog:

* libgnat/s-valrea.adb (Large_Powfive) [2 parameters]: Add a couple
of additional comments.

ada: Fix parsing of raise expressions with no parens

According to Ada grammar, raise expression is an expression, but requires
parens to be a simple_expression. We wrongly classified raise expressions
as expressions, because we mishandled a global state variable in the parser.

This patch causes some illegal code to be rejected.

gcc/ada/ChangeLog:

* par-ch4.adb (P_Relation): Prevent Expr_Form to be overwritten when
parsing the raise expression itself.
(P_Simple_Expression): Fix manipulation of Expr_Form.

tree-optimization/117119 - ICE with int128 IV in dataref analysis

Here's another fix for a missing check that an IV value fits in a
HIW. It's originally from Stefan.

PR tree-optimization/117119
* tree-data-ref.cc (initialize_matrix_A): Check whether
an INTEGER_CST fits in HWI, otherwise return chrec_dont_know.

* gcc.dg/torture/pr117119.c: New testcase.

Co-Authored-By: Stefan Schulze Frielinghaus <stefansf@linux.ibm.com>

Un-XFAIL 'dg-note's in 'gfortran.dg/goacc/routine-external-level-of-parallelism-2.f'

As of the recent commit 65286465b94cba6ee3d59edbc771bef0088ac46e
"Fortran: Fix location_t in gfc_get_extern_function_decl; [...]" change:

    The declaration created by gfc_get_extern_function_decl used input_location
    as DECL_SOURCE_LOCATION, which gave rather odd results with 'declared here'
    diagnostic. - It is much more useful to use the gfc_symbol's declated_at,
    which this commit now does.

..., we're no longer using the 'dg-bogus' location informations, as pointed out
for one class of additional notes of
'gfortran.dg/goacc/routine-external-level-of-parallelism-2.f', once added in
commit 03eb779141a29f96600cd46904b88a33c4b49a66 "Add 'dg-note', 'dg-lto-note'".
Therefore, un-XFAILed 'dg-note's rather than XFAILed 'dg-bogus'es.

gcc/testsuite/
* gfortran.dg/goacc/routine-external-level-of-parallelism-2.f:
Un-XFAIL 'dg-note's.

Bump BASE-VER to 14.0.1 now that we are in stage4.

* BASE-VER: Bump to 14.0.1.

lto: Pass cache checksum by reference [PR118181]

Bootstrapped/regtested on x86_64-linux. Committed as obvious.

PR lto/118181

gcc/ChangeLog:

* lto-ltrans-cache.cc (ltrans_file_cache::create_item):
Pass checksum by reference.
* lto-ltrans-cache.h: Likewise.

lto: Fix empty fnctl.h build error with MinGW.

MSYS2+MinGW contains headers without defining expected contents.
This fix checks that the fcntl function is actually defined.

Bootstrapped/regtested on x86_64-linux. Committed as obvious.

gcc/ChangeLog:

* lockfile.cc (LOCKFILE_USE_FCNTL): New.
(lockfile::lock_write): Use LOCKFILE_USE_FCNTL.
(lockfile::try_lock_write): Use LOCKFILE_USE_FCNTL.
(lockfile::lock_read): Use LOCKFILE_USE_FCNTL.
(lockfile::unlock): Use LOCKFILE_USE_FCNTL.
(lockfile::lockfile_supported): Use LOCKFILE_USE_FCNTL.

Refactor ix86_expand_vecop_qihi2.

Since there's regression to use vpermq, and it's manually disabled by
!TARGET_AVX512BW. I remove the codes related to vpermq and make
ix86_expand_vecop_qihi2 only handle vpmovbw + op + vpmovwb case.

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
Refactor to avoid redundant TARGET_AVX512BW in many places.

[PATCH] crc: Fix up some crc related wrong code issues [PR117997, PR118415]

Hi!

As mentioned in the second PR, using table names like
crc_table_for_crc_8_polynomial_0x12
in the user namespace is wrong, user could have defined such variables
in their code and as can be seen on the last testcase, then it just
misbehaves.
At minimum such names should start with 2 underscores, moving it into
implementation namespace, and if possible have some dot or dollar in the
name if target supports it.
I think assemble_crc_table right now always emits tables a local variables,
I really don't see what would be setting TREE_PUBLIC flag on
IDENTIFIER_NODEs.
It might be nice to share the tables between TUs in the same binary or
shared library, but it in that case should have hidden visibility if
possible, so that it isn't exported from the libraries or binaries, we don't
want the optimization to affect set of exported symbols from libraries.
And, as can be seen in the first PR, building gen_rtx_SYMBOL_REF by hand
is certainly unexpected on some targets, e.g. those which use
-fsection-anchors, so we should instead use DECL_RTL of the VAR_DECL.
For that we'd need to look it up if we haven't emitted it already, while
IDENTIFIER_NODEs can be looked up easily, I guess for the VAR_DECLs we'd
need custom hash table.

Now, all of the above (except sharing between multiple TUs) is already
implemented in output_constant_def, so I think it is much better to just
use that function.

And, if we want to share it between multiple TUs, we could extend the
SHF_MERGE usage in gcc, currently we only use it for constant pool
entries with same size as alignment, from 1 to 32 bytes, using .rodata.cstN
sections.  We could just use say .rodata.cstM.N sections where M would be
alignment and N would be the entity size.  We could use that for all
constant pool entries say up to 2048 bytes.
Though, as the current code doesn't share between multiple TUs, I think it
can be done incrementally (either still for GCC 15, or GCC 16+).

Bootstrapped/regtested on {x86_64,i686,aarch64,powerpc64le,s390x}-linux, on
aarch64 it also fixes
-FAIL: crypto/rsa
-FAIL: hash
ok for trunk?

gcc/
PR tree-optimization/117997
PR middle-end/118415
* expr.cc (assemble_crc_table): Make static, remove id argument,
use output_constant_def.  Emit note if -fdump-rtl-expand-details
about which table has been emitted.
(generate_crc_table): Make static, adjust assemble_crc_table
caller, call it always.
(calculate_table_based_CRC): Make static.
* internal-fn.cc (expand_crc_optab_fn): Emit note if
-fdump-rtl-expand-details about using optab for crc.  Formatting fix.

gcc/testsuite/
* gcc.dg/crc-builtin-target32.c: Add -fdump-rtl-expand-details
as dg-additional-options.  Scan expand dump rather than assembly,
adjust the regexps.
* gcc.dg/crc-builtin-target64.c: Likewise.
* gcc.dg/crc-builtin-rev-target32.c: Likewise.
* gcc.dg/crc-builtin-rev-target64.c: Likewise.
* gcc.dg/pr117997.c: New test.
* gcc.dg/pr118415.c: New test.

Daily bump.

d: Merge dmd, druntime c7902293d7, phobos 03aeafd20

D front-end changes:

- Import dmd v2.110.0-rc.1.
- An error is now given for subtracting pointers of different
types.

D runtime changes:

- Import druntime v2.110.0-rc.1.

Phobos changes:

- Import phobos v2.110.0-rc.1.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd c7902293d7.
* dmd/VERSION: Bump version to v2.110.0-rc.1.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime c7902293d7.
* libdruntime/Makefile.am (DRUNTIME_DSOURCES): Rename
core/thread/fiber.d to core/thread/fiber/package.d. Add
core/thread/fiber/base.d.
* libdruntime/Makefile.in: Regenerate.
* src/MERGE: Merge upstream phobos 63fdb282f.

gcc/testsuite/ChangeLog:

* gdc.dg/asm3.d: Adjust test.
* gdc.dg/torture/pr96435.d: Adjust test.

Dump all symbol attributes in show_attr.

gcc/fortran/ChangeLog:

* dump-parse-tree.cc (show_attr): Dump all symbol attributes.

d: Merge upstream dmd, druntime c57da0cf59, phobos ad8ee5587

D front-end changes:

- Import latest fixes from dmd v2.110.0-beta.1.
- The `align' attribute now allows to specify `default'
  explicitly.
- Add primary expression of the form `__rvalue(expression)'
  which causes `expression' to be treated as an rvalue, even if
  it is an lvalue.
- Shortened method syntax can now be used in constructors.

D runtime changes:

- Import latest fixes from druntime v2.110.0-beta.1.

Phobos changes:

- Import latest fixes from phobos v2.110.0-beta.1.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd c57da0cf59.
* d-codegen.cc (can_elide_copy_p): New.
(d_build_call): Use it.
* d-lang.cc (d_post_options): Update for new front-end interface.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime c57da0cf59.
* src/MERGE: Merge upstream phobos ad8ee5587.
* testsuite/libphobos.init_fini/custom_gc.d: Adjust test.

gcc/testsuite/ChangeLog:

* gdc.dg/copy1.d: New test.

c: UX improvements to 'too {few,many} arguments' errors (v5) [PR118112]

Consider this case of a bad call to a callback function (perhaps
due to C23 changing the meaning of () in function decls):

struct p {
        int (*bar)();
};

void baz() {
    struct p q;
    q.bar(1);
}

Before this patch the C frontend emits:

t.c: In function 'baz':
t.c:7:5: error: too many arguments to function 'q.bar'
    7 |     q.bar(1);
      |     ^

which doesn't give the user much help in terms of knowing what
was expected, and where the relevant declaration is.

With this patch the C frontend emits:

t.c: In function 'baz':
t.c:7:5: error: too many arguments to function 'q.bar'; expected 0, have 1
    7 |     q.bar(1);
      |     ^     ~
t.c:2:15: note: declared here
    2 |         int (*bar)();
      |               ^~~

(showing the expected vs actual counts, the pertinent field decl, and
underlining the first extraneous argument at the callsite)

Similarly, the patch also updates the "too few arguments" case to also
show expected vs actual counts.  Doing so requires a tweak to the
wording to say "at least" for the case of variadic fns where
previously the C FE emitted e.g.:

s.c: In function 'test':
s.c:5:3: error: too few arguments to function 'callee'
    5 |   callee ();
      |   ^~~~~~
s.c:1:6: note: declared here
    1 | void callee (const char *, ...);
      |      ^~~~~~

with this patch it emits:

s.c: In function 'test':
s.c:5:3: error: too few arguments to function 'callee'; expected at least 1, have 0
    5 |   callee ();
      |   ^~~~~~
s.c:1:6: note: declared here
    1 | void callee (const char *, ...);
      |      ^~~~~~

gcc/c/ChangeLog:
PR c/118112
* c-typeck.cc (inform_declaration): Add "function_expr" param and
use it for cases where we couldn't show the function decl to show
field decls for callbacks.
(build_function_call_vec): Add missing auto_diagnostic_group.
Update for new param of inform_declaration.
(convert_arguments): Likewise.  For the "too many arguments" case
add the expected vs actual counts to the message, and if we have
it, add the location_t of the first surplus param as a secondary
location within the diagnostic.  For the "too few arguments" case,
determine the minimum number of arguments required and add the
expected vs actual counts to the message, tweaking it to "at least"
for variadic functions.

gcc/testsuite/ChangeLog:
PR c/118112
* gcc.dg/too-few-arguments.c: New test.
* gcc.dg/too-many-arguments.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

Fortran: implement F2018 intrinsic OUT_OF_RANGE [PR115788]

Implementation of the Fortran 2018 standard intrinsic OUT_OF_RANGE, with
the GNU Fortran extension to unsigned integers.

Runtime code is fully inline expanded.

PR fortran/115788

gcc/fortran/ChangeLog:

* check.cc (gfc_check_out_of_range): Check arguments to intrinsic.
* expr.cc (free_expr0): Fix a memleak with unsigned literals.
* gfortran.h (enum gfc_isym_id): Define GFC_ISYM_OUT_OF_RANGE.
* gfortran.texi: Add OUT_OF_RANGE to list of intrinsics supporting
UNSIGNED.
* intrinsic.cc (add_functions): Add Fortran prototype. Break some
nearby lines with excessive length.
* intrinsic.h (gfc_check_out_of_range): Add prototypes.
* intrinsic.texi: Fortran documentation of OUT_OF_RANGE.
* simplify.cc (gfc_simplify_out_of_range): Compile-time simplification
of OUT_OF_RANGE.
* trans-intrinsic.cc (gfc_conv_intrinsic_out_of_range): Generate
inline expansion of runtime code for OUT_OF_RANGE.
(gfc_conv_intrinsic_function): Use it.

gcc/testsuite/ChangeLog:

* gfortran.dg/ieee/out_of_range.f90: New test.
* gfortran.dg/out_of_range_1.f90: New test.
* gfortran.dg/out_of_range_2.f90: New test.
* gfortran.dg/out_of_range_3.f90: New test.

Alpha: Fix a block move pessimisation with zero-extension after LDWU

For the BWX case we have a pessimisation in `alpha_expand_block_move'
for HImode loads where we place the data loaded into a HImode register
as well, therefore losing information that indeed the data loaded has
already been zero-extended to the full DImode width of the register.
Later on when we store this data in QImode quantities into an unaligned
destination, we zero-extend it again for the purpose of right-shifting,
such as with the test case included producing code at `-O2' as follows:

ldah $2,unaligned_src_hi($29) !gprelhigh
lda $1,unaligned_src_hi($2) !gprellow
ldwu $6,unaligned_src_hi($2) !gprellow
ldwu $5,2($1)
ldwu $4,4($1)
bis $31,$31,$31
zapnot $6,3,$3 # Redundant!
ldbu $7,6($1)
zapnot $5,3,$2 # Redundant!
stb $6,0($16)
zapnot $4,3,$1 # Redundant!
stb $5,2($16)
srl $3,8,$3
stb $4,4($16)
srl $2,8,$2
stb $3,1($16)
srl $1,8,$1
stb $2,3($16)
stb $1,5($16)
stb $7,6($16)

The non-BWX case is unaffected, because there we use byte insertion, so
we don't care that data is held in a HImode register.

Address this by making the holding RTX a HImode subreg of the original
DImode register, which the RTL passes can then see through and eliminate
the zero-extension where otherwise required, resulting in this shortened
code:

ldah $2,unaligned_src_hi($29) !gprelhigh
lda $1,unaligned_src_hi($2) !gprellow
ldwu $4,unaligned_src_hi($2) !gprellow
ldwu $3,2($1)
ldwu $2,4($1)
bis $31,$31,$31
srl $4,8,$6
ldbu $1,6($1)
srl $3,8,$5
stb $4,0($16)
stb $6,1($16)
srl $2,8,$4
stb $3,2($16)
stb $5,3($16)
stb $2,4($16)
stb $4,5($16)
stb $1,6($16)

While at it reformat the enclosing do-while statement according to the
GNU Coding Standards, observing that in this case it does not obfuscate
the change owing to the odd original indentation.

gcc/
* config/alpha/alpha.cc (alpha_expand_block_move): Use a HImode
subreg of a DImode register to hold data from an aligned HImode
load.

Alpha: Optimize block moves coming from longword-aligned source

Now that we have proper alignment determination for block moves in place
the case of copying a block of longword-aligned data has become real, so
implement the merging of loaded data from pairs of SImode registers into
single DImode registers for the purpose of using with unaligned stores
efficiently, as suggested by a comment in `alpha_expand_block_move' and
discard the comment. Provide test cases accordingly.

gcc/
* config/alpha/alpha.cc (alpha_expand_block_move): Merge loaded
data from pairs of SImode registers into single DImode registers
if to be used with unaligned stores.

gcc/testsuite/
* gcc.target/alpha/memcpy-si-aligned.c: New file.
* gcc.target/alpha/memcpy-si-unaligned.c: New file.
* gcc.target/alpha/memcpy-si-unaligned-dst.c: New file.
* gcc.target/alpha/memcpy-si-unaligned-src.c: New file.
* gcc.target/alpha/memcpy-si-unaligned-src-bwx.c: New file.

Alpha: Always respect -mbwx, -mcix, -mfix, -mmax, and their inverse

Contrary to user documentation the `-mbwx', `-mcix', `-mfix', `-mmax'
feature options and their inverse forms are ignored whenever `-mcpu='
option is in effect, either by having been given explicitly or where
configured as the default such as with the `alphaev56-linux-gnu' target.
In the latter case there is no way to change the settings these options
are supposed to tweak other than with `-mcpu=' and the settings cannot
be individually controlled, making all the feature options permanently
inactive.

It seems a regression from commit 7816bea0e23b ("config.gcc: Reorganize
--with-cpu logic.") back in 2003, which replaced the setting of the
default feature mask with the setting of the default CPU across a few
targets, and the complementing logic in the Alpha backend wasn't updated
accordingly.

Fix this by making the individual feature options take precedence over
`-mcpu='.  Add test cases to verify this is the case, and to cover the
defaults as well for the boundary cases.

This has a drawback where the order of the options is ignored between
`-mcpu=' and these individual options, so e.g. `-mno-bwx -mcpu=ev6' will
keep the BWX feature disabled even though `-mcpu=ev6' comes later in the
command line.  This may affect some scenarios involving user overrides
such as with CFLAGS passed to `configure' and `make' invocations.  I do
believe it has been our practice anyway for more finegrained options to
override group options regardless of their relative order on the command
line and in any case using `-mcpu=ev6 -mbwx' as the override will do the
right thing if required, canceling any previous `-mno-bwx'.

This has been spotted with `alphaev56-linux-gnu' target verification and
a recently added test case:

FAIL: gcc.target/alpha/stwx0.c   -O1   scan-assembler-times \\sldq_u\\s 2
FAIL: gcc.target/alpha/stwx0.c   -O1   scan-assembler-times \\smskwh\\s 1
FAIL: gcc.target/alpha/stwx0.c   -O1   scan-assembler-times \\smskwl\\s 1
FAIL: gcc.target/alpha/stwx0.c   -O1   scan-assembler-times \\sstq_u\\s 2

(and similarly for the remaining optimization levels covered) which this
fix has addressed.

gcc/
* config/alpha/alpha.cc (alpha_option_override): Ignore CPU
flags corresponding to features the enabling or disabling of
which has been requested with an individual feature option.

gcc/testsuite/
* gcc.target/alpha/target-bwx-1.c: New file.
* gcc.target/alpha/target-bwx-2.c: New file.
* gcc.target/alpha/target-bwx-3.c: New file.
* gcc.target/alpha/target-bwx-4.c: New file.
* gcc.target/alpha/target-cix-1.c: New file.
* gcc.target/alpha/target-cix-2.c: New file.
* gcc.target/alpha/target-cix-3.c: New file.
* gcc.target/alpha/target-cix-4.c: New file.
* gcc.target/alpha/target-fix-1.c: New file.
* gcc.target/alpha/target-fix-2.c: New file.
* gcc.target/alpha/target-fix-3.c: New file.
* gcc.target/alpha/target-fix-4.c: New file.
* gcc.target/alpha/target-max-1.c: New file.
* gcc.target/alpha/target-max-2.c: New file.
* gcc.target/alpha/target-max-3.c: New file.
* gcc.target/alpha/target-max-4.c: New file.

Alpha: Restore frame pointer last in `builtin_longjmp' [PR64242]

Add similar arrangements to `builtin_longjmp' for Alpha as with commit
71b144289c1c ("re PR middle-end/64242 (Longjmp expansion incorrect)")
and commit 511ed59d0b04 ("Fix PR64242 - Longjmp expansion incorrect"),
so as to restore the frame pointer last, so that accesses to a local
buffer supplied can still be fulfilled with memory accesses via the
original frame pointer, fixing:

FAIL: gcc.c-torture/execute/pr64242.c   -O0  execution test
FAIL: gcc.c-torture/execute/pr64242.c   -O1  execution test
FAIL: gcc.c-torture/execute/pr64242.c   -O2  execution test
FAIL: gcc.c-torture/execute/pr64242.c   -O3 -g  execution test
FAIL: gcc.c-torture/execute/pr64242.c   -Os  execution test
FAIL: gcc.c-torture/execute/pr64242.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.c-torture/execute/pr64242.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test

and adding no regressions in `alpha-linux-gnu' testing.

gcc/
PR middle-end/64242
* config/alpha/alpha.md (`builtin_longjmp'): Restore frame
pointer last.  Add frame clobber and schedule blockage.

Alpha: Add memory clobbers to `builtin_longjmp' expansion

Add the same memory clobbers to `builtin_longjmp' for Alpha as with
commit 41439bf6a647 ("builtins.c (expand_builtin_longjmp): Added two
memory clobbers."), to prevent instructions that access memory via the
frame or stack pointer from being moved across the write to the frame
pointer.

gcc/
* config/alpha/alpha.md (builtin_longjmp): Add memory clobbers.

Fix union member access for EXEC_INQUIRE.

gcc/fortran/ChangeLog:

PR fortran/118432
* frontend-passes.cc (doloop_code): Select correct member
of co->ext.union for inquire.

More memory leak fixes

The following were found compiling SPEC CPU 2017 with valgrind.

* tree-vect-slp.cc (vect_analyze_slp): Release saved_stmts
vector.
(vect_build_slp_tree_2): Release new_oprnds_info when not
used.
(vect_analyze_slp): Release root_stmts when gcond SLP
build fails.

testsuite: The expect framework might introduce CR in output

When running tests using the "sim" config, the command is launched in
non-readonly mode and the text retrieved from the expect command will
then replace all LF with CRLF. (The problem can be found in sim_load
where it calls remote_spawn without an input file).

libstdc++-v3/ChangeLog:

* testsuite/27_io/print/1.cc: Allow both LF and CRLF in test.
* testsuite/27_io/print/3.cc: Likewise.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

testsuite: libstdc++: Use effective-target libatomic

Test assumes libatomic.a is always available, but for some embedded
targets, there is no libatomic.a and the test thus fail.

libstdc++-v3/ChangeLog:

* testsuite/29_atomics/atomic_float/compare_exchange_padding.cc:
Use effective-target libatomic_available.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

c-pretty-print.cc (pp_c_tree_decl_identifier): Strip private name encoding, PR118303

This is a part of PR118303. It fixes
FAIL: gcc.dg/analyzer/CVE-2005-1689-minimal.c (test for excess errors)
FAIL: gcc.dg/analyzer/CVE-2005-1689-minimal.c inbuf.data (test for warnings, line 62)
for targets where the parameter on that line is subject to
TARGET_CALLEE_COPIES being true.

c-family:
PR middle-end/118303
* c-pretty-print.cc (c_pretty_printer::primary_expression) <SSA_NAME>:
Call primary_expression for all SSA_NAME_VAR nodes and instead move the
DECL_ARTIFICIAL private name stripping to...
(pp_c_tree_decl_identifier): ...here.

final: Fix get_attr_length for asm goto [PR118411]

The problem is for inline-asm goto, the outer rtl insn type
is a jump_insn and get_attr_length does not handle ASM specially
unlike if the outer rtl insn type was just insn.

This fixes the issue by adding support for both CALL_INSN and JUMP_INSN
with asm.

OK? Bootstrapped and tested on x86_64-linux-gnu.

PR middle-end/118411

gcc/ChangeLog:

* final.cc (get_attr_length_1): Handle asm for CALL_INSN
and JUMP_INSNs.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Daily bump.

d: Merge upstream dmd, druntime 82a5d2a7c4, phobos dbc09d823

D front-end changes:

- Import latest fixes from dmd v2.110.0-beta.1.
- Added traits `getBitfieldOffset' and `getBitfieldWidth'.
- Added trait `isCOMClass' to detect if a type is a COM class.
- Added `-fpreview=safer` which enables safety checking on
unattributed functions.

D runtime changes:

- Import latest fixes from druntime v2.110.0-beta.1.

Phobos changes:

- Import latest fixes from phobos v2.110.0-beta.1.
- Added `fromHexString' and `fromHexStringAsRange' functions to
`std.digest'.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd 82a5d2a7c4.
* d-lang.cc (d_handle_option): Handle new option `-fpreview=safer'.
* expr.cc (ExprVisitor::NewExp): Remove gcc_unreachable for the
generation of `_d_newThrowable'.
* lang.opt: Add -fpreview=safer.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime 82a5d2a7c4.
* libdruntime/Makefile.am (DRUNTIME_DSOURCES): Add
core/internal/gc/blkcache.d, core/internal/gc/blockmeta.d.
* libdruntime/Makefile.in: Regenerate.
* src/MERGE: Merge upstream phobos dbc09d823.

libphobos: Merge upstream phobos 2a730adc0

Phobos changes:

- `std.uni' has been upgraded from Unicode 15.1.0 to 16.0.0.

libphobos/ChangeLog:

* src/MERGE: Merge upstream phobos 2a730adc0.

c++/modules: Handle chaining already-imported local types [PR114630]

In the linked testcase, an ICE occurs because when reading the
(duplicate) function definition for _M_do_parse from module Y, the local
type definitions have already been streamed from module X and setup as
regular backreferences, rather than being found with find_duplicate,
causing issues with managing DECL_CHAIN.

It is tempting to just skip setting up the DECL_CHAIN for this case.
However, for the future it would be best to ensure that the block vars
for the duplicate definition are accurate, so that we could implement
ODR checking on function definitions at some point.

So to solve this, this patch creates a copy of the streamed-in local
type and chains that; it will be discarded along with the rest of the
duplicate function after we've finished processing.

A couple of suggested implementations from the discussion on the PR that
don't work:

- Replacing the `DECL_CHAIN` assertion with `(*chain && *chain != decl)`
  doesn't handle the case where type definitions are followed by regular
  local variables, since those won't have been imported as separate
  backreferences and so the chains will diverge.

- Correcting the purviewness of GMF template instantiations to force Y
  to emit copies of the local types rather than backreferences into X is
  insufficient, as it's still possible that the local types got streamed
  in a separate cluster to the function definition, and so will be again
  referred to via regular backreferences when importing.

- Likewise, preventing the emission of function definitions where an
  import has already provided that same definition also is insufficient,
  for much the same reason.

PR c++/114630

gcc/cp/ChangeLog:

* module.cc (trees_in::core_vals) <BLOCK>: Chain a new node if
DECL_CHAIN already is set.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr114630.h: New test.
* g++.dg/modules/pr114630_a.C: New test.
* g++.dg/modules/pr114630_b.C: New test.
* g++.dg/modules/pr114630_c.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>

Fortran: Fix location_t in gfc_get_extern_function_decl; support 'omp dispatch interop'

The declaration created by gfc_get_extern_function_decl used input_location
as DECL_SOURCE_LOCATION, which gave rather odd results with 'declared here'
diagnostic. - It is much more useful to use the gfc_symbol's declated_at,
which this commit now does.

Additionally, it adds support for the 'interop' clause of OpenMP's
'dispatch' directive. As the argument order matters,
gfc_match_omp_variable_list gained a 'reverse_order' flag to use the
same order as the C/C++ parser.

gcc/fortran/ChangeLog:

* gfortran.h: Add OMP_LIST_INTEROP to the unnamed OMP_LIST_ enum.
* openmp.cc (gfc_match_omp_variable_list): Add reverse_order
boolean argument, defaulting to false.
(enum omp_mask2, OMP_DISPATCH_CLAUSES): Add OMP_CLAUSE_INTEROP.
(gfc_match_omp_clauses, resolve_omp_clauses): Handle dispatch's
'interop' clause.
* trans-decl.cc (gfc_get_extern_function_decl): Use sym->declared_at
instead input_location as DECL_SOURCE_LOCATION.
* trans-openmp.cc (gfc_trans_omp_clauses): Handle OMP_LIST_INTEROP.

gcc/testsuite/ChangeLog:

* gfortran.dg/goacc/routine-external-level-of-parallelism-2.f: Update
xfail'ed 'dg-bogus' for the better 'declared here' location.
* gfortran.dg/gomp/dispatch-11.f90: New test.
* gfortran.dg/gomp/dispatch-12.f90: New test.

Fortran: Fix error recovery for bad component arrayspecs [PR108434]

2025-01-11 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran/
PR fortran/108434
* class.cc (generate_finalization_wrapper): To avoid memory
leaks from callocs, return immediately if the derived type
error flag is set.
* decl.cc (build_struct): If the declaration of a derived type
or class component does not have a deferred arrayspec, correct,
set the error flag of the derived type and emit an immediate
error.

gcc/testsuite/
PR fortran/108434
* gfortran.dg/pr108434.f90 : Add tests from comment 1.

c++: modules and function attributes

30_threads/stop_token/stop_source/109339.cc was failing because we weren't
representing attribute access on the METHOD_TYPE for _Stop_state_ref.

The modules code expected attributes to appear on tt_variant_type and not
on tt_derived_type, but that's backwards since build_type_attribute_variant
gives a type with attributes its own TYPE_MAIN_VARIANT.

gcc/cp/ChangeLog:

* module.cc (trees_out::type_node): Write attributes for
tt_derived_type, not tt_variant_type.
(trees_in::tree_node): Likewise for reading.

gcc/testsuite/ChangeLog:

* g++.dg/modules/attrib-2_a.C: New test.
* g++.dg/modules/attrib-2_b.C: New test.

c++: modules and class attributes

std/time/traits/is_clock.cc was getting a warning about applying the
deprecated attribute to a variant of auto_ptr, which was wrong because it's
on the primary type. This turned out to be because we were ignoring the
attributes on the definition of auto_ptr because the forward declaration in
unique_ptr.h has no attributes. We need to merge attributes as usual in a
redeclaration.

gcc/cp/ChangeLog:

* module.cc (trees_in::decl_value): Merge attributes.

gcc/testsuite/ChangeLog:

* g++.dg/modules/attrib-1_a.C: New test.
* g++.dg/modules/attrib-1_b.C: New test.

LoongArch: Generate the final immediate for lu12i.w, lu32i.d and lu52i.d

Generate 0x1010 instead of 0x1010000>>12 for lu12i.w. lu32i.d and lu52i.d use
the same processing.

gcc/ChangeLog:

* config/loongarch/lasx.md: Use new loongarch_output_move.
* config/loongarch/loongarch-protos.h (loongarch_output_move):
Change parameters from (rtx, rtx) to (rtx *).
* config/loongarch/loongarch.cc (loongarch_output_move):
Generate final immediate for lu12i.w and lu52i.d.
* config/loongarch/loongarch.md:
Generate final immediate for lu32i.d and lu52i.d.
* config/loongarch/lsx.md: Use new loongarch_output_move.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/imm-load.c: Not generate ">>".

d: Merge dmd, druntime 2b89c2909d, phobos bdedad3bf

D front-end changes:

        - Import latest fixes from dmd v2.110.0-beta.1.

D runtime changes:

        - Import latest fixes from druntime v2.110.0-beta.1.

Phobos changes:

        - Import latest fixes from phobos v2.110.0-beta.1.
- Added `popGrapheme' function to `std.uni'.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd 2b89c2909d.
* Make-lang.in (D_FRONTEND_OBJS): Rename d/basicmangle.o to
d/mangle-basic.o, d/cppmangle.o to d/mangle-cpp.o, and d/dmangle.o to
d/mangle-package.o.
(d/mangle-%.o): New rule.
* d-builtins.cc (maybe_set_builtin_1): Update for new front-end
interface.
* d-diagnostic.cc (verrorReport): Likewise.
(verrorReportSupplemental): Likewise.
* d-frontend.cc (getTypeInfoType): Likewise.
* d-lang.cc (d_init_options): Likewise.
(d_handle_option): Likewise.
(d_post_options): Likewise.
* d-target.cc (TargetC::contributesToAggregateAlignment): New.
* d-tree.h (create_typeinfo): Adjust prototype.
* decl.cc (layout_struct_initializer): Update for new front-end
interface.
* typeinfo.cc (create_typeinfo): Remove generate parameter.
* types.cc (layout_aggregate_members): Update for new front-end
interface.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime 2b89c2909d.
* src/MERGE: Merge upstream phobos bdedad3bf.

Use relations when simplifying MIN and MAX.

Query for known relations between the operands, and pass that to
fold_range to help simplify MIN and MAX relations.
Make it type agnostic as well.

Adapt testcases from DOM to EVRP (e suffix) and test floats (f suffix).

PR tree-optimization/88575
gcc/
* vr-values.cc (simplify_using_ranges::fold_cond_with_ops): Query
relation between op0 and op1 and utilize it.
(simplify_using_ranges::simplify): Do not eliminate float checks.

gcc/testsuite/
* gcc.dg/tree-ssa/minmax-27.c: Disable VRP.
* gcc.dg/tree-ssa/minmax-27e.c: New.
* gcc.dg/tree-ssa/minmax-27f.c: New.
* gcc.dg/tree-ssa/minmax-28.c: Disable VRP.
* gcc.dg/tree-ssa/minmax-28e.c: New.
* gcc.dg/tree-ssa/minmax-28f.c: New.

Daily bump.

d: Merge dmd, druntime 4ccb01fde5, phobos eab6595ad

D front-end changes:

- Added pragma for ImportC to allow setting `nothrow', `@nogc'
or `pure'.
- Mixin templates can now use assignment syntax.

D runtime changes:

- Removed `ThreadBase.criticalRegionLock' from `core.thread'.
- Added `expect', `[un]likely', `trap' to `core.builtins'.

Phobos changes:

- Import latest fixes from phobos v2.110.0-beta.1.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd 4ccb01fde5.
* Make-lang.in (D_FRONTEND_OBJS): Rename d/foreachvar.o to
d/visitor-foreachvar.o, d/visitor.o to d/visitor-package.o, and
d/statement_rewrite_walker.o to d/visitor-statement_rewrite_walker.o.
(D_FRONTEND_OBJS): Rename
d/{parsetime,permissive,postorder,transitive}visitor.o to
d/visitor-{parsetime,permissive,postorder,transitive}.o.
(D_FRONTEND_OBJS): Remove d/sapply.o.
(d.tags): Add dmd/common/*.h.
(d/visitor-%.o:): New rule.
* d-codegen.cc (get_frameinfo): Update for new front-end interface.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime 4ccb01fde5.
* src/MERGE: Merge upstream phobos eab6595ad.

d: Merge dmd, druntime 6884b433d2, phobos 48d581a1f

D front-end changes:

- It's now deprecated to declare `auto ref' parameters without
  putting those two keywords next to each other.
        - An error is now given for case fallthough for multivalued
  cases.
        - An error is now given for constructors with field destructors
  with stricter attributes.
        - An error is now issued for `in'/`out' contracts of `nothrow'
  functions that may throw.
- `auto ref' can now be applied to local, static, extern, and
  global variables.

D runtime changes:

        - Import latest fixes from druntime v2.110.0-beta.1.

Phobos changes:

        - Import latest fixes from phobos v2.110.0-beta.1.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd 6884b433d2.
* d-builtins.cc (build_frontend_type): Update for new front-end
interface.
(d_build_builtins_module): Likewise.
(matches_builtin_type): Likewise.
(covariant_with_builtin_type_p): Likewise.
* d-codegen.cc (lower_struct_comparison): Likewise.
(call_side_effect_free_p): Likewise.
* d-compiler.cc (Compiler::paintAsType): Likewise.
* d-convert.cc (convert_expr): Likewise.
(convert_for_assignment): Likewise.
* d-target.cc (Target::isVectorTypeSupported): Likewise.
(Target::isVectorOpSupported): Likewise.
(Target::isReturnOnStack): Likewise.
* decl.cc (get_symbol_decl): Likewise.
* expr.cc (build_return_dtor): Likewise.
* imports.cc (class ImportVisitor): Likewise.
* toir.cc (class IRVisitor): Likewise.
* types.cc (class TypeVisitor): Likewise.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime 6884b433d2.
* src/MERGE: Merge upstream phobos 48d581a1f.

vect: Also cost gconds for scalar [PR118211]

Currently we only cost gconds for the vector loop while we omit costing
them when analyzing the scalar loop; this unfairly penalizes the vector
loop in the case of loops with early exits.

This (together with the previous patches) enables us to vectorize
std::find with 64-bit element sizes.

gcc/ChangeLog:

PR tree-optimization/118211
PR tree-optimization/116126
* tree-vect-loop.cc (vect_compute_single_scalar_iteration_cost):
Don't skip over gconds.

vect: Ensure we add vector skip guard even when versioning for aliasing [PR118211]

This fixes a latent wrong code issue whereby vect_do_peeling determined
the wrong condition for inserting the vector skip guard.  Specifically
in the case where the loop niters are unknown at compile time we used to
check:

  !LOOP_REQUIRES_VERSIONING (loop_vinfo)

but LOOP_REQUIRES_VERSIONING is true for loops which we have versioned
for aliasing, and that has nothing to do with prolog peeling.  I think
this condition should instead be checking specifically if we aren't
versioning for alignment.

As it stands, when we version for alignment, we don't peel, so the
vector skip guard is indeed redundant in that case.

With the testcase added (reduced from the Fortran frontend) we would
version for aliasing, omit the vector skip guard, and then at runtime we
would peel sufficient iterations for alignment that there wasn't a full
vector iteration left when we entered the vector body, thus overflowing
the output buffer.

gcc/ChangeLog:

PR tree-optimization/118211
PR tree-optimization/116126
* tree-vect-loop-manip.cc (vect_do_peeling): Adjust skip_vector
condition to only omit the edge if we're versioning for
alignment.

gcc/testsuite/ChangeLog:

PR tree-optimization/118211
PR tree-optimization/116126
* gcc.dg/vect/vect-early-break_130.c: New test.

vect: Fix dominators when adding a guard to skip the vector loop [PR118211]

The alignment peeling changes exposed a latent missing dominator update
with early break vectorization, specifically when inserting the vector
skip edge, since the new edge bypasses the prolog skip block and thus
has the potential to subvert its dominance. This patch fixes that.

gcc/ChangeLog:

PR tree-optimization/118211
PR tree-optimization/116126
* tree-vect-loop-manip.cc (vect_do_peeling): Update immediate
dominators of nodes that were dominated by the prolog skip block
after inserting vector skip edge. Initialize prolog variable to
NULL to avoid bogus -Wmaybe-uninitialized during bootstrap.

gcc/testsuite/ChangeLog:

PR tree-optimization/118211
PR tree-optimization/116126
* g++.dg/vect/vect-early-break_6.cc: New test.

Co-Authored-By: Alex Coplan <alex.coplan@arm.com>

vect: Don't guard scalar epilogue for inverted loops [PR118211]

For loops with LOOP_VINFO_EARLY_BREAKS_VECT_PEELED we should always
enter the scalar epilogue, so avoid emitting a guard on entry to the
epilogue.

gcc/ChangeLog:

PR tree-optimization/118211
PR tree-optimization/116126
* tree-vect-loop-manip.cc (vect_do_peeling): Avoid emitting an
epilogue guard for inverted early-exit loops.

vect: Force alignment peeling to vectorize more early break loops [PR118211]

This allows us to vectorize more loops with early exits by forcing
peeling for alignment to make sure that we're guaranteed to be able to
safely read an entire vector iteration without crossing a page boundary.

To make this work for VLA architectures we have to allow compile-time
non-constant target alignments. We also have to override the result of
the target's preferred_vector_alignment hook if it isn't a power-of-two
multiple of the TYPE_SIZE of the chosen vector type.

gcc/ChangeLog:

PR tree-optimization/118211
PR tree-optimization/116126
* tree-vect-data-refs.cc (vect_analyze_early_break_dependences):
Set need_peeling_for_alignment flag on read DRs instead of
failing vectorization. Punt on gathers.
(dr_misalignment): Handle non-constant target alignments.
(vect_compute_data_ref_alignment): If need_peeling_for_alignment
flag is set on the DR, then override the target alignment chosen
by the preferred_vector_alignment hook to choose a safe
alignment.
(vect_supportable_dr_alignment): Override
support_vector_misalignment hook if need_peeling_for_alignment
is set on the DR: in this case we must return
dr_unaligned_unsupported in order to force peeling.
* tree-vect-loop-manip.cc (vect_do_peeling): Allow prolog
peeling by a compile-time non-constant amount.
* tree-vectorizer.h (dr_vec_info): Add new flag
need_peeling_for_alignment.

gcc/testsuite/ChangeLog:

PR tree-optimization/118211
PR tree-optimization/116126
* gcc.dg/tree-ssa/cunroll-13.c: Don't vectorize.
* gcc.dg/tree-ssa/cunroll-14.c: Likewise.
* gcc.dg/unroll-6.c: Likewise.
* gcc.dg/tree-ssa/gen-vect-28.c: Likewise.
* gcc.dg/vect/vect-104.c: Expect to vectorize.
* gcc.dg/vect/vect-early-break_108-pr113588.c: Likewise.
* gcc.dg/vect/vect-early-break_109-pr113588.c: Likewise.
* gcc.dg/vect/vect-early-break_110-pr113467.c: Likewise.
* gcc.dg/vect/vect-early-break_3.c: Likewise.
* gcc.dg/vect/vect-early-break_65.c: Likewise.
* gcc.dg/vect/vect-early-break_8.c: Likewise.
* gfortran.dg/vect/vect-5.f90: Likewise.
* gfortran.dg/vect/vect-8.f90: Likewise.
* gcc.dg/vect/vect-switch-search-line-fast.c:

Co-Authored-By: Tamar Christina <tamar.christina@arm.com>

AArch64: correct Cortex-X4 MIDR

The Parts Num field for the MIDR for Cortex-X4 is wrong. It's currently the
parts number for a Cortex-A720 (which does have the right number).

The correct number can be found in the Cortex-X4 Technical Reference Manual [1]
on page 382 in Issue Number 5.

[1] https://developer.arm.com/documentation/102484/latest/

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def (AARCH64_CORE): Fix cortex-x4 parts
num.

d: Merge dmd, druntime 34875cd6e1, phobos ebd24da8a

D front-end changes:

        - Import dmd v2.110.0-beta.1.
        - `ref' can now be applied to local, static, extern, and global
  variables.

D runtime changes:

        - Import druntime v2.110.0-beta.1.

Phobos changes:

        - Import phobos v2.110.0-beta.1.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd 34875cd6e1.
* dmd/VERSION: Bump version to v2.110.0-beta.1.
* Make-lang.in (D_FRONTEND_OBJS): Add d/deps.o, d/timetrace.o.
* decl.cc (class DeclVisitor): Update for new front-end interface.
* expr.cc (class ExprVisitor): Likewise
* typeinfo.cc (check_typeinfo_type): Likewise.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime 34875cd6e1.
* src/MERGE: Merge upstream phobos ebd24da8a.

libstdc++: Fix unused parameter warnings in <bits/atomic_futex.h>

This fixes warnings like the following during bootstrap:

sparc-sun-solaris2.11/libstdc++-v3/include/bits/atomic_futex.h:324:53: warning: unused parameter ‘__mo’ [-Wunused-parameter]
324 | _M_load_when_equal(unsigned __val, memory_order __mo)
| ~~~~~~~~~~~~~^~~~

libstdc++-v3/ChangeLog:

* include/bits/atomic_futex.h (__atomic_futex_unsigned): Remove
names of unused parameters in non-futex implementation.

c++: add fixed test [PR118391]

Fixed by r15-6740.

PR c++/118391

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-uneval20.C: New test.

libatomic: Cleanup AArch64 ifunc selection

Simplify and cleanup ifunc selection logic. Since LRCPC3 does
not imply LSE2, has_rcpc3() should also check LSE2 is enabled.

Passes regress and bootstrap, OK for commit?

libatomic:
* config/linux/aarch64/host-config.h (has_lse2): Cleanup.
(has_lse128): Likewise.
(has_rcpc3): Add early check for LSE2.

testsuite: arm: Add pattern for armv8-m.base to cmse-15.c test

Since armv8-m.base uses thumb1 that does not suport sibcall/tailcall,
a pattern is needed that uses PUSH/BL/POP sequence instead of a single
B instruction to reuse an already existing function in the compile unit.

gcc/testsuite/ChangeLog:

* gcc.target/arm/cmse/cmse-15.c: Added pattern for armv8-m.base.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

Do not call cp_parser_omp_dispatch directly in cp_parser_pragma

This is a followup to
ed49709acda OpenMP: C++ front-end support for dispatch + adjust_args.

The call to cp_parser_omp_dispatch only belongs in cp_parser_omp_construct. In
cp_parser_pragma, handle PRAGMA_OMP_DISPATCH by calling cp_parser_omp_construct.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_pragma): Replace call to cp_parser_omp_dispatch
with cp_parser_omp_construct and check context.

gcc/testsuite/ChangeLog:

* g++.dg/gomp/dispatch-8.C: New test.

c++: Fix ICE with invalid defaulted operator <=> [PR118387]

In the following testcase there are 2 issues, one is that B doesn't
have operator<=> and the other is that A's operator<=> has int return
type, i.e. not the standard comparison category.
Because of the int return type, retcat is cc_last; when we first
try to synthetize it, it is therefore with tentative false and complain
tf_none, we find that B doesn't have operator<=> and because retcat isn't
tc_last, don't try to search for other operators in genericize_spaceship.
And then mark the operator deleted.
When trying to explain the use of the deleted operator, tentative is still
false, but complain is tf_error_or_warning.
do_one_comp will first do:
  tree comp = build_new_op (loc, code, flags, lhs, rhs,
                            NULL_TREE, NULL_TREE, &overload,
                            tentative ? tf_none : complain);
and because complain isn't tf_none, it will actually diagnose the bug
already, but then (tentative || complain) is true and we call
genericize_spaceship, which has
  if (tag == cc_last && is_auto (type))
    {
...
    }

  gcc_checking_assert (tag < cc_last);
and because tag is cc_last and type isn't auto, we just ICE on that
assertion.

The patch fixes it by returning error_mark_node from genericize_spaceship
instead of failing the assertion.

Note, the PR raises another problem.
If on the same testcase the B b; line is removed, we silently synthetize
operator<=> which will crash at runtime due to returning without a return
statement.  That is because the standard says that in that case
it should return static_cast<int>(std::strong_ordering::equal);
but I can't find anywhere wording which would say that if that isn't
valid, the function is deleted.
https://eel.is/c++draft/class.compare#class.spaceship-2.2
seems to talk just about cases where there are some members and their
comparison is invalid it is deleted, but here there are none and it
follows
https://eel.is/c++draft/class.compare#class.spaceship-3.sentence-2
So, we synthetize with tf_none, see the static_cast is invalid, don't
add error_mark_node statement silently, but as the function isn't deleted,
we just silently emit it.
Should the standard be amended to say that the operator should be deleted
even if it has no elements and the static cast from
https://eel.is/c++draft/class.compare#class.spaceship-3.sentence-2
?

2025-01-10  Jakub Jelinek  <jakub@redhat.com>

PR c++/118387
* method.cc (genericize_spaceship): For tag == cc_last if
type is not auto just return error_mark_node instead of failing
checking assertion.

* g++.dg/cpp2a/spaceship-synth17.C: New test.

c++: modules and DECL_REPLACEABLE_P

We need to remember that the ::operator new is replaceable to avoid a bogus
error about __builtin_operator_new finding a non-replaceable function.

This affected __get_temporary_buffer in stl_tempbuf.h.

gcc/cp/ChangeLog:

* module.cc (trees_out::core_bools): Write replaceable_operator.
(trees_in::core_bools): Read it.

gcc/testsuite/ChangeLog:

* g++.dg/modules/operator-2_a.C: New test.
* g++.dg/modules/operator-2_b.C: New test.

Fix some memory leaks

The following fixes memory leaks found compiling SPEC CPU 2017 with
valgrind.

* df-core.cc (rest_of_handle_df_finish): Release dflow for
problems without free function (like LR).
* gimple-crc-optimization.cc (crc_optimization::loop_may_calculate_crc):
Release loop_bbs on all exits.
* tree-vectorizer.h (supportable_indirect_convert_operation): Change.
* tree-vect-generic.cc (expand_vector_conversion): Adjust.
* tree-vect-stmts.cc (vectorizable_conversion): Use auto_vec for
converts.
(supportable_indirect_convert_operation): Get a reference to
the output vector of converts.

[PR118017][LRA]: Fix test for i686

My previous patch for PR118017 contains a test which fails on i686. The patch fixes this.

gcc/testsuite/ChangeLog:

PR target/118017
* gcc.target/i386/pr118017.c: Check target int128.

arm: [MVE intrinsics] Fix tuples field name (PR 118332)

The previous fix only worked for C, for C++ we need to add more
information to the underlying type so that
finish_class_member_access_expr accepts it.

We use the same logic as in aarch64's register_tuple_type for AdvSIMD
tuples.

This patch makes gcc.target/arm/mve/intrinsics/pr118332.c pass in C++
mode.

gcc/ChangeLog:

PR target/118332
* config/arm/arm-mve-builtins.cc (wrap_type_in_struct): Delete.
(register_type_decl): Delete.
(register_builtin_tuple_types): Use
lang_hooks.types.simulate_record_decl.

Fix bootstrap on !HARDREG_PRE_REGNOS targets

Pushed as obvious.

* gcse.cc (pass_hardreg_pre::gate): Wrap possibly unused
fun argument.

rtl-optimization/117467 - limit ext-dce memory use

The following puts in a hard limit on ext-dce because it might end
up requiring memory on the order of the number of basic blocks
times the number of pseudo registers. The limiting follows what
GCSE based passes do and thus I re-use --param max-gcse-memory here.

This doesn't in any way address the implementation issues of the pass,
but it reduces the memory-use when compiling the
module_first_rk_step_part1.F90 TU from 521.wrf_r from 25GB to 1GB.

PR rtl-optimization/117467
PR rtl-optimization/117934
* ext-dce.cc (ext_dce_execute): Do nothing if a memory
allocation estimate exceeds what is allowed by
--param max-gcse-memory.

c++: ICE with pack indexing and partial inst [PR117937]

Here we ICE in expand_expr_real_1:

      if (exp)
        {
          tree context = decl_function_context (exp);
          gcc_assert (SCOPE_FILE_SCOPE_P (context)
                      || context == current_function_decl

on something like this test:

  void
  f (auto... args)
  {
    [&]<size_t... i>(seq<i...>) {
g(args...[i]...);
    }(seq<0>());
  }

because while current_function_decl is:

  f<int>(int)::<lambda(seq<i ...>)> [with long unsigned int ...i = {0}]

(correct), context is:

  f<int>(int)::<lambda(seq<i ...>)>

which is only the partial instantiation.

I think that when tsubst_pack_index gets a partial instantiation, e.g.
{*args#0} as the pack, we should still tsubst it.  The args#0's value-expr
can be __closure->__args#0 where the closure's context is the partially
instantiated operator().  So we should let retrieve_local_specialization
find the right args#0.

PR c++/117937

gcc/cp/ChangeLog:

* pt.cc (tsubst_pack_index): tsubst the pack even when it's not
PACK_EXPANSION_P.

gcc/testsuite/ChangeLog:

* g++.dg/cpp26/pack-indexing13.C: New test.
* g++.dg/cpp26/pack-indexing14.C: New test.

s390: Add expander for uaddc/usubc optabs

gcc/ChangeLog:

* config/s390/s390-protos.h (s390_emit_compare): Add mode
parameter for the resulting RTX.
* config/s390/s390.cc (s390_emit_compare): Dito.
(s390_emit_compare_and_swap): Change.
(s390_expand_vec_strlen): Change.
(s390_expand_cs_hqi): Change.
(s390_expand_split_stack_prologue): Change.
* config/s390/s390.md (*add<mode>3_carry1_cc): Renamed to ...
(add<mode>3_carry1_cc): this and in order to use the
corresponding gen function, encode CC mode into pattern.
(*sub<mode>3_borrow_cc): Renamed to ...
(sub<mode>3_borrow_cc): this and in order to use the
corresponding gen function, encode CC mode into pattern.
(*add<mode>3_alc_carry1_cc): Renamed to ...
(add<mode>3_alc_carry1_cc): this and in order to use the
corresponding gen function, encode CC mode into pattern.
(sub<mode>3_slb_borrow1_cc): New.
(uaddc<mode>5): New.
(usubc<mode>5): New.

gcc/testsuite/ChangeLog:

* gcc.target/s390/uaddc-1.c: New test.
* gcc.target/s390/uaddc-2.c: New test.
* gcc.target/s390/uaddc-3.c: New test.
* gcc.target/s390/usubc-1.c: New test.
* gcc.target/s390/usubc-2.c: New test.
* gcc.target/s390/usubc-3.c: New test.

docs: Document new hardreg PRE pass

gcc/ChangeLog:

* doc/passes.texi: Document hardreg PRE pass.

Add new hardreg PRE pass

This pass is used to optimise assignments to the FPMR register in
aarch64.  I chose to implement this as a middle-end pass because it
mostly reuses the existing RTL PRE code within gcse.cc.

Compared to RTL PRE, the key difference in this new pass is that we
insert new writes directly to the destination hardreg, instead of
writing to a new pseudo-register and copying the result later.  This
requires changes to the analysis portion of the pass, because sets
cannot be moved before existing instructions that set, use or clobber
the hardreg, and the value becomes unavailable after any uses of
clobbers of the hardreg.

Any uses of the hardreg in debug insns will be deleted.  We could do
better than this, but for the aarch64 fpmr I don't think we emit useful
debuginfo for deleted fp8 instructions anyway (and I don't even know if
it's possible to have a debug fpmr use when entering hardreg PRE).

gcc/ChangeLog:

* config/aarch64/aarch64.h (HARDREG_PRE_REGNOS): New macro.
* gcse.cc (doing_hardreg_pre_p): New global variable.
(do_load_motion): New boolean check.
(current_hardreg_regno): New global variable.
(compute_local_properties): Unset transp for hardreg clobbers.
(prune_hardreg_uses): New function.
(want_to_gcse_p): Use different checks for hardreg PRE.
(oprs_unchanged_p): Disable load motion for hardreg PRE pass.
(hash_scan_set): For hardreg PRE, skip non-hardreg sets and
check for hardreg clobbers.
(record_last_mem_set_info): Skip for hardreg PRE.
(compute_pre_data): Prune hardreg uses from transp bitmap.
(pre_expr_reaches_here_p_work): Add sentence to comment.
(insert_insn_start_basic_block): New functions.
(pre_edge_insert): Don't add hardreg sets to predecessor block.
(pre_delete): Use hardreg for the reaching reg.
(reset_hardreg_debug_uses): New function.
(pre_gcse): For hardreg PRE, reset debug uses and don't insert
copies.
(one_pre_gcse_pass): Disable load motion for hardreg PRE.
(execute_hardreg_pre): New.
(class pass_hardreg_pre): New.
(pass_hardreg_pre::gate): New.
(make_pass_hardreg_pre): New.
* passes.def (pass_hardreg_pre): New pass.
* tree-pass.h (make_pass_hardreg_pre): New.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/acle/fpmr-1.c: New test.
* gcc.target/aarch64/acle/fpmr-2.c: New test.
* gcc.target/aarch64/acle/fpmr-3.c: New test.
* gcc.target/aarch64/acle/fpmr-4.c: New test.

Disable a broken multiversioning optimisation

This patch skips redirect_to_specific clone for aarch64 and riscv,
because the optimisation has two flaws:

1. It checks the value of the "target" attribute, even on targets that
don't use this attribute for multiversioning.

2. The algorithm used is too aggressive, and will eliminate the
indirection in some cases where the runtime choice of callee version
can't be determined statically at compile time.  A correct would need to
verify that:
- if the current caller version were selected at runtime, then the
   chosen callee version would be eligible for selection.
- if any higher priority callee version were selected at runtime, then
   a higher priority caller version would have been eligble for
   selection (and hence the current caller version wouldn't have been
   selected).

The current checks only verify a more restrictive version of the first
condition, and don't check the second condition at all.

Fixing the optimisation properly would require implementing target hooks
to check for implications between version attributes, which is too
complicated for this stage.  However, I would like to see this hook
implemented in the future, since it could also help deduplicate other
multiversioning code.

Since this behaviour has existed for x86 and powerpc for a while, I
think it's best to preserve the existing behaviour on those targets,
unless any maintainer for those targets disagrees.

gcc/ChangeLog:

* multiple_target.cc
(redirect_to_specific_clone): Assert that "target" attribute is
used for FMV before checking it.
(ipa_target_clone): Skip redirect_to_specific_clone on some
targets.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/mv-pragma.C: New test.

docs: Add new AArch64 flags

gcc/ChangeLog:

* doc/invoke.texi: Add new AArch64 flags.

aarch64: Add new +xs flag

GCC does not emit tlbi instructions, so this only affects the flags
passed through to the assembler.

gcc/ChangeLog:

* config/aarch64/aarch64-arches.def (V8_7A): Add XS.
* config/aarch64/aarch64-option-extensions.def (XS): New flag.

aarch64: Add new +wfxt flag

GCC does not currently emit the wfet or wfit instructions, so this
primarily affects the flags passed through to the assembler.

gcc/ChangeLog:

* config/aarch64/aarch64-arches.def (V8_7A): Add WFXT.
* config/aarch64/aarch64-option-extensions.def (WFXT): New flag.

aarch64: Add new +rcpc2 flag

gcc/ChangeLog:

* config/aarch64/aarch64-arches.def (V8_4A): Add RCPC2.
* config/aarch64/aarch64-option-extensions.def
(RCPC2): New flag.
(RCPC3): Add RCPC2 dependency.
* config/aarch64/aarch64.h (TARGET_RCPC2): Use new flag.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cpunative/native_cpu_21.c: Add rcpc2 to
expected feature string instead of rcpc.
* gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.

aarch64: Add new +flagm2 flag

GCC does not currently emit the axflag or xaflag instructions, so this
primarily affects the flags passed through to the assembler.

gcc/ChangeLog:

* config/aarch64/aarch64-arches.def (V8_5A): Add FLAGM2.
* config/aarch64/aarch64-option-extensions.def (FLAGM2): New flag.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cpunative/native_cpu_21.c: Add flagm2 to
expected feature string instead of flagm.
* gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.

aarch64: Add new +frintts flag

gcc/ChangeLog:

* config/aarch64/aarch64-arches.def (V8_5A): Add FRINTTS
* config/aarch64/aarch64-option-extensions.def (FRINTTS): New flag.
* config/aarch64/aarch64.h (TARGET_FRINT): Use new flag.
* config/aarch64/arm_acle.h: Use new flag for frintts intrinsics.
* config/aarch64/arm_neon.h: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cpunative/native_cpu_21.c: Add frintts to
expected feature string.
* gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.

aarch64: Add new +jscvt flag

gcc/ChangeLog:

* config/aarch64/aarch64-arches.def (V8_3A): Add JSCVT.
* config/aarch64/aarch64-option-extensions.def (JSCVT): New flag.
* config/aarch64/aarch64.h (TARGET_JSCVT): Use new flag.
* config/aarch64/arm_acle.h: Use new flag for jscvt intrinsics.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cpunative/native_cpu_21.c: Add jscvt to
expected feature string.
* gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.

aarch64: Add new +fcma flag

This includes +fcma as a dependency of +sve, and means that we can
finally support fcma intrinsics on a64fx.

Also add fcma to the Features list in several cpunative testcases that
incorrectly included sve without fcma.

gcc/ChangeLog:

* config/aarch64/aarch64-arches.def (V8_3A): Add FCMA.
* config/aarch64/aarch64-option-extensions.def (FCMA): New flag.
(SVE): Add FCMA dependency.
* config/aarch64/aarch64.h (TARGET_COMPLEX): Use new flag.
* config/aarch64/arm_neon.h: Use new flag for fcma intrinsics.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cpunative/info_15: Add fcma to Features.
* gcc.target/aarch64/cpunative/info_16: Ditto.
* gcc.target/aarch64/cpunative/info_17: Ditto.
* gcc.target/aarch64/cpunative/info_8: Ditto.
* gcc.target/aarch64/cpunative/info_9: Ditto.

aarch64: Use PAUTH instead of V8_3A in some places

gcc/ChangeLog:

* config/aarch64/aarch64.cc
(aarch64_expand_epilogue): Use TARGET_PAUTH.
* config/aarch64/aarch64.md: Update comment.

c: Fix up expr location for __builtin_stdc_rotate_* [PR118376]

Seems I forgot to set_c_expr_source_range for the __builtin_stdc_rotate_*
case (the other __builtin_stdc_* cases already have it), which means
the locations in expr are uninitialized, sometimes causing ICEs in linemap
code, at other times just valgrind errors about uninitialized var uses.

2025-01-10 Jakub Jelinek <jakub@redhat.com>

PR c/118376
* c-parser.cc (c_parser_postfix_expression): Call
set_c_expr_source_range before break in the __builtin_stdc_rotate_*
case.

* gcc.dg/pr118376.c: New test.

rtl: Remove invalid compare simplification [PR117186]

g:d882fe5150fbbeb4e44d007bb4964e5b22373021, posted at
https://gcc.gnu.org/pipermail/gcc-patches/2000-July/033786.html ,
added code to treat:

  (set (reg:CC cc) (compare:CC (gt:M (reg:CC cc) 0) (lt:M (reg:CC cc) 0)))

as a nop.  This PR shows that that isn't always correct.
The compare in the set above is between two 0/1 booleans (at least
on STORE_FLAG_VALUE==1 targets), whereas the unknown comparison that
produced the incoming (reg:CC cc) is unconstrained; it could be between
arbitrary integers, or even floats.  The fold is therefore replacing a
cc that is valid for both signed and unsigned comparisons with one that
is only known to be valid for signed comparisons.

  (gt (compare (gt cc 0) (lt cc 0) 0)

does simplify to:

  (gt cc 0)

but:

  (gtu (compare (gt cc 0) (lt cc 0) 0)

does not simplify to:

  (gtu cc 0)

The optimisation didn't come with a testcase, but it was added for
i386's cmpstrsi, now cmpstrnsi.  That probably doesn't matter as much
as it once did, since it's now conditional on -minline-all-stringops.
But the patch is almost 25 years old, so whatever the original
motivation was, it seems likely that other things now rely on it.

It therefore seems better to try to preserve the optimisation on rtl
rather than get rid of it.  To do that, we need to look at how the
result of the outer compare is used.  We'd therefore be looking at four
instructions (the gt, the lt, the compare, and the use of the compare),
but combine already allows that for 3-instruction combinations thanks
to:

  /* If the source is a COMPARE, look for the use of the comparison result
     and try to simplify it unless we already have used undobuf.other_insn.  */

When applied to boolean inputs, a comparison operator is
effectively a boolean logical operator (AND, ANDNOT, XOR, etc.).
simplify_logical_relational_operation already had code to simplify
logical operators between two comparison results, but:

* It only handled IOR, which doesn't cover all the cases needed here.
  The others are easily added.

* It treated comparisons of integers as having an ORDERED/UNORDERED result.
  Therefore:

  * it would not treat "true for LT + EQ + GT" as "always true" for
    comparisons between integers, because the mask excluded the UNORDERED
    condition.

  * it would try to convert "true for LT + GT" into LTGT even for comparisons
    between integers.  To prevent an ICE later, the code used:

       /* Many comparison codes are only valid for certain mode classes.  */
       if (!comparison_code_valid_for_mode (code, mode))
         return 0;

    However, this used the wrong mode, since "mode" is here the integer
    result of the comparisons (and the mode of the IOR), not the mode of
    the things being compared.  Thus the effect was to reject all
    floating-point-only codes, even when comparing floats.

  I think instead the code should detect whether the comparison is between
  integer values and remove UNORDERED from consideration if so.  It then
  always produces a valid comparison (or an always true/false result),
  and so comparison_code_valid_for_mode is not needed.  In particular,
  "true for LT + GT" becomes NE for comparisons between integers but
  remains LTGT for comparisons between floats.

* There was a missing check for whether the comparison inputs had
  side effects.

While there, it also seemed worth extending
simplify_logical_relational_operation to unsigned comparisons, since
that makes the testing easier.

As far as that testing goes: the patch exhaustively tests all
combinations of integer comparisons in:

  (cmp1 (cmp2 X Y) (cmp3 X Y))

for the 10 integer comparisons, giving 1000 fold attempts in total.
It then tries all combinations of (X in {-1,0,1} x Y in {-1,0,1})
on the result of the fold, giving 9 checks per fold, or 9000 in total.
That's probably more than is typical for self-tests, but it seems to
complete in neglible time, even for -O0 builds.

gcc/
PR rtl-optimization/117186
* rtl.h (simplify_context::simplify_logical_relational_operation): Add
an invert0_p parameter.
* simplify-rtx.cc (unsigned_comparison_to_mask): New function.
(mask_to_unsigned_comparison): Likewise.
(comparison_code_valid_for_mode): Delete.
(simplify_context::simplify_logical_relational_operation): Add
an invert0_p parameter.  Handle AND and XOR.  Handle unsigned
comparisons.  Handle always-false results.  Ignore the low bit
of the mask if the operands are always ordered and remove the
then-redundant check of comparison_code_valid_for_mode.  Check
for side-effects in the operands before simplifying them away.
(simplify_context::simplify_binary_operation_1): Remove
simplification of (compare (gt ...) (lt ...)) and instead...
(simplify_context::simplify_relational_operation_1): ...handle
comparisons of comparisons here.
(test_comparisons): New function.
(test_scalar_ops): Call it.

gcc/testsuite/
PR rtl-optimization/117186
* gcc.dg/torture/pr117186.c: New test.
* gcc.target/aarch64/pr117186.c: Likewise.

[ifcombine] drop other misuses of uniform_integer_cst_p

As Jakub pointed out in PR118206, the use of uniform_integer_cst_p in
ifcombine makes no sense, we're not dealing with vectors. Indeed,
I've been misunderstanding and misusing it since I cut&pasted it from
some preexisting match predicate in earlier version of the ifcombine
field-merge patch.

for gcc/ChangeLog

* gimple-fold.cc (decode_field_reference): Drop misuses of
uniform_integer_cst_p.
(fold_truth_andor_for_ifcombine): Likewise.

[ifcombine] fix mask variable test to match use [PR118344]

There was a cut&pasto in the rr_and_mask's adjustment to match the
combined type: the test on whether there was a mask already was
testing the wrong variable, and then it might crash or otherwise fail
accessing an undefined mask.  This only hit with checking enabled,
and rarely at that.

for  gcc/ChangeLog

PR tree-optimization/118344
* gimple-fold.cc (fold_truth_andor_for_ifcombine): Fix typo in
rr_and_mask's type adjustment test.

for  gcc/testsuite/ChangeLog

PR tree-optimization/118344
* gcc.dg/field-merge-19.c: New.

[ifcombine] reuse left-hand mask to decode right-hand xor operand

If fold_truth_andor_for_ifcombine applies a mask to an xor, say
because the result of the xor is compared with a power of two [minus
one], we have to apply the same mask when processing both the left-
and right-hand xor paths for the transformation to be sound.  Arrange
for decode_field_reference to propagate the incoming mask along with
the expression to the right-hand operand.

Don't require the right-hand xor operand to be a constant, that was a
cut&pasto.

for  gcc/ChangeLog

* gimple-fold.cc (decode_field_reference): Add xor_pand_mask.
Propagate pand_mask to the right-hand xor operand.  Don't
require the right-hand xor operand to be a constant.
(fold_truth_andor_for_ifcombine): Pass right-hand mask when
appropriate.

[ifcombine] adjust for narrowing converts before shifts [PR118206]

A narrowing conversion and a shift both drop bits from the loaded
value, but we need to take into account which one comes first to get
the right number of bits and mask.

Fold when applying masks to parts, comparing the parts, and combining
the results, in the odd chance either mask happens to be zero.

for gcc/ChangeLog

PR tree-optimization/118206
* gimple-fold.cc (decode_field_reference): Account for upper
bits dropped by narrowing conversions whether before or after
a right shift.
(fold_truth_andor_for_ifcombine): Fold masks, compares, and
combined results.

for gcc/testsuite/ChangeLog

PR tree-optimization/118206
* gcc.dg/field-merge-18.c: New.

testsuite: generalized field-merge tests for <32-bit int [PR118025]

Explicitly convert constants to the desired types, so as to not elicit
warnings about implicit truncations, nor execution errors, on targets
whose ints are narrower than 32 bits.

for gcc/testsuite/ChangeLog

PR testsuite/118025
* gcc.dg/field-merge-1.c: Convert constants to desired types.
* gcc.dg/field-merge-3.c: Likewise.
* gcc.dg/field-merge-4.c: Likewise.
* gcc.dg/field-merge-5.c: Likewise.
* gcc.dg/field-merge-11.c: Likewise.
* gcc.dg/field-merge-17.c: Don't mess with padding bits.

testsuite: generalize ifcombine field-merge tests [PR118025]

A number of tests that check for specific ifcombine transformations
fail on AVR and PRU targets, whose type sizes and alignments aren't
conducive of the expected transformations.  Adjust the expectations.

Most execution tests should run successfully regardless of the
transformations, but a few that could conceivably fail if short and
char have the same bit width now check for that and bypass the tests
that would fail.

Conversely, one test that had such a runtime test, but that would work
regardless, no longer has that runtime test, and its types are
narrowed so that the transformations on 32-bit targets are more likely
to be the same as those that used to take place on 64-bit targets.
This latter change is somewhat obviated by a separate patch, but I've
left it in place anyway.

for  gcc/testsuite/ChangeLog

PR testsuite/118025
* gcc.dg/field-merge-1.c: Skip BIT_FIELD_REF counting on AVR and PRU.
* gcc.dg/field-merge-3.c: Bypass the test if short doesn't have the
expected size.
* gcc.dg/field-merge-8.c: Likewise.
* gcc.dg/field-merge-9.c: Likewise.  Skip optimization counting on
AVR and PRU.
* gcc.dg/field-merge-13.c: Skip optimization counting on AVR and PRU.
* gcc.dg/field-merge-15.c: Likewise.
* gcc.dg/field-merge-17.c: Likewise.
* gcc.dg/field-merge-16.c: Likewise.  Drop runtime bypass.  Use
smaller types.
* gcc.dg/field-merge-14.c: Add comments.

ifcombine field-merge: improve handling of dwords

On 32-bit hosts, data types with 64-bit alignment aren't getting
treated as desired by ifcombine field-merging: we limit the choice of
modes at BITS_PER_WORD sizes, but when deciding the boundary for a
split, we'd limit the choice only by the alignment, so we wouldn't
even consider a split at an odd 32-bit boundary.  Fix that by limiting
the boundary choice by word choice as well.

Now, this would still leave misaligned 64-bit fields in 64-bit-aligned
data structures unhandled by ifcombine on 32-bit hosts.  We already
need to loading them as double words, and if they're not byte-aligned,
the code gets really ugly, but ifcombine could improve it if it allows
double-word loads as a last resort.  I've added that.

for  gcc/ChangeLog

* gimple-fold.cc (fold_truth_andor_for_ifcombine): Limit
boundary choice by word size as well.  Try aligned double-word
loads as a last resort.

for  gcc/testsuite/ChangeLog

* gcc.dg/field-merge-17.c: New.

ipa-cp: Fold-convert values when necessary (PR 118138)

PR 118138 and quite a few duplicates that it has acquired in a short
time show that even though we are careful to make sure we do not loose
any bits when newly allowing type conversions in jump-functions, we
still need to perform the fold conversions during IPA constant
propagation and not just at the end in order to properly perform
sign-extensions or zero-extensions as appropriate.

This patch does just that, changing a safety predicate we already use
at the appropriate places to return the necessary type.

gcc/ChangeLog:

2025-01-03 Martin Jambor <mjambor@suse.cz>

PR ipa/118138
* ipa-cp.cc (ipacp_value_safe_for_type): Return the appropriate
type instead of a bool, accept NULL_TREE VALUEs.
(propagate_vals_across_arith_jfunc): Use the new returned value of
ipacp_value_safe_for_type.
(propagate_vals_across_ancestor): Likewise.
(propagate_scalar_across_jump_function): Likewise.

gcc/testsuite/ChangeLog:

2025-01-03 Martin Jambor <mjambor@suse.cz>

PR ipa/118138
* gcc.dg/ipa/pr118138.c: New test.