Gaius Mulley [Mon, 13 Jan 2025 14:40:43 +0000 (14:40 +0000)]
PR modula2/118453: Subrange types do not use virtual tokens during construction
P2SymBuild.mod.BuildSubrange does not use a virtual token and therefore
any error message containing a subrange type produces poor location carets.
This patch rewrites BuildSubrange and the buildError4 procedure in
M2Check.mod (which is only called when there is a formal/actual parameter
mismatch). buildError4 now issues a sub-error for both the formal and actual
type declarations, highlighting the type mismatch.
gcc/m2/ChangeLog:
PR modula2/118453
* gm2-compiler/M2Check.mod (buildError4): Call MetaError1
for the actual and formal parameter type.
* gm2-compiler/P2Build.bnf (SubrangeType): Construct a virtual
token containing the subrange type declaration.
(PrefixedSubrangeType): Ditto.
* gm2-compiler/P2SymBuild.def (BuildSubrange): Add tok parameter.
* gm2-compiler/P2SymBuild.mod (BuildSubrange): Use tok parameter,
rather than the token at the start of the subrange.
gcc/testsuite/ChangeLog:
PR modula2/118453
* gm2/pim/fail/badbecomes2.mod: New test.
* gm2/pim/fail/badparamset1.mod: New test.
* gm2/pim/fail/badparamset2.mod: New test.
* gm2/pim/fail/badsyntaxset1.mod: New test.
This resurrects a patch from a bit over 2 years ago that I never wrapped up.
IIRC, I ended up catching covid, then was in the hospital for an unrelated
issue, and it just got dropped on the floor in the insanity.
The basic idea here is to help postreload-cse eliminate more const/copies by
recording a small set of conditional equivalences (as Richi said in 2022,
"Ick").
It was originally to help eliminate an unnecessary constant load I saw in
coremark, but as seen in BZ107455 the same issues show up in real code as well.
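A minimal C model of the idea, assuming a toy instruction representation invented purely for illustration (SET_CONST, SET_REG and apply_cond_equiv are hypothetical names; the real pass in postreload.cc works on RTL and compares costs before substituting anything). In a block that is only reached when the branch condition reg == value held, a later load of that same constant can be rewritten as a copy from reg, until reg itself is overwritten:

```c
#include <stddef.h>

/* Hypothetical, heavily simplified insn model (not GCC's RTL): an insn
   either loads an integer constant into a register or copies one
   register into another.  */
enum insn_kind { SET_CONST, SET_REG };

struct insn
{
  enum insn_kind kind;
  int dst;     /* destination register */
  long value;  /* constant, for SET_CONST */
  int src;     /* source register, for SET_REG */
};

/* INSNS is a block that is only reached when the branch condition
   "REG == VALUE" held.  Rewrite each load of VALUE into some other
   register as a copy from REG, stopping once REG itself is written,
   because the recorded equivalence no longer holds after that point.
   Returns the number of insns rewritten.  */
static size_t
apply_cond_equiv (struct insn *insns, size_t n, int reg, long value)
{
  size_t rewritten = 0;
  for (size_t i = 0; i < n; i++)
    {
      struct insn *in = &insns[i];
      if (in->kind == SET_CONST && in->value == value && in->dst != reg)
        {
          in->kind = SET_REG;
          in->src = reg;
          rewritten++;
        }
      if (in->dst == reg)
        break;  /* REG clobbered: the equivalence dies here.  */
    }
  return rewritten;
}
```

Stopping at the first write to reg mirrors the "handle if src/dst are clobbered" concern listed among the changes; the model ignores everything else the real pass has to check.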
Bootstrapped and regression tested on x86-64, also been through multiple spins
in my tester.
Changes since v2:
- Simplified logic for blocks to examine
- Remove redundant tests when filtering blocks to examine
- Remove bogus check which only allowed reg->reg copies
Changes since v1:
Richard B and Richard S both had good comments last time around and their
requests are reflected in this update:
- Use rtx_equal_p rather than pointer equality
- Restrict to register "destinations"
- Restrict to integer modes
- Adjust entry block handling
My own wider scale testing resulted in a few more changes.
- Robustify extracting the (set (pc) ... ), which then required ...
- Handle if src/dst are clobbered by the conditional branch
- Fix logic error causing too many equivalences to be recorded
PR rtl-optimization/107455
gcc/
* postreload.cc (reload_cse_regs_1): Take advantage of conditional
equivalences.
gcc/testsuite
* gcc.target/riscv/pr107455-1.c: New test.
* gcc.target/riscv/pr107455-2.c: New test.
Alexandre Oliva [Mon, 13 Jan 2025 13:49:51 +0000 (10:49 -0300)]
[ifcombine] propagate signbit mask to XOR right-hand operand
If a single-bit bitfield takes up the sign bit of a storage unit,
comparing the corresponding bitfield between two objects loads the
storage units, XORs them, converts the result to signed char, and
compares it with zero: ((signed char)(a.<byte> ^ c.<byte>) >= 0).
fold_truth_andor_for_ifcombine recognizes the compare with zero as a
sign bit test, then it decomposes the XOR into an equality test.
The problem is that, after this decomposition, which figures out the
width of the accessed fields, we applied the sign bit mask to the
left-hand operand of the compare but failed to also apply it to the
right-hand operand when both were taken from the same XOR.
This patch fixes that.
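A brute-force check in C of the reasoning above, written for illustration (the function names are hypothetical, not GCC's). For 8-bit storage units, ((signed char)(a ^ c) >= 0) holds exactly when the top bits of a and c agree, so turning it into an equality test is only correct if the 0x80 mask reaches both XOR operands:

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign-bit test on the XOR of two storage units.  With GCC's modulo
   conversion to signed char, (signed char) (a ^ c) >= 0 is
   equivalent to this test.  */
static bool
signbit_xor_clear (uint8_t a, uint8_t c)
{
  return ((a ^ c) & 0x80u) == 0;
}

/* Correct decomposition into an equality test: mask both operands.  */
static bool
decomposed_ok (uint8_t a, uint8_t c)
{
  return (a & 0x80u) == (c & 0x80u);
}

/* The bug: the mask is only applied to the left-hand operand.  */
static bool
decomposed_buggy (uint8_t a, uint8_t c)
{
  return (a & 0x80u) == c;
}
```

decomposed_ok agrees with signbit_xor_clear for all 65536 input pairs, while decomposed_buggy already disagrees for a = 0x00, c = 0x01.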
for gcc/ChangeLog
PR tree-optimization/118409
* gimple-fold.cc (fold_truth_andor_for_ifcombine): Apply the
signbit mask to the right-hand XOR operand too.
Jakub Jelinek [Mon, 13 Jan 2025 12:57:18 +0000 (13:57 +0100)]
expr: Fix up the divmod cost debugging note [PR115910]
Something I've noticed during working on the crc wrong-code fix.
My first version of the patch failed because of no longer matching some
expected strings in the assembly, so I had to add TDF_DETAILS debugging
into the -fdump-rtl-expand-details dump which the crc tests can use.
For PR115910 Andrew has added a similar note for the division/modulo case
if it is positive and we can choose either unsigned or signed
division. The problem is that unlike most other TDF_DETAILS diagnostics,
this is not done before emitting the IL for the function, but during it.
Other messages there are prefixed with ;;, both details on what it is doing
and the GIMPLE IL for which it expands RTL, so the
;; Generating RTL for gimple basic block 4
Martin Jambor [Mon, 13 Jan 2025 12:47:27 +0000 (13:47 +0100)]
MAINTAINERS: Make contrib/check-MAINTAINERS.py happy
This commit makes the contrib/check-MAINTAINERS.py script happy about
our MAINTAINERS file. I hope that it knows best how things ought to
be and so am committing this as obvious.
ChangeLog:
2025-01-13 Martin Jambor <mjambor@suse.cz>
* MAINTAINERS: Fix the name order of the Write After Approval section.
Javier Miranda [Sat, 11 Jan 2025 17:30:42 +0000 (17:30 +0000)]
ada: Cleanup preanalysis of static expressions (part 5)
Partially revert the fix for sem_ch13.adb as it does not comply
with RM 13.14(7.2/5).
gcc/ada/ChangeLog:
* sem_ch13.adb (Check_Aspect_At_End_Of_Declarations): Restore calls
to Preanalyze_Spec_Expression that were replaced by calls to
Preanalyze_And_Resolve. Add documentation.
(Check_Aspect_At_Freeze_Point): Ditto.
Pascal Obry [Fri, 10 Jan 2025 17:56:55 +0000 (18:56 +0100)]
ada: Fix relocatable DLL creation with gnatdll
gcc/ada/ChangeLog:
* mdll.adb: For the created DLL to be relocatable we do not want to use
the base file name when calling gnatdll.
* gnatdll.adb: Remove option -d, which no longer works; with a truly
relocatable DLL the base address has no real meaning anyway. Also
reword the usage string for -d, as we do not want to specify
relocatable there: gnatdll can be used to create both relocatable
and non-relocatable DLLs.
GNAT already emits a style warning when redundant parentheses appear inside
logical and short-circuit operators. A similar warning will be soon emitted for
unary operators as well. This patch removes the redundant parentheses to avoid
build errors.
gcc/ada/ChangeLog:
* libgnat/a-strunb.ads: Remove redundant parentheses inside NOT
operators.
Javier Miranda [Fri, 10 Jan 2025 19:08:39 +0000 (19:08 +0000)]
ada: Cleanup preanalysis of static expressions (part 4)
Fix regression in the SPARK 2014 testsuite.
gcc/ada/ChangeLog:
* sem_util.adb (Build_Actual_Subtype_Of_Component): No action
under preanalysis.
* sem_ch5.adb (Set_Assignment_Type): If the right-hand side contains
target names, expansion has been disabled to prevent expansion that
might move target names out of the context of the assignment statement.
Temporarily restore the current compilation mode so that the actual
subtype can be built.
Piotr Trojanek [Wed, 8 Jan 2025 13:00:50 +0000 (14:00 +0100)]
ada: Warn about redundant parentheses inside unary operators
GNAT already emits a style warning when redundant parentheses appear inside
logical and short-circuit operators. A similar warning is now emitted for
unary operators as well.
gcc/ada/ChangeLog:
* par-ch4.adb (P_Factor): Warn when the operand of a unary operator
doesn't require parentheses.
Piotr Trojanek [Thu, 9 Jan 2025 23:31:11 +0000 (00:31 +0100)]
ada: Remove redundant parentheses inside unary operators in comments
GNAT already emits a style warning when redundant parentheses appear inside
logical and short-circuit operators. A similar warning will be soon emitted for
unary operators as well. This patch removes the redundant parentheses to avoid
future build errors.
gcc/ada/ChangeLog:
* libgnat/s-genbig.adb: Remove redundant parentheses in comments.
Piotr Trojanek [Tue, 7 Jan 2025 09:42:35 +0000 (10:42 +0100)]
ada: Fix spurious warning about redundant parentheses in range bound
Use the same logic for warning about redundant parentheses in lower and upper
bounds of a discrete range. This fixes a spurious warning that, if followed,
would render the code illegal.
gcc/ada/ChangeLog:
* par-ch3.adb (P_Discrete_Range): Detect redundant parentheses in the
lower bound like in the upper bound.
Gary Dismukes [Wed, 8 Jan 2025 22:51:41 +0000 (22:51 +0000)]
ada: Unbounded recursion on character aggregates with predicated component subtype
The compiler was recursing endlessly when analyzing an aggregate of
an array type whose component subtype has a static predicate and the
component expressions are static, repeatedly transforming the aggregate
first into a string literal and then back into an aggregate. This is fixed
by suppressing the transformation to a string literal in the case where
the component subtype has predicates.
gcc/ada/ChangeLog:
* sem_aggr.adb (Resolve_Aggregate): Add another condition to prevent rewriting
an aggregate whose type is an array of characters, testing for the presence of
predicates on the component type.
Piotr Trojanek [Mon, 6 Jan 2025 11:06:59 +0000 (12:06 +0100)]
ada: Simplify expansion of negative membership operator
Code cleanup; semantics is unaffected.
gcc/ada/ChangeLog:
* exp_ch4.adb (Expand_N_Not_In): Preserve Alternatives in the expanded
membership operator, just as Right_Opnd is preserved (though only
one of these fields is present at a time).
* par-ch4.adb (P_Membership_Test): Remove redundant setting of fields
to their default values.
Piotr Trojanek [Fri, 3 Jan 2025 15:02:01 +0000 (16:02 +0100)]
ada: Warn about redundant parentheses in upper range bounds
Fix a glitch in a condition that effectively caused detection of redundant
parentheses in upper range bounds to be dead code.
gcc/ada/ChangeLog:
* par-ch3.adb (P_Discrete_Range): Replace N_Subexpr, which was catching
all subexpressions, with kinds that catch nodes that require
parentheses to become "simple expressions".
Piotr Trojanek [Thu, 2 Jan 2025 16:36:54 +0000 (17:36 +0100)]
ada: Fix parsing of raise expressions with no parens
According to Ada grammar, raise expression is an expression, but requires
parens to be a simple_expression. We wrongly classified raise expressions
as expressions, because we mishandled a global state variable in the parser.
This patch causes some illegal code to be rejected.
gcc/ada/ChangeLog:
* par-ch4.adb (P_Relation): Prevent Expr_Form from being overwritten when
parsing the raise expression itself.
(P_Simple_Expression): Fix manipulation of Expr_Form.
The declaration created by gfc_get_extern_function_decl used input_location
as DECL_SOURCE_LOCATION, which gave rather odd results with 'declared here'
diagnostics. It is much more useful to use the gfc_symbol's declared_at,
which this commit now does.
..., we're no longer using the 'dg-bogus' location information, as pointed out
for one class of additional notes of
'gfortran.dg/goacc/routine-external-level-of-parallelism-2.f', once added in
commit 03eb779141a29f96600cd46904b88a33c4b49a66 "Add 'dg-note', 'dg-lto-note'".
Therefore, un-XFAILed 'dg-note's rather than XFAILed 'dg-bogus'es.
Michal Jires [Mon, 13 Jan 2025 00:58:41 +0000 (01:58 +0100)]
lto: Fix empty fcntl.h build error with MinGW.
MSYS2+MinGW provides headers that do not define the expected contents.
This fix checks that the fcntl function is actually defined.
Bootstrapped/regtested on x86_64-linux. Committed as obvious.
gcc/ChangeLog:
* lockfile.cc (LOCKFILE_USE_FCNTL): New.
(lockfile::lock_write): Use LOCKFILE_USE_FCNTL.
(lockfile::try_lock_write): Use LOCKFILE_USE_FCNTL.
(lockfile::lock_read): Use LOCKFILE_USE_FCNTL.
(lockfile::unlock): Use LOCKFILE_USE_FCNTL.
(lockfile::lockfile_supported): Use LOCKFILE_USE_FCNTL.
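A sketch of the kind of guard described, under stated assumptions: it uses a preprocessor-level check for F_SETLKW and F_WRLCK, whereas the actual patch may well rely on a configure-time test instead, and lock_write here is a simplified, hypothetical stand-in for lockfile::lock_write:

```c
#include <stdbool.h>
#include <stdio.h>   /* SEEK_SET */

/* Include <fcntl.h> only if the header exists at all.  */
#if defined (__has_include)
# if __has_include (<fcntl.h>)
#  include <fcntl.h>
# endif
#endif

/* A header that exists but is empty (as reported for MSYS2+MinGW)
   will not define these, so fcntl locking is switched off.  */
#if defined (F_SETLKW) && defined (F_WRLCK)
# define LOCKFILE_USE_FCNTL 1
#else
# define LOCKFILE_USE_FCNTL 0
#endif

static bool
lockfile_supported (void)
{
  return LOCKFILE_USE_FCNTL;
}

/* Take a blocking write lock on the whole file open on FD.
   Returns 0 on success, -1 if locking is unavailable or failed.  */
static int
lock_write (int fd)
{
#if LOCKFILE_USE_FCNTL
  struct flock lock = { 0 };
  lock.l_type = F_WRLCK;
  lock.l_whence = SEEK_SET;
  lock.l_start = 0;
  lock.l_len = 0;  /* 0 means "up to the end of the file" */
  return fcntl (fd, F_SETLKW, &lock) == -1 ? -1 : 0;
#else
  (void) fd;
  return -1;
#endif
}
```

Keying every call site on one macro, as the ChangeLog entries above do with LOCKFILE_USE_FCNTL, keeps the fallback path in a single place.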
liuhongt [Thu, 9 Jan 2025 07:11:17 +0000 (23:11 -0800)]
Refactor ix86_expand_vecop_qihi2.
Since using vpermq caused a regression and it is already manually disabled
by !TARGET_AVX512BW, remove the code related to vpermq and make
ix86_expand_vecop_qihi2 handle only the vpmovbw + op + vpmovwb case.
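A scalar C model of the vpmovbw + op + vpmovwb strategy, using multiplication as the example operation since x86 has no byte-element vector multiply. The function name is hypothetical and each loop iteration stands for one vector lane; it is not the expander's code:

```c
#include <stddef.h>
#include <stdint.h>

/* Each byte lane is widened to 16 bits (e.g. vpmovzxbw), the operation
   is done on words (e.g. vpmullw), and the result is truncated back to
   bytes (vpmovwb).  */
static void
mul_bytes_via_words (uint8_t *dst, const uint8_t *a, const uint8_t *b,
                     size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      uint16_t wa = a[i];                     /* widen: QI -> HI */
      uint16_t wb = b[i];
      uint16_t prod = (uint16_t) (wa * wb);   /* operate in HImode */
      dst[i] = (uint8_t) prod;                /* narrow: keep the low byte */
    }
}
```

Because only the low 8 bits of each product are kept, the result equals a byte-wise multiply modulo 256, whether the widening is signed or unsigned.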
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
Refactor to avoid redundant TARGET_AVX512BW in many places.
Jakub Jelinek [Mon, 13 Jan 2025 00:24:53 +0000 (17:24 -0700)]
[PATCH] crc: Fix up some crc related wrong code issues [PR117997, PR118415]
Hi!
As mentioned in the second PR, using table names like
crc_table_for_crc_8_polynomial_0x12
in the user namespace is wrong, user could have defined such variables
in their code and as can be seen on the last testcase, then it just
misbehaves.
At minimum such names should start with 2 underscores, moving it into
implementation namespace, and if possible have some dot or dollar in the
name if target supports it.
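For reference, a table of the kind discussed maps every possible byte to its CRC contribution, so the expanded code performs one lookup per input byte instead of eight shift/XOR steps. A minimal C sketch for a non-reflected 8-bit CRC with polynomial 0x07, chosen only for illustration (all names are hypothetical). The table here has internal linkage, so it cannot clash with symbols in other translation units; a compiler-generated table additionally needs a name outside the user namespace, which user code like this cannot use:

```c
#include <stddef.h>
#include <stdint.h>

static uint8_t crc8_table[256];

/* Fill the table: entry I is the CRC register after shifting the
   byte I through eight steps of polynomial division.  */
static void
crc8_init_table (uint8_t poly)
{
  for (unsigned i = 0; i < 256; i++)
    {
      uint8_t crc = (uint8_t) i;
      for (int bit = 0; bit < 8; bit++)
        crc = (crc & 0x80) ? (uint8_t) ((crc << 1) ^ poly) : (uint8_t) (crc << 1);
      crc8_table[i] = crc;
    }
}

/* Bit-at-a-time reference implementation.  */
static uint8_t
crc8_bitwise (const uint8_t *data, size_t len, uint8_t poly)
{
  uint8_t crc = 0;
  for (size_t i = 0; i < len; i++)
    {
      crc ^= data[i];
      for (int bit = 0; bit < 8; bit++)
        crc = (crc & 0x80) ? (uint8_t) ((crc << 1) ^ poly) : (uint8_t) (crc << 1);
    }
  return crc;
}

/* Table-driven version: one lookup per input byte.  */
static uint8_t
crc8_table_driven (const uint8_t *data, size_t len)
{
  uint8_t crc = 0;
  for (size_t i = 0; i < len; i++)
    crc = crc8_table[crc ^ data[i]];
  return crc;
}
```

With polynomial 0x07 and a zero initial value this is the common CRC-8/SMBUS variant, whose check value over "123456789" is 0xF4.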
I think assemble_crc_table right now always emits tables as local variables;
I really don't see what would be setting the TREE_PUBLIC flag on
IDENTIFIER_NODEs.
It might be nice to share the tables between TUs in the same binary or
shared library, but in that case they should have hidden visibility if
possible, so that they aren't exported from the libraries or binaries; we
don't want the optimization to affect the set of exported symbols from
libraries.
And, as can be seen in the first PR, building gen_rtx_SYMBOL_REF by hand
is certainly unexpected on some targets, e.g. those which use
-fsection-anchors, so we should instead use DECL_RTL of the VAR_DECL.
For that we'd need to look it up if we haven't emitted it already; while
IDENTIFIER_NODEs can be looked up easily, I guess for the VAR_DECLs we'd
need a custom hash table.
Now, all of the above (except sharing between multiple TUs) is already
implemented in output_constant_def, so I think it is much better to just
use that function.
And, if we want to share it between multiple TUs, we could extend the
SHF_MERGE usage in gcc, currently we only use it for constant pool
entries with same size as alignment, from 1 to 32 bytes, using .rodata.cstN
sections. We could just use say .rodata.cstM.N sections where M would be
alignment and N would be the entity size. We could use that for all
constant pool entries say up to 2048 bytes.
Though, as the current code doesn't share between multiple TUs, I think it
can be done incrementally (either still for GCC 15, or GCC 16+).
Bootstrapped/regtested on {x86_64,i686,aarch64,powerpc64le,s390x}-linux, on
aarch64 it also fixes
-FAIL: crypto/rsa
-FAIL: hash
ok for trunk?
gcc/
PR tree-optimization/117997
PR middle-end/118415
* expr.cc (assemble_crc_table): Make static, remove id argument,
use output_constant_def. Emit note if -fdump-rtl-expand-details
about which table has been emitted.
(generate_crc_table): Make static, adjust assemble_crc_table
caller, call it always.
(calculate_table_based_CRC): Make static.
* internal-fn.cc (expand_crc_optab_fn): Emit note if
-fdump-rtl-expand-details about using optab for crc. Formatting fix.
gcc/testsuite/
* gcc.dg/crc-builtin-target32.c: Add -fdump-rtl-expand-details
as dg-additional-options. Scan expand dump rather than assembly,
adjust the regexps.
* gcc.dg/crc-builtin-target64.c: Likewise.
* gcc.dg/crc-builtin-rev-target32.c: Likewise.
* gcc.dg/crc-builtin-rev-target64.c: Likewise.
* gcc.dg/pr117997.c: New test.
* gcc.dg/pr118415.c: New test.
- Import latest fixes from dmd v2.110.0-beta.1.
- The `align' attribute now allows `default' to be specified
explicitly.
- Add primary expression of the form `__rvalue(expression)'
which causes `expression' to be treated as an rvalue, even if
it is an lvalue.
- Shortened method syntax can now be used in constructors.
D runtime changes:
- Import latest fixes from druntime v2.110.0-beta.1.
Phobos changes:
- Import latest fixes from phobos v2.110.0-beta.1.
gcc/d/ChangeLog:
* dmd/MERGE: Merge upstream dmd c57da0cf59.
* d-codegen.cc (can_elide_copy_p): New.
(d_build_call): Use it.
* d-lang.cc (d_post_options): Update for new front-end interface.
David Malcolm [Sun, 12 Jan 2025 18:46:31 +0000 (13:46 -0500)]
c: UX improvements to 'too {few,many} arguments' errors (v5) [PR118112]
Consider this case of a bad call to a callback function (perhaps
due to C23 changing the meaning of () in function decls):
struct p {
  int (*bar)();
};

void baz() {
  struct p q;
  q.bar(1);
}
Before this patch the C frontend emits:
t.c: In function 'baz':
t.c:7:5: error: too many arguments to function 'q.bar'
7 | q.bar(1);
| ^
which doesn't give the user much help in terms of knowing what
was expected, and where the relevant declaration is.
With this patch the C frontend emits:
t.c: In function 'baz':
t.c:7:5: error: too many arguments to function 'q.bar'; expected 0, have 1
7 | q.bar(1);
| ^ ~
t.c:2:15: note: declared here
2 | int (*bar)();
| ^~~
(showing the expected vs actual counts, the pertinent field decl, and
underlining the first extraneous argument at the callsite)
Similarly, the patch also updates the "too few arguments" case to also
show expected vs actual counts. Doing so requires a tweak to the
wording to say "at least" for the case of variadic fns where
previously the C FE emitted e.g.:
s.c: In function 'test':
s.c:5:3: error: too few arguments to function 'callee'
5 | callee ();
| ^~~~~~
s.c:1:6: note: declared here
1 | void callee (const char *, ...);
| ^~~~~~
with this patch it emits:
s.c: In function 'test':
s.c:5:3: error: too few arguments to function 'callee'; expected at least 1, have 0
5 | callee ();
| ^~~~~~
s.c:1:6: note: declared here
1 | void callee (const char *, ...);
| ^~~~~~
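The wording rules above can be summarised in a small C sketch. This is a hypothetical helper that only reproduces the message text shown, not the code in c-typeck.cc: "at least" is used only when the callee is variadic, and a variadic callee can never get "too many":

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* NPARAMS is the number of named parameters, VARIADIC is true if the
   prototype ends in "...", NARGS is the number of arguments at the
   call site.  Writes an empty string when the call is acceptable.
   SIZE must be nonzero.  */
static void
format_arg_count_error (char *buf, size_t size, const char *fname,
                        int nparams, bool variadic, int nargs)
{
  if (nargs < nparams)
    /* Only a lower bound can be stated for variadic functions.  */
    snprintf (buf, size,
              "too few arguments to function '%s'; expected %s%d, have %d",
              fname, variadic ? "at least " : "", nparams, nargs);
  else if (nargs > nparams && !variadic)
    snprintf (buf, size,
              "too many arguments to function '%s'; expected %d, have %d",
              fname, nparams, nargs);
  else
    buf[0] = '\0';
}
```

For the q.bar example (no named parameters, not variadic, one argument) this yields "expected 0, have 1"; for callee (one named parameter, variadic, no arguments) it yields "expected at least 1, have 0".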
gcc/c/ChangeLog:
PR c/118112
* c-typeck.cc (inform_declaration): Add "function_expr" param and
use it for cases where we couldn't show the function decl to show
field decls for callbacks.
(build_function_call_vec): Add missing auto_diagnostic_group.
Update for new param of inform_declaration.
(convert_arguments): Likewise. For the "too many arguments" case
add the expected vs actual counts to the message, and if we have
it, add the location_t of the first surplus param as a secondary
location within the diagnostic. For the "too few arguments" case,
determine the minimum number of arguments required and add the
expected vs actual counts to the message, tweaking it to "at least"
for variadic functions.
gcc/testsuite/ChangeLog:
PR c/118112
* gcc.dg/too-few-arguments.c: New test.
* gcc.dg/too-many-arguments.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Implementation of the Fortran 2018 standard intrinsic OUT_OF_RANGE, with
the GNU Fortran extension to unsigned integers.
Runtime code is fully inline expanded.
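The semantics for a real X and an integer MOLD can be sketched in C, following the Fortran 2018 description: truncation toward zero when ROUND is absent or false, rounding to the nearest integer (ties to the one of greater magnitude) when ROUND is true, and IEEE infinities and NaN always out of range. This is not gfortran's inline expansion; the function name is hypothetical, the range is passed as [lo, hi] in double (exact for kinds up to 32 bits), and treating an UNSIGNED mold as the range [0, 2^n - 1] is an assumption about the extension:

```c
#include <math.h>
#include <stdbool.h>

/* OUT_OF_RANGE (X, MOLD [, ROUND]) for a real X and an integer MOLD
   whose values are the integers in [LO, HI].  */
static bool
out_of_range_real_to_int (double x, double lo, double hi, bool use_round)
{
  /* IEEE infinities and NaN are never representable.  */
  if (isnan (x) || isinf (x))
    return true;
  /* Truncation toward zero lands in [LO, HI] exactly when
     LO - 1 < X < HI + 1; rounding with ties away from zero does so
     exactly when LO - 0.5 < X < HI + 0.5.  */
  double margin = use_round ? 0.5 : 1.0;
  return !(x > lo - margin && x < hi + margin);
}
```

For an INTEGER(1) mold (range -128 to 127), 127.9 is in range when truncated but 127.5 is out of range when rounded, since it rounds to 128.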
PR fortran/115788
gcc/fortran/ChangeLog:
* check.cc (gfc_check_out_of_range): Check arguments to intrinsic.
* expr.cc (free_expr0): Fix a memleak with unsigned literals.
* gfortran.h (enum gfc_isym_id): Define GFC_ISYM_OUT_OF_RANGE.
* gfortran.texi: Add OUT_OF_RANGE to list of intrinsics supporting
UNSIGNED.
* intrinsic.cc (add_functions): Add Fortran prototype. Break some
nearby lines with excessive length.
* intrinsic.h (gfc_check_out_of_range): Add prototypes.
* intrinsic.texi: Fortran documentation of OUT_OF_RANGE.
* simplify.cc (gfc_simplify_out_of_range): Compile-time simplification
of OUT_OF_RANGE.
* trans-intrinsic.cc (gfc_conv_intrinsic_out_of_range): Generate
inline expansion of runtime code for OUT_OF_RANGE.
(gfc_conv_intrinsic_function): Use it.
gcc/testsuite/ChangeLog:
* gfortran.dg/ieee/out_of_range.f90: New test.
* gfortran.dg/out_of_range_1.f90: New test.
* gfortran.dg/out_of_range_2.f90: New test.
* gfortran.dg/out_of_range_3.f90: New test.
Alpha: Fix a block move pessimisation with zero-extension after LDWU
For the BWX case we have a pessimisation in `alpha_expand_block_move'
for HImode loads where we place the data loaded into a HImode register
as well, therefore losing information that indeed the data loaded has
already been zero-extended to the full DImode width of the register.
Later on when we store this data in QImode quantities into an unaligned
destination, we zero-extend it again for the purpose of right-shifting,
such as with the test case included producing code at `-O2' as follows:
The non-BWX case is unaffected, because there we use byte insertion, so
we don't care that data is held in a HImode register.
Address this by making the holding RTX a HImode subreg of the original
DImode register, which the RTL passes can then see through and eliminate
the zero-extension where otherwise required, resulting in this shortened
code:
While at it reformat the enclosing do-while statement according to the
GNU Coding Standards, observing that in this case it does not obfuscate
the change owing to the odd original indentation.
gcc/
* config/alpha/alpha.cc (alpha_expand_block_move): Use a HImode
subreg of a DImode register to hold data from an aligned HImode
load.
Alpha: Optimize block moves coming from longword-aligned source
Now that we have proper alignment determination for block moves in place
the case of copying a block of longword-aligned data has become real, so
implement the merging of loaded data from pairs of SImode registers into
single DImode registers for the purpose of using with unaligned stores
efficiently, as suggested by a comment in `alpha_expand_block_move' and
discard the comment. Provide test cases accordingly.
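A scalar C model of the merging idea, for illustration only (all names are hypothetical and the byte loop merely stands in for an unaligned quadword store sequence). The aligned source is read as 32-bit longwords, and each pair is merged into one 64-bit value before being written, halving the number of unaligned stores. Alpha is little-endian, so the longword at the lower address supplies the low 32 bits:

```c
#include <stddef.h>
#include <stdint.h>

/* Aligned longword load, decoded as little-endian.  */
static uint32_t
load_le32 (const uint8_t *p)
{
  return (uint32_t) p[0] | (uint32_t) p[1] << 8
         | (uint32_t) p[2] << 16 | (uint32_t) p[3] << 24;
}

/* Stand-in for an unaligned quadword store, little-endian.  */
static void
store_le64 (uint8_t *p, uint64_t v)
{
  for (int i = 0; i < 8; i++)
    p[i] = (uint8_t) (v >> (8 * i));
}

/* Copy N bytes (N a multiple of 8) from a longword-aligned SRC to a
   possibly unaligned DST, merging pairs of longwords.  */
static void
copy_si_pairs (uint8_t *dst, const uint8_t *src, size_t n)
{
  for (size_t i = 0; i < n; i += 8)
    {
      uint64_t lo = load_le32 (src + i);
      uint64_t hi = load_le32 (src + i + 4);
      store_le64 (dst + i, lo | hi << 32);  /* one DImode value */
    }
}
```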
gcc/
* config/alpha/alpha.cc (alpha_expand_block_move): Merge loaded
data from pairs of SImode registers into single DImode registers
if to be used with unaligned stores.
gcc/testsuite/
* gcc.target/alpha/memcpy-si-aligned.c: New file.
* gcc.target/alpha/memcpy-si-unaligned.c: New file.
* gcc.target/alpha/memcpy-si-unaligned-dst.c: New file.
* gcc.target/alpha/memcpy-si-unaligned-src.c: New file.
* gcc.target/alpha/memcpy-si-unaligned-src-bwx.c: New file.
Alpha: Always respect -mbwx, -mcix, -mfix, -mmax, and their inverse
Contrary to user documentation the `-mbwx', `-mcix', `-mfix', `-mmax'
feature options and their inverse forms are ignored whenever `-mcpu='
option is in effect, either by having been given explicitly or where
configured as the default such as with the `alphaev56-linux-gnu' target.
In the latter case there is no way to change the settings these options
are supposed to tweak other than with `-mcpu=' and the settings cannot
be individually controlled, making all the feature options permanently
inactive.
It seems to be a regression from commit 7816bea0e23b ("config.gcc: Reorganize
--with-cpu logic.") back in 2003, which replaced the setting of the
default feature mask with the setting of the default CPU across a few
targets, and the complementing logic in the Alpha backend wasn't updated
accordingly.
Fix this by making the individual feature options take precedence over
`-mcpu='. Add test cases to verify this is the case, and to cover the
defaults as well for the boundary cases.
This has a drawback where the order of the options is ignored between
`-mcpu=' and these individual options, so e.g. `-mno-bwx -mcpu=ev6' will
keep the BWX feature disabled even though `-mcpu=ev6' comes later in the
command line. This may affect some scenarios involving user overrides
such as with CFLAGS passed to `configure' and `make' invocations. I do
believe it has been our practice anyway for more fine-grained options to
override group options regardless of their relative order on the command
line and in any case using `-mcpu=ev6 -mbwx' as the override will do the
right thing if required, canceling any previous `-mno-bwx'.
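The precedence rule can be sketched as a mask computation in C. The feature bits and the ev6-like CPU set below are hypothetical stand-ins, not the Alpha backend's actual masks or per-CPU tables. Features named by any individual option keep the user's setting regardless of option order, and only the remaining features come from -mcpu=:

```c
/* Hypothetical feature bits standing in for the Alpha target masks.  */
enum
{
  FEAT_BWX = 1 << 0,
  FEAT_CIX = 1 << 1,
  FEAT_FIX = 1 << 2,
  FEAT_MAX = 1 << 3
};

/* FLAGS: values set by individual options (-mbwx sets, -mno-bwx clears).
   EXPLICIT_MASK: features named by any individual option, positive or
   negative.  CPU_FLAGS: features implied by -mcpu=.  */
static unsigned
resolve_features (unsigned flags, unsigned explicit_mask, unsigned cpu_flags)
{
  return (flags & explicit_mask) | (cpu_flags & ~explicit_mask);
}
```

With an ev6-like set of BWX, FIX and MAX, `-mno-bwx -mcpu=ev6' resolves to FIX and MAX only, while `-mcpu=ev6 -mbwx' restores BWX, matching the behaviour described above.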
This has been spotted with `alphaev56-linux-gnu' target verification and
a recently added test case:
(and similarly for the remaining optimization levels covered) which this
fix has addressed.
gcc/
* config/alpha/alpha.cc (alpha_option_override): Ignore CPU
flags corresponding to features the enabling or disabling of
which has been requested with an individual feature option.
gcc/testsuite/
* gcc.target/alpha/target-bwx-1.c: New file.
* gcc.target/alpha/target-bwx-2.c: New file.
* gcc.target/alpha/target-bwx-3.c: New file.
* gcc.target/alpha/target-bwx-4.c: New file.
* gcc.target/alpha/target-cix-1.c: New file.
* gcc.target/alpha/target-cix-2.c: New file.
* gcc.target/alpha/target-cix-3.c: New file.
* gcc.target/alpha/target-cix-4.c: New file.
* gcc.target/alpha/target-fix-1.c: New file.
* gcc.target/alpha/target-fix-2.c: New file.
* gcc.target/alpha/target-fix-3.c: New file.
* gcc.target/alpha/target-fix-4.c: New file.
* gcc.target/alpha/target-max-1.c: New file.
* gcc.target/alpha/target-max-2.c: New file.
* gcc.target/alpha/target-max-3.c: New file.
* gcc.target/alpha/target-max-4.c: New file.
Alpha: Restore frame pointer last in `builtin_longjmp' [PR64242]
Add similar arrangements to `builtin_longjmp' for Alpha as with commit 71b144289c1c ("re PR middle-end/64242 (Longjmp expansion incorrect)")
and commit 511ed59d0b04 ("Fix PR64242 - Longjmp expansion incorrect"),
so as to restore the frame pointer last, so that accesses to a local
buffer supplied can still be fulfilled with memory accesses via the
original frame pointer, fixing:
FAIL: gcc.c-torture/execute/pr64242.c -O0 execution test
FAIL: gcc.c-torture/execute/pr64242.c -O1 execution test
FAIL: gcc.c-torture/execute/pr64242.c -O2 execution test
FAIL: gcc.c-torture/execute/pr64242.c -O3 -g execution test
FAIL: gcc.c-torture/execute/pr64242.c -Os execution test
FAIL: gcc.c-torture/execute/pr64242.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/pr64242.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
and adding no regressions in `alpha-linux-gnu' testing.
gcc/
PR middle-end/64242
* config/alpha/alpha.md (`builtin_longjmp'): Restore frame
pointer last. Add frame clobber and schedule blockage.
Alpha: Add memory clobbers to `builtin_longjmp' expansion
Add the same memory clobbers to `builtin_longjmp' for Alpha as with
commit 41439bf6a647 ("builtins.c (expand_builtin_longjmp): Added two
memory clobbers."), to prevent instructions that access memory via the
frame or stack pointer from being moved across the write to the frame
pointer.
testsuite: The expect framework might introduce CR in output
When running tests using the "sim" config, the command is launched in
non-readonly mode, and in the text retrieved by the expect command every
LF is then replaced with CRLF. (The problem can be found in sim_load,
where it calls remote_spawn without an input file.)
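One way to tolerate both line endings is to compare output against the expected text while treating a CRLF pair in the output as a plain LF. A hypothetical C helper showing that idea; the actual libstdc++ test changes may be done differently:

```c
#include <stdbool.h>

/* Return true if OUT matches EXPECTED, where each "\r\n" in OUT may
   stand for a "\n" in EXPECTED.  A lone "\r" is still a mismatch.  */
static bool
equal_ignoring_cr (const char *out, const char *expected)
{
  while (*expected != '\0')
    {
      if (out[0] == '\r' && out[1] == '\n' && *expected == '\n')
        out++;  /* skip the CR of a CRLF pair */
      if (*out != *expected)
        return false;
      out++;
      expected++;
    }
  return *out == '\0';
}
```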
libstdc++-v3/ChangeLog:
* testsuite/27_io/print/1.cc: Allow both LF and CRLF in test.
* testsuite/27_io/print/3.cc: Likewise.
c-pretty-print.cc (pp_c_tree_decl_identifier): Strip private name encoding, PR118303
This is a part of PR118303. It fixes
FAIL: gcc.dg/analyzer/CVE-2005-1689-minimal.c (test for excess errors)
FAIL: gcc.dg/analyzer/CVE-2005-1689-minimal.c inbuf.data (test for warnings, line 62)
for targets where the parameter on that line is subject to
TARGET_CALLEE_COPIES being true.
c-family:
PR middle-end/118303
* c-pretty-print.cc (c_pretty_printer::primary_expression) <SSA_NAME>:
Call primary_expression for all SSA_NAME_VAR nodes and instead move the
DECL_ARTIFICIAL private name stripping to...
(pp_c_tree_decl_identifier): ...here.
Andrew Pinski [Sat, 11 Jan 2025 04:04:09 +0000 (20:04 -0800)]
final: Fix get_attr_length for asm goto [PR118411]
The problem is that for inline-asm goto, the outer RTL insn is a
JUMP_INSN, and get_attr_length does not handle ASM specially in that case,
unlike when the outer insn is a plain INSN.
This fixes the issue by adding support for both CALL_INSN and JUMP_INSN
with asm.
OK? Bootstrapped and tested on x86_64-linux-gnu.
PR middle-end/118411
gcc/ChangeLog:
* final.cc (get_attr_length_1): Handle asm for CALL_INSN
and JUMP_INSNs.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
- Import latest fixes from dmd v2.110.0-beta.1.
- Added traits `getBitfieldOffset' and `getBitfieldWidth'.
- Added trait `isCOMClass' to detect if a type is a COM class.
- Added `-fpreview=safer' which enables safety checking on
unattributed functions.
D runtime changes:
- Import latest fixes from druntime v2.110.0-beta.1.
Phobos changes:
- Import latest fixes from phobos v2.110.0-beta.1.
- Added `fromHexString' and `fromHexStringAsRange' functions to
`std.digest'.
gcc/d/ChangeLog:
* dmd/MERGE: Merge upstream dmd 82a5d2a7c4.
* d-lang.cc (d_handle_option): Handle new option `-fpreview=safer'.
* expr.cc (ExprVisitor::NewExp): Remove gcc_unreachable for the
generation of `_d_newThrowable'.
* lang.opt: Add -fpreview=safer.
Nathaniel Shead [Thu, 9 Jan 2025 14:06:37 +0000 (01:06 +1100)]
c++/modules: Handle chaining already-imported local types [PR114630]
In the linked testcase, an ICE occurs because when reading the
(duplicate) function definition for _M_do_parse from module Y, the local
type definitions have already been streamed from module X and set up as
regular backreferences, rather than being found with find_duplicate,
causing issues with managing DECL_CHAIN.
It is tempting to just skip setting up the DECL_CHAIN for this case.
However, for the future it would be best to ensure that the block vars
for the duplicate definition are accurate, so that we could implement
ODR checking on function definitions at some point.
So to solve this, this patch creates a copy of the streamed-in local
type and chains that; it will be discarded along with the rest of the
duplicate function after we've finished processing.
A couple of suggested implementations from the discussion on the PR that
don't work:
- Replacing the `DECL_CHAIN` assertion with `(*chain && *chain != decl)`
doesn't handle the case where type definitions are followed by regular
local variables, since those won't have been imported as separate
backreferences and so the chains will diverge.
- Correcting the purviewness of GMF template instantiations to force Y
to emit copies of the local types rather than backreferences into X is
insufficient, as it's still possible that the local types got streamed
in a separate cluster to the function definition, and so will be again
referred to via regular backreferences when importing.
- Likewise, preventing the emission of function definitions where an
import has already provided that same definition also is insufficient,
for much the same reason.
PR c++/114630
gcc/cp/ChangeLog:
* module.cc (trees_in::core_vals) <BLOCK>: Chain a new node if
DECL_CHAIN already is set.
gcc/testsuite/ChangeLog:
* g++.dg/modules/pr114630.h: New test.
* g++.dg/modules/pr114630_a.C: New test.
* g++.dg/modules/pr114630_b.C: New test.
* g++.dg/modules/pr114630_c.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Tobias Burnus [Sat, 11 Jan 2025 11:54:56 +0000 (12:54 +0100)]
Fortran: Fix location_t in gfc_get_extern_function_decl; support 'omp dispatch interop'
The declaration created by gfc_get_extern_function_decl used input_location
as DECL_SOURCE_LOCATION, which gave rather odd results with 'declared here'
diagnostics. It is much more useful to use the gfc_symbol's declared_at,
which this commit now does.
Additionally, it adds support for the 'interop' clause of OpenMP's
'dispatch' directive. As the argument order matters,
gfc_match_omp_variable_list gained a 'reverse_order' flag to use the
same order as the C/C++ parser.
gcc/fortran/ChangeLog:
* gfortran.h: Add OMP_LIST_INTEROP to the unnamed OMP_LIST_ enum.
* openmp.cc (gfc_match_omp_variable_list): Add reverse_order
boolean argument, defaulting to false.
(enum omp_mask2, OMP_DISPATCH_CLAUSES): Add OMP_CLAUSE_INTEROP.
(gfc_match_omp_clauses, resolve_omp_clauses): Handle dispatch's
'interop' clause.
* trans-decl.cc (gfc_get_extern_function_decl): Use sym->declared_at
instead of input_location as DECL_SOURCE_LOCATION.
* trans-openmp.cc (gfc_trans_omp_clauses): Handle OMP_LIST_INTEROP.
gcc/testsuite/ChangeLog:
* gfortran.dg/goacc/routine-external-level-of-parallelism-2.f: Update
xfail'ed 'dg-bogus' for the better 'declared here' location.
* gfortran.dg/gomp/dispatch-11.f90: New test.
* gfortran.dg/gomp/dispatch-12.f90: New test.
Paul Thomas [Sat, 11 Jan 2025 08:23:48 +0000 (08:23 +0000)]
Fortran: Fix error recovery for bad component arrayspecs [PR108434]
2025-01-11 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran/
PR fortran/108434
* class.cc (generate_finalization_wrapper): To avoid memory
leaks from callocs, return immediately if the derived type
error flag is set.
* decl.cc (build_struct): If the declaration of a derived type
or class component does not have a deferred arrayspec, correct,
set the error flag of the derived type and emit an immediate
error.
Jason Merrill [Fri, 10 Jan 2025 23:00:20 +0000 (18:00 -0500)]
c++: modules and function attributes
30_threads/stop_token/stop_source/109339.cc was failing because we weren't
representing attribute access on the METHOD_TYPE for _Stop_state_ref.
The modules code expected attributes to appear on tt_variant_type and not
on tt_derived_type, but that's backwards since build_type_attribute_variant
gives a type with attributes its own TYPE_MAIN_VARIANT.
gcc/cp/ChangeLog:
* module.cc (trees_out::type_node): Write attributes for
tt_derived_type, not tt_variant_type.
(trees_in::tree_node): Likewise for reading.
gcc/testsuite/ChangeLog:
* g++.dg/modules/attrib-2_a.C: New test.
* g++.dg/modules/attrib-2_b.C: New test.
Jason Merrill [Sat, 23 Nov 2024 09:00:18 +0000 (10:00 +0100)]
c++: modules and class attributes
std/time/traits/is_clock.cc was getting a warning about applying the
deprecated attribute to a variant of auto_ptr, which was wrong because it's
on the primary type. This turned out to be because we were ignoring the
attributes on the definition of auto_ptr because the forward declaration in
unique_ptr.h has no attributes. We need to merge attributes as usual in a
redeclaration.
mengqinggang [Fri, 10 Jan 2025 02:27:09 +0000 (10:27 +0800)]
LoongArch: Generate the final immediate for lu12i.w, lu32i.d and lu52i.d
Generate 0x1010 instead of 0x1010000>>12 for lu12i.w. lu32i.d and lu52i.d use
the same processing.
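As a rough illustration of what "the final immediate" means, the sketch below splits a 64-bit constant into the fields that lu12i.w (bits 31:12), ori (bits 11:0), lu32i.d (bits 51:32) and lu52i.d (bits 63:52) load, already shifted down. split_la_imm is a hypothetical helper, not loongarch_output_move, and it ignores the sign-extension rules that decide which instructions of the sequence can be omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Immediate fields of a lu12i.w / ori / lu32i.d / lu52i.d sequence,
   each already shifted down to the value printed in the instruction.  */
struct la_imm_fields {
  uint32_t lu12i_w;  /* bits [31:12] (si20) */
  uint32_t ori;      /* bits [11:0]  (ui12) */
  uint32_t lu32i_d;  /* bits [51:32] (si20) */
  uint32_t lu52i_d;  /* bits [63:52] (si12) */
};

/* Hypothetical helper: compute the final immediates directly, so that
   e.g. 0x1010000 yields 0x1010 for lu12i.w instead of "0x1010000>>12".  */
static struct la_imm_fields
split_la_imm (uint64_t value)
{
  struct la_imm_fields f;
  f.ori = (uint32_t) (value & 0xfff);
  f.lu12i_w = (uint32_t) ((value >> 12) & 0xfffff);
  f.lu32i_d = (uint32_t) ((value >> 32) & 0xfffff);
  f.lu52i_d = (uint32_t) ((value >> 52) & 0xfff);
  return f;
}
```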
gcc/ChangeLog:
* config/loongarch/lasx.md: Use new loongarch_output_move.
* config/loongarch/loongarch-protos.h (loongarch_output_move):
Change parameters from (rtx, rtx) to (rtx *).
* config/loongarch/loongarch.cc (loongarch_output_move):
Generate final immediate for lu12i.w and lu52i.d.
* config/loongarch/loongarch.md:
Generate final immediate for lu32i.d and lu52i.d.
* config/loongarch/lsx.md: Use new loongarch_output_move.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/imm-load.c: Check that ">>" is not generated.
Andrew MacLeod [Fri, 10 Jan 2025 18:33:01 +0000 (13:33 -0500)]
Use relations when simplifying MIN and MAX.
Query for known relations between the operands, and pass that to
fold_range to help simplify MIN and MAX relations.
Make it type agnostic as well.
Adapt testcases from DOM to EVRP (e suffix) and test floats (f suffix).
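A minimal sketch of the folding rule, assuming a simplified relation enum (rel, fold_min_max and the enum values are hypothetical, not GCC's relation API): once op0 <= op1 is known, MIN (op0, op1) is op0 and MAX (op0, op1) is op1, and symmetrically for >=. NaN handling for floats is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical relation between op0 and op1, i.e. "op0 REL op1".  */
enum rel { REL_UNKNOWN, REL_LT, REL_LE, REL_EQ, REL_GE, REL_GT };

enum min_max_result { KEEP_EXPR, FOLD_TO_OP0, FOLD_TO_OP1 };

/* Decide whether MIN/MAX (op0, op1) can be replaced by one operand.  */
static enum min_max_result
fold_min_max (bool is_max, enum rel r)
{
  switch (r)
    {
    case REL_LT:
    case REL_LE:
    case REL_EQ:   /* op0 <= op1: MIN is op0, MAX is op1.  */
      return is_max ? FOLD_TO_OP1 : FOLD_TO_OP0;
    case REL_GT:
    case REL_GE:   /* op0 >= op1: MIN is op1, MAX is op0.  */
      return is_max ? FOLD_TO_OP0 : FOLD_TO_OP1;
    default:       /* Nothing known: keep the MIN/MAX.  */
      return KEEP_EXPR;
    }
}
```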
PR tree-optimization/88575
gcc/
* vr-values.cc (simplify_using_ranges::fold_cond_with_ops): Query
relation between op0 and op1 and utilize it.
(simplify_using_ranges::simplify): Do not eliminate float checks.
- It's now deprecated to declare `auto ref' parameters without
putting those two keywords next to each other.
- An error is now given for case fallthrough for multivalued
cases.
- An error is now given for constructors with field destructors
with stricter attributes.
- An error is now issued for `in'/`out' contracts of `nothrow'
functions that may throw.
- `auto ref' can now be applied to local, static, extern, and
global variables.
D runtime changes:
- Import latest fixes from druntime v2.110.0-beta.1.
Phobos changes:
- Import latest fixes from phobos v2.110.0-beta.1.
Alex Coplan [Mon, 24 Jun 2024 12:54:48 +0000 (13:54 +0100)]
vect: Also cost gconds for scalar [PR118211]
Currently we only cost gconds for the vector loop while we omit costing
them when analyzing the scalar loop; this unfairly penalizes the vector
loop in the case of loops with early exits.
This (together with the previous patches) enables us to vectorize
std::find with 64-bit element sizes.
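For reference, the loop shape in question is roughly the C analogue below (find_i64 is a hypothetical name). Besides the trip-count exit, the data-dependent comparison is a second, early exit; both exit conditions are gconds, and previously only the vector loop's were costed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C analogue of std::find over 64-bit elements.  Returns the index of the
   first match, or N if there is none.  */
static size_t
find_i64 (const int64_t *p, size_t n, int64_t value)
{
  for (size_t i = 0; i < n; i++)   /* trip-count exit */
    if (p[i] == value)             /* early exit */
      return i;
  return n;
}
```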
Alex Coplan [Thu, 25 Jul 2024 16:34:05 +0000 (16:34 +0000)]
vect: Ensure we add vector skip guard even when versioning for aliasing [PR118211]
This fixes a latent wrong code issue whereby vect_do_peeling determined
the wrong condition for inserting the vector skip guard. Specifically
in the case where the loop niters are unknown at compile time we used to
check:
!LOOP_REQUIRES_VERSIONING (loop_vinfo)
but LOOP_REQUIRES_VERSIONING is true for loops which we have versioned
for aliasing, and that has nothing to do with prolog peeling. I think
this condition should instead be checking specifically if we aren't
versioning for alignment.
As it stands, when we version for alignment, we don't peel, so the
vector skip guard is indeed redundant in that case.
With the testcase added (reduced from the Fortran frontend) we would
version for aliasing, omit the vector skip guard, and then at runtime we
would peel sufficient iterations for alignment that there wasn't a full
vector iteration left when we entered the vector body, thus overflowing
the output buffer.
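The arithmetic behind the overflow can be sketched under simplified assumptions (byte addresses, power-of-two alignment and element size, hypothetical helper names): peeling p scalar iterations for alignment leaves niters - p iterations, and when that is smaller than the vectorization factor the skip guard has to bypass the vector loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Scalar iterations to peel so that ADDR + peel * ELEM_SIZE is a multiple
   of ALIGN.  Assumes ALIGN and ELEM_SIZE are powers of two with
   ELEM_SIZE <= ALIGN, and that ADDR is element-aligned.  */
static unsigned
prolog_peel_count (uintptr_t addr, unsigned align, unsigned elem_size)
{
  assert ((align & (align - 1)) == 0 && addr % elem_size == 0);
  unsigned misalign = (unsigned) (addr & (align - 1));
  return misalign ? (align - misalign) / elem_size : 0;
}

/* After peeling, is at least one full vector iteration of VF elements
   left?  If not, the vector loop must be skipped.  */
static bool
vector_loop_safe_to_enter (unsigned niters, unsigned peel, unsigned vf)
{
  return niters >= peel && niters - peel >= vf;
}
```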
gcc/ChangeLog:
PR tree-optimization/118211
PR tree-optimization/116126
* tree-vect-loop-manip.cc (vect_do_peeling): Adjust skip_vector
condition to only omit the edge if we're versioning for
alignment.
gcc/testsuite/ChangeLog:
PR tree-optimization/118211
PR tree-optimization/116126
* gcc.dg/vect/vect-early-break_130.c: New test.
Tamar Christina [Mon, 8 Jul 2024 11:16:11 +0000 (12:16 +0100)]
vect: Fix dominators when adding a guard to skip the vector loop [PR118211]
The alignment peeling changes exposed a latent missing dominator update
with early break vectorization, specifically when inserting the vector
skip edge, since the new edge bypasses the prolog skip block and thus
has the potential to subvert its dominance. This patch fixes that.
gcc/ChangeLog:
PR tree-optimization/118211
PR tree-optimization/116126
* tree-vect-loop-manip.cc (vect_do_peeling): Update immediate
dominators of nodes that were dominated by the prolog skip block
after inserting vector skip edge. Initialize prolog variable to
NULL to avoid bogus -Wmaybe-uninitialized during bootstrap.
gcc/testsuite/ChangeLog:
PR tree-optimization/118211
PR tree-optimization/116126
* g++.dg/vect/vect-early-break_6.cc: New test.
Alex Coplan [Mon, 11 Mar 2024 13:09:10 +0000 (13:09 +0000)]
vect: Force alignment peeling to vectorize more early break loops [PR118211]
This allows us to vectorize more loops with early exits by forcing
peeling for alignment to make sure that we're guaranteed to be able to
safely read an entire vector iteration without crossing a page boundary.
To make this work for VLA architectures we have to allow compile-time
non-constant target alignments. We also have to override the result of
the target's preferred_vector_alignment hook if it isn't a power-of-two
multiple of the TYPE_SIZE of the chosen vector type.
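Why alignment makes the speculative read safe can be checked with a small sketch, assuming power-of-two vector and page sizes with the vector no larger than a page (crosses_page is a hypothetical helper): an access of S bytes starting at an address aligned to S never straddles a page boundary, whereas a misaligned one can.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Does the byte range [ADDR, ADDR + SIZE) touch more than one page?  */
static bool
crosses_page (uintptr_t addr, size_t size, size_t page)
{
  assert (size > 0 && page > 0 && (page & (page - 1)) == 0);
  return addr / page != (addr + size - 1) / page;
}
```

With a 32-byte vector, every 32-byte-aligned start address stays within one page, while a 16-byte read starting at 4090 spills into the next 4096-byte page.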
gcc/ChangeLog:
PR tree-optimization/118211
PR tree-optimization/116126
* tree-vect-data-refs.cc (vect_analyze_early_break_dependences):
Set need_peeling_for_alignment flag on read DRs instead of
failing vectorization. Punt on gathers.
(dr_misalignment): Handle non-constant target alignments.
(vect_compute_data_ref_alignment): If need_peeling_for_alignment
flag is set on the DR, then override the target alignment chosen
by the preferred_vector_alignment hook to choose a safe
alignment.
(vect_supportable_dr_alignment): Override
support_vector_misalignment hook if need_peeling_for_alignment
is set on the DR: in this case we must return
dr_unaligned_unsupported in order to force peeling.
* tree-vect-loop-manip.cc (vect_do_peeling): Allow prolog
peeling by a compile-time non-constant amount.
* tree-vectorizer.h (dr_vec_info): Add new flag
need_peeling_for_alignment.
testsuite: arm: Add pattern for armv8-m.base to cmse-15.c test
Since armv8-m.base uses thumb1, which does not support sibcall/tailcall,
a pattern is needed that uses PUSH/BL/POP sequence instead of a single
B instruction to reuse an already existing function in the compile unit.
gcc/testsuite/ChangeLog:
* gcc.target/arm/cmse/cmse-15.c: Added pattern for armv8-m.base.
Do not call cp_parser_omp_dispatch directly in cp_parser_pragma
This is a followup to ed49709acda OpenMP: C++ front-end support for dispatch + adjust_args.
The call to cp_parser_omp_dispatch only belongs in cp_parser_omp_construct. In
cp_parser_pragma, handle PRAGMA_OMP_DISPATCH by calling cp_parser_omp_construct.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_pragma): Replace call to cp_parser_omp_dispatch
with cp_parser_omp_construct and check context.
Jakub Jelinek [Fri, 10 Jan 2025 17:42:58 +0000 (18:42 +0100)]
c++: Fix ICE with invalid defaulted operator <=> [PR118387]
In the following testcase there are 2 issues, one is that B doesn't
have operator<=> and the other is that A's operator<=> has int return
type, i.e. not the standard comparison category.
Because of the int return type, retcat is cc_last; when we first
try to synthesize it, it is therefore with tentative false and complain
tf_none.  We find that B doesn't have operator<=>, and because tentative
is false (retcat is cc_last) and complain is tf_none, we don't try to
search for other operators in genericize_spaceship.
We then mark the operator deleted.
When trying to explain the use of the deleted operator, tentative is still
false, but complain is tf_warning_or_error.
do_one_comp will first do:
tree comp = build_new_op (loc, code, flags, lhs, rhs,
NULL_TREE, NULL_TREE, &overload,
tentative ? tf_none : complain);
and because complain isn't tf_none, it will actually diagnose the bug
already, but then (tentative || complain) is true and we call
genericize_spaceship, which has
if (tag == cc_last && is_auto (type))
{
...
}
gcc_checking_assert (tag < cc_last);
and because tag is cc_last and type isn't auto, we just ICE on that
assertion.
The patch fixes it by returning error_mark_node from genericize_spaceship
instead of failing the assertion.
Note, the PR raises another problem.
If the B b; line is removed from the same testcase, we silently synthesize
operator<=> which will crash at runtime due to returning without a return
statement. That is because the standard says that in that case
it should return static_cast<int>(std::strong_ordering::equal);
but I can't find anywhere wording which would say that if that isn't
valid, the function is deleted.
https://eel.is/c++draft/class.compare#class.spaceship-2.2
seems to talk just about cases where there are some members and their
comparison is invalid it is deleted, but here there are none and it
follows
https://eel.is/c++draft/class.compare#class.spaceship-3.sentence-2
So, we synthesize with tf_none, see that the static_cast is invalid, don't
add error_mark_node statement silently, but as the function isn't deleted,
we just silently emit it.
Should the standard be amended to say that the operator should be deleted
even if it has no elements and the static cast from
https://eel.is/c++draft/class.compare#class.spaceship-3.sentence-2
?
2025-01-10 Jakub Jelinek <jakub@redhat.com>
PR c++/118387
* method.cc (genericize_spaceship): For tag == cc_last if
type is not auto just return error_mark_node instead of failing
checking assertion.
Richard Biener [Fri, 10 Jan 2025 14:17:58 +0000 (15:17 +0100)]
Fix some memory leaks
The following fixes memory leaks found compiling SPEC CPU 2017 with
valgrind.
* df-core.cc (rest_of_handle_df_finish): Release dflow for
problems without free function (like LR).
* gimple-crc-optimization.cc (crc_optimization::loop_may_calculate_crc):
Release loop_bbs on all exits.
* tree-vectorizer.h (supportable_indirect_convert_operation): Change.
* tree-vect-generic.cc (expand_vector_conversion): Adjust.
* tree-vect-stmts.cc (vectorizable_conversion): Use auto_vec for
converts.
(supportable_indirect_convert_operation): Get a reference to
the output vector of converts.
Richard Biener [Fri, 10 Jan 2025 11:30:29 +0000 (12:30 +0100)]
rtl-optimization/117467 - limit ext-dce memory use
The following puts in a hard limit on ext-dce because it might end
up requiring memory on the order of the number of basic blocks
times the number of pseudo registers. The limiting follows what
GCSE based passes do and thus I re-use --param max-gcse-memory here.
This doesn't in any way address the implementation issues of the pass,
but it reduces the memory use when compiling the
module_first_rk_step_part1.F90 TU from 521.wrf_r from 25GB to 1GB.
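The kind of check involved can be sketched as below. The formula is an assumption for illustration only: it charges one bit per (basic block, pseudo) pair per bitmap and compares the total against a byte budget; the real pass's estimate and the way --param max-gcse-memory is scaled are not described here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical budget check: N_BITMAPS bitmaps, each with one bit per
   (basic block, pseudo register) pair.  Returns true if the pass should
   give up rather than allocate.  */
static bool
ext_dce_over_budget (uint64_t n_blocks, uint64_t n_pseudos,
                     uint64_t n_bitmaps, uint64_t budget_bytes)
{
  uint64_t bits = n_blocks * n_pseudos * n_bitmaps;  /* assumes no overflow */
  return bits / 8 > budget_bytes;
}
```

The product shows why the growth matters: 100,000 blocks by 200,000 pseudos is already 2.5 GB for a single such bitmap.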
PR rtl-optimization/117467
PR rtl-optimization/117934
* ext-dce.cc (ext_dce_execute): Do nothing if a memory
allocation estimate exceeds what is allowed by
--param max-gcse-memory.
f<int>(int)::<lambda(seq<i ...>)> [with long unsigned int ...i = {0}]
(correct), context is:
f<int>(int)::<lambda(seq<i ...>)>
which is only the partial instantiation.
I think that when tsubst_pack_index gets a partial instantiation, e.g.
{*args#0} as the pack, we should still tsubst it. The args#0's value-expr
can be __closure->__args#0 where the closure's context is the partially
instantiated operator(). So we should let retrieve_local_specialization
find the right args#0.
PR c++/117937
gcc/cp/ChangeLog:
* pt.cc (tsubst_pack_index): tsubst the pack even when it's not
PACK_EXPANSION_P.
gcc/testsuite/ChangeLog:
* g++.dg/cpp26/pack-indexing13.C: New test.
* g++.dg/cpp26/pack-indexing14.C: New test.
* config/s390/s390-protos.h (s390_emit_compare): Add mode
parameter for the resulting RTX.
* config/s390/s390.cc (s390_emit_compare): Ditto.
(s390_emit_compare_and_swap): Change.
(s390_expand_vec_strlen): Change.
(s390_expand_cs_hqi): Change.
(s390_expand_split_stack_prologue): Change.
* config/s390/s390.md (*add<mode>3_carry1_cc): Renamed to ...
(add<mode>3_carry1_cc): this and in order to use the
corresponding gen function, encode CC mode into pattern.
(*sub<mode>3_borrow_cc): Renamed to ...
(sub<mode>3_borrow_cc): this and in order to use the
corresponding gen function, encode CC mode into pattern.
(*add<mode>3_alc_carry1_cc): Renamed to ...
(add<mode>3_alc_carry1_cc): this and in order to use the
corresponding gen function, encode CC mode into pattern.
(sub<mode>3_slb_borrow1_cc): New.
(uaddc<mode>5): New.
(usubc<mode>5): New.
gcc/testsuite/ChangeLog:
* gcc.target/s390/uaddc-1.c: New test.
* gcc.target/s390/uaddc-2.c: New test.
* gcc.target/s390/uaddc-3.c: New test.
* gcc.target/s390/usubc-1.c: New test.
* gcc.target/s390/usubc-2.c: New test.
* gcc.target/s390/usubc-3.c: New test.
Andrew Carlotti [Tue, 15 Oct 2024 16:31:28 +0000 (17:31 +0100)]
Add new hardreg PRE pass
This pass is used to optimise assignments to the FPMR register in
aarch64. I chose to implement this as a middle-end pass because it
mostly reuses the existing RTL PRE code within gcse.cc.
Compared to RTL PRE, the key difference in this new pass is that we
insert new writes directly to the destination hardreg, instead of
writing to a new pseudo-register and copying the result later. This
requires changes to the analysis portion of the pass, because sets
cannot be moved before existing instructions that set, use or clobber
the hardreg, and the value becomes unavailable after any uses or
clobbers of the hardreg.
Any uses of the hardreg in debug insns will be deleted. We could do
better than this, but for the aarch64 fpmr I don't think we emit useful
debuginfo for deleted fp8 instructions anyway (and I don't even know if
it's possible to have a debug fpmr use when entering hardreg PRE).
gcc/ChangeLog:
* config/aarch64/aarch64.h (HARDREG_PRE_REGNOS): New macro.
* gcse.cc (doing_hardreg_pre_p): New global variable.
(do_load_motion): New boolean check.
(current_hardreg_regno): New global variable.
(compute_local_properties): Unset transp for hardreg clobbers.
(prune_hardreg_uses): New function.
(want_to_gcse_p): Use different checks for hardreg PRE.
(oprs_unchanged_p): Disable load motion for hardreg PRE pass.
(hash_scan_set): For hardreg PRE, skip non-hardreg sets and
check for hardreg clobbers.
(record_last_mem_set_info): Skip for hardreg PRE.
(compute_pre_data): Prune hardreg uses from transp bitmap.
(pre_expr_reaches_here_p_work): Add sentence to comment.
(insert_insn_start_basic_block): New functions.
(pre_edge_insert): Don't add hardreg sets to predecessor block.
(pre_delete): Use hardreg for the reaching reg.
(reset_hardreg_debug_uses): New function.
(pre_gcse): For hardreg PRE, reset debug uses and don't insert
copies.
(one_pre_gcse_pass): Disable load motion for hardreg PRE.
(execute_hardreg_pre): New.
(class pass_hardreg_pre): New.
(pass_hardreg_pre::gate): New.
(make_pass_hardreg_pre): New.
* passes.def (pass_hardreg_pre): New pass.
* tree-pass.h (make_pass_hardreg_pre): New.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/acle/fpmr-1.c: New test.
* gcc.target/aarch64/acle/fpmr-2.c: New test.
* gcc.target/aarch64/acle/fpmr-3.c: New test.
* gcc.target/aarch64/acle/fpmr-4.c: New test.
Andrew Carlotti [Tue, 7 Jan 2025 18:32:23 +0000 (18:32 +0000)]
Disable a broken multiversioning optimisation
This patch skips redirect_to_specific_clone for aarch64 and riscv,
because the optimisation has two flaws:
1. It checks the value of the "target" attribute, even on targets that
don't use this attribute for multiversioning.
2. The algorithm used is too aggressive, and will eliminate the
indirection in some cases where the runtime choice of callee version
can't be determined statically at compile time. A correct implementation would need to
verify that:
- if the current caller version were selected at runtime, then the
chosen callee version would be eligible for selection.
- if any higher priority callee version were selected at runtime, then
a higher priority caller version would have been eligible for
selection (and hence the current caller version wouldn't have been
selected).
The current checks only verify a more restrictive version of the first
condition, and don't check the second condition at all.
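The two conditions can be modeled with feature bitmasks. This is a hypothetical sketch (features, subset and can_redirect are not GCC names): version tables are ordered from highest to lowest priority, and a version is selectable when all of its features are present at runtime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t features;   /* hypothetical: one bit per CPU feature */

/* True if every feature in A is also in B.  */
static bool
subset (features a, features b)
{
  return (a & ~b) == 0;
}

/* Can calls from caller version CALLER_IDX go straight to callee version
   CALLEE_IDX?  Both tables are ordered from highest to lowest priority.  */
static bool
can_redirect (const features *caller, int caller_idx,
              const features *callee, int callee_idx)
{
  /* Condition 1: whenever this caller version runs, its features are
     present, so the chosen callee version must be selectable too.  */
  if (!subset (callee[callee_idx], caller[caller_idx]))
    return false;

  /* Condition 2: if any higher-priority callee version H could be
     selected, a higher-priority caller version must also have been
     selectable, so this caller version would not have run.  */
  for (int h = 0; h < callee_idx; h++)
    {
      bool covered = false;
      for (int c = 0; c < caller_idx && !covered; c++)
        covered = subset (caller[c], callee[h]);
      if (!covered)
        return false;
    }
  return true;
}
```

Condition 2 is checked conservatively here: a higher-priority caller version whose features are a subset of callee version H's features is guaranteed to be selectable whenever H is.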
Fixing the optimisation properly would require implementing target hooks
to check for implications between version attributes, which is too
complicated for this stage. However, I would like to see this hook
implemented in the future, since it could also help deduplicate other
multiversioning code.
Since this behaviour has existed for x86 and powerpc for a while, I
think it's best to preserve the existing behaviour on those targets,
unless any maintainer for those targets disagrees.
gcc/ChangeLog:
* multiple_target.cc
(redirect_to_specific_clone): Assert that "target" attribute is
used for FMV before checking it.
(ipa_target_clone): Skip redirect_to_specific_clone on some
targets.
Andrew Carlotti [Tue, 30 Jul 2024 17:36:22 +0000 (18:36 +0100)]
aarch64: Add new +frintts flag
gcc/ChangeLog:
* config/aarch64/aarch64-arches.def (V8_5A): Add FRINTTS.
* config/aarch64/aarch64-option-extensions.def (FRINTTS): New flag.
* config/aarch64/aarch64.h (TARGET_FRINT): Use new flag.
* config/aarch64/arm_acle.h: Use new flag for frintts intrinsics.
* config/aarch64/arm_neon.h: Ditto.
Andrew Carlotti [Thu, 1 Aug 2024 10:54:41 +0000 (11:54 +0100)]
aarch64: Add new +jscvt flag
gcc/ChangeLog:
* config/aarch64/aarch64-arches.def (V8_3A): Add JSCVT.
* config/aarch64/aarch64-option-extensions.def (JSCVT): New flag.
* config/aarch64/aarch64.h (TARGET_JSCVT): Use new flag.
* config/aarch64/arm_acle.h: Use new flag for jscvt intrinsics.
Andrew Carlotti [Thu, 1 Aug 2024 10:54:20 +0000 (11:54 +0100)]
aarch64: Add new +fcma flag
This includes +fcma as a dependency of +sve, and means that we can
finally support fcma intrinsics on a64fx.
Also add fcma to the Features list in several cpunative testcases that
incorrectly included sve without fcma.
gcc/ChangeLog:
* config/aarch64/aarch64-arches.def (V8_3A): Add FCMA.
* config/aarch64/aarch64-option-extensions.def (FCMA): New flag.
(SVE): Add FCMA dependency.
* config/aarch64/aarch64.h (TARGET_COMPLEX): Use new flag.
* config/aarch64/arm_neon.h: Use new flag for fcma intrinsics.
Jakub Jelinek [Fri, 10 Jan 2025 14:07:41 +0000 (15:07 +0100)]
c: Fix up expr location for __builtin_stdc_rotate_* [PR118376]
Seems I forgot to set_c_expr_source_range for the __builtin_stdc_rotate_*
case (the other __builtin_stdc_* cases already have it), which means
the locations in expr are uninitialized, sometimes causing ICEs in linemap
code, at other times just valgrind errors about uninitialized var uses.
2025-01-10 Jakub Jelinek <jakub@redhat.com>
PR c/118376
* c-parser.cc (c_parser_postfix_expression): Call
set_c_expr_source_range before break in the __builtin_stdc_rotate_*
case.
as a nop. This PR shows that that isn't always correct.
The compare in the set above is between two 0/1 booleans (at least
on STORE_FLAG_VALUE==1 targets), whereas the unknown comparison that
produced the incoming (reg:CC cc) is unconstrained; it could be between
arbitrary integers, or even floats. The fold is therefore replacing a
cc that is valid for both signed and unsigned comparisons with one that
is only known to be valid for signed comparisons.
(gt (compare (gt cc 0) (lt cc 0)) 0)
does simplify to:
(gt cc 0)
but:
(gtu (compare (gt cc 0) (lt cc 0)) 0)
does not simplify to:
(gtu cc 0)
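A small C model makes the counterexample concrete. It assumes cc records a comparison of two 32-bit integers x and y, with signed tests reading the signed order and unsigned tests reading the unsigned order (helper names hypothetical). With x = -1 and y = 1, the original expression is false while (gtu cc 0) would be true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Tests on a cc that holds the comparison of X with Y.  */
static bool cc_gt (int32_t x, int32_t y) { return x > y; }
static bool cc_lt (int32_t x, int32_t y) { return x < y; }
static bool cc_gtu (int32_t x, int32_t y) { return (uint32_t) x > (uint32_t) y; }

/* (gtu (compare (gt cc 0) (lt cc 0)) 0): the inputs are 0/1 booleans,
   so this is true only for gt = 1, lt = 0, i.e. the SIGNED x > y.  */
static bool
outer_gtu (int32_t x, int32_t y)
{
  unsigned a = cc_gt (x, y), b = cc_lt (x, y);
  return a > b;
}
```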
The optimisation didn't come with a testcase, but it was added for
i386's cmpstrsi, now cmpstrnsi. That probably doesn't matter as much
as it once did, since it's now conditional on -minline-all-stringops.
But the patch is almost 25 years old, so whatever the original
motivation was, it seems likely that other things now rely on it.
It therefore seems better to try to preserve the optimisation on rtl
rather than get rid of it. To do that, we need to look at how the
result of the outer compare is used. We'd therefore be looking at four
instructions (the gt, the lt, the compare, and the use of the compare),
but combine already allows that for 3-instruction combinations thanks
to:
/* If the source is a COMPARE, look for the use of the comparison result
and try to simplify it unless we already have used undobuf.other_insn. */
When applied to boolean inputs, a comparison operator is
effectively a boolean logical operator (AND, ANDNOT, XOR, etc.).
simplify_logical_relational_operation already had code to simplify
logical operators between two comparison results, but:
* It only handled IOR, which doesn't cover all the cases needed here.
The others are easily added.
* It treated comparisons of integers as having an ORDERED/UNORDERED result.
Therefore:
* it would not treat "true for LT + EQ + GT" as "always true" for
comparisons between integers, because the mask excluded the UNORDERED
condition.
* it would try to convert "true for LT + GT" into LTGT even for comparisons
between integers. To prevent an ICE later, the code used:
/* Many comparison codes are only valid for certain mode classes. */
if (!comparison_code_valid_for_mode (code, mode))
return 0;
However, this used the wrong mode, since "mode" is here the integer
result of the comparisons (and the mode of the IOR), not the mode of
the things being compared. Thus the effect was to reject all
floating-point-only codes, even when comparing floats.
I think instead the code should detect whether the comparison is between
integer values and remove UNORDERED from consideration if so. It then
always produces a valid comparison (or an always true/false result),
and so comparison_code_valid_for_mode is not needed. In particular,
"true for LT + GT" becomes NE for comparisons between integers but
remains LTGT for comparisons between floats.
* There was a missing check for whether the comparison inputs had
side effects.
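The mask representation can be sketched as follows, using a hypothetical bit assignment and names (not copied from simplify-rtx.cc): a comparison is the set of outcomes {LT, GT, EQ, UNORDERED} for which it is true, a logical operator on two comparison results acts on those sets, and dropping UNORDERED for integer operands is what lets "LT or GT" become NE rather than LTGT, and "LT or EQ or GT" become always true.

```c
#include <assert.h>
#include <stdbool.h>

/* One bit per possible outcome of a comparison (illustrative).  */
enum { M_LT = 8, M_GT = 4, M_EQ = 2, M_UNORD = 1 };

enum logic_op { OP_IOR, OP_AND, OP_XOR };

enum cmp_code { C_FALSE, C_TRUE, C_LT, C_GT, C_EQ, C_LE, C_GE,
                C_NE, C_LTGT, C_UNORDERED, C_ORDERED, C_OTHER };

/* Combine two comparison results given as outcome masks.  Integers are
   always ordered, so UNORDERED is dropped for them.  */
static unsigned
combine_masks (enum logic_op op, unsigned m0, unsigned m1, bool integer_p)
{
  unsigned r = op == OP_IOR ? (m0 | m1)
               : op == OP_AND ? (m0 & m1)
               : (m0 ^ m1);
  if (integer_p)
    r &= ~(unsigned) M_UNORD;
  return r;
}

/* Map an outcome mask back to a comparison, or to always true/false.  */
static enum cmp_code
mask_to_code (unsigned r, bool integer_p)
{
  unsigned all = M_LT | M_GT | M_EQ | (integer_p ? 0u : (unsigned) M_UNORD);
  if (r == 0)
    return C_FALSE;
  if (r == all)
    return C_TRUE;
  switch (r)
    {
    case M_LT: return C_LT;
    case M_GT: return C_GT;
    case M_EQ: return C_EQ;
    case M_LT | M_EQ: return C_LE;
    case M_GT | M_EQ: return C_GE;
    case M_LT | M_GT: return integer_p ? C_NE : C_LTGT;
    case M_LT | M_GT | M_UNORD: return C_NE;     /* floats only */
    case M_LT | M_GT | M_EQ: return C_ORDERED;   /* floats only */
    case M_UNORD: return C_UNORDERED;
    default: return C_OTHER;   /* e.g. UNLT, not named in this sketch */
    }
}
```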
While there, it also seemed worth extending
simplify_logical_relational_operation to unsigned comparisons, since
that makes the testing easier.
As far as that testing goes: the patch exhaustively tests all
combinations of integer comparisons in:
(cmp1 (cmp2 X Y) (cmp3 X Y))
for the 10 integer comparisons, giving 1000 fold attempts in total.
It then tries all combinations of (X in {-1,0,1} x Y in {-1,0,1})
on the result of the fold, giving 9 checks per fold, or 9000 in total.
That's probably more than is typical for self-tests, but it seems to
complete in negligible time, even for -O0 builds.
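The shape of that exhaustive test can be reproduced in a standalone sketch (not the actual test_comparisons code; all names hypothetical). Integer comparisons are represented as sets over the five joint outcomes of the signed and unsigned orderings, each fold is computed on those sets, and the result is checked against direct evaluation for X and Y in {-1, 0, 1}. This only exercises the mask algebra, not GCC's step of turning a mask back into a single rtx code.

```c
#include <assert.h>
#include <stdbool.h>

/* Joint outcomes of comparing two integers under both orderings.  */
enum outcome { SLT_ULT, SLT_UGT, S_EQ, SGT_ULT, SGT_UGT, N_OUTCOMES };

/* The 10 integer comparison codes.  */
enum code { EQ, NE, LT, LE, GT, GE, LTU, LEU, GTU, GEU, N_CODES };

/* One representative (x, y) pair per outcome, in enum order.  */
static const struct { int x, y; } rep[N_OUTCOMES] = {
  { 0, 1 }, { -1, 0 }, { 0, 0 }, { 0, -1 }, { 1, 0 }
};

static bool
eval (enum code c, int x, int y)
{
  unsigned ux = (unsigned) x, uy = (unsigned) y;
  switch (c)
    {
    case EQ: return x == y;
    case NE: return x != y;
    case LT: return x < y;
    case LE: return x <= y;
    case GT: return x > y;
    case GE: return x >= y;
    case LTU: return ux < uy;
    case LEU: return ux <= uy;
    case GTU: return ux > uy;
    default: return ux >= uy;   /* GEU */
    }
}

static enum outcome
classify (int x, int y)
{
  if (x == y)
    return S_EQ;
  bool s_lt = x < y, u_lt = (unsigned) x < (unsigned) y;
  return s_lt ? (u_lt ? SLT_ULT : SLT_UGT) : (u_lt ? SGT_ULT : SGT_UGT);
}

/* Mask of the outcomes for which comparison C is true.  */
static unsigned
code_mask (enum code c)
{
  unsigned m = 0;
  for (int o = 0; o < N_OUTCOMES; o++)
    if (eval (c, rep[o].x, rep[o].y))
      m |= 1u << o;
  return m;
}

/* Fold (c1 (c2 X Y) (c3 X Y)): on 0/1 inputs C1 is a two-input boolean
   function, applied outcome by outcome to the masks of C2 and C3.  */
static unsigned
fold (enum code c1, enum code c2, enum code c3)
{
  unsigned m2 = code_mask (c2), m3 = code_mask (c3), r = 0;
  for (int o = 0; o < N_OUTCOMES; o++)
    if (eval (c1, (m2 >> o) & 1, (m3 >> o) & 1))
      r |= 1u << o;
  return r;
}

/* 1000 folds, each checked at 9 (X, Y) points.  Returns the check count.  */
static int
run_exhaustive_check (void)
{
  static const int vals[3] = { -1, 0, 1 };
  int checks = 0;
  for (int c1 = 0; c1 < N_CODES; c1++)
    for (int c2 = 0; c2 < N_CODES; c2++)
      for (int c3 = 0; c3 < N_CODES; c3++)
        {
          unsigned r = fold ((enum code) c1, (enum code) c2, (enum code) c3);
          for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
              {
                int x = vals[i], y = vals[j];
                bool expect = eval ((enum code) c1, eval ((enum code) c2, x, y),
                                    eval ((enum code) c3, x, y));
                bool got = (r >> classify (x, y)) & 1;
                assert (expect == got);
                checks++;
              }
        }
  return checks;
}
```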
gcc/
PR rtl-optimization/117186
* rtl.h (simplify_context::simplify_logical_relational_operation): Add
an invert0_p parameter.
* simplify-rtx.cc (unsigned_comparison_to_mask): New function.
(mask_to_unsigned_comparison): Likewise.
(comparison_code_valid_for_mode): Delete.
(simplify_context::simplify_logical_relational_operation): Add
an invert0_p parameter. Handle AND and XOR. Handle unsigned
comparisons. Handle always-false results. Ignore the low bit
of the mask if the operands are always ordered and remove the
then-redundant check of comparison_code_valid_for_mode. Check
for side-effects in the operands before simplifying them away.
(simplify_context::simplify_binary_operation_1): Remove
simplification of (compare (gt ...) (lt ...)) and instead...
(simplify_context::simplify_relational_operation_1): ...handle
comparisons of comparisons here.
(test_comparisons): New function.
(test_scalar_ops): Call it.
gcc/testsuite/
PR rtl-optimization/117186
* gcc.dg/torture/pr117186.c: New test.
* gcc.target/aarch64/pr117186.c: Likewise.
Alexandre Oliva [Fri, 10 Jan 2025 12:32:47 +0000 (09:32 -0300)]
[ifcombine] drop other misuses of uniform_integer_cst_p
As Jakub pointed out in PR118206, the use of uniform_integer_cst_p in
ifcombine makes no sense; we're not dealing with vectors. Indeed,
I've been misunderstanding and misusing it since I cut&pasted it from
some preexisting match predicate in an earlier version of the ifcombine
field-merge patch.
for gcc/ChangeLog
* gimple-fold.cc (decode_field_reference): Drop misuses of
uniform_integer_cst_p.
(fold_truth_andor_for_ifcombine): Likewise.
Alexandre Oliva [Fri, 10 Jan 2025 12:32:43 +0000 (09:32 -0300)]
[ifcombine] fix mask variable test to match use [PR118344]
There was a cut&pasto in the rr_and_mask's adjustment to match the
combined type: the test on whether there was a mask already was
testing the wrong variable, and then it might crash or otherwise fail
accessing an undefined mask. This only hit with checking enabled,
and rarely at that.
for gcc/ChangeLog
PR tree-optimization/118344
* gimple-fold.cc (fold_truth_andor_for_ifcombine): Fix typo in
rr_and_mask's type adjustment test.
Alexandre Oliva [Fri, 10 Jan 2025 12:32:38 +0000 (09:32 -0300)]
[ifcombine] reuse left-hand mask to decode right-hand xor operand
If fold_truth_andor_for_ifcombine applies a mask to an xor, say
because the result of the xor is compared with a power of two [minus
one], we have to apply the same mask when processing both the left-
and right-hand xor paths for the transformation to be sound. Arrange
for decode_field_reference to propagate the incoming mask along with
the expression to the right-hand operand.
Don't require the right-hand xor operand to be a constant; that was a
cut&pasto.
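The soundness argument is easy to check in isolation (hypothetical helpers): ((a ^ b) & mask) == 0 holds exactly when a and b agree on the masked bits, so the equivalent field comparison must mask both operands. A mask of this kind can come from comparing the xor with a power of two, since for unsigned x, x < 16 is the same as (x & ~15) == 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The form being decoded: the xor result, masked, compared with zero.  */
static bool
xor_form (uint32_t a, uint32_t b, uint32_t mask)
{
  return ((a ^ b) & mask) == 0;
}

/* Sound decomposition: the same mask on both operands.  */
static bool
both_masked (uint32_t a, uint32_t b, uint32_t mask)
{
  return (a & mask) == (b & mask);
}

/* Unsound decomposition: mask applied to the left-hand operand only.  */
static bool
left_only (uint32_t a, uint32_t b, uint32_t mask)
{
  return (a & mask) == b;
}
```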
for gcc/ChangeLog
* gimple-fold.cc (decode_field_reference): Add xor_pand_mask.
Propagate pand_mask to the right-hand xor operand. Don't
require the right-hand xor operand to be a constant.
(fold_truth_andor_for_ifcombine): Pass right-hand mask when
appropriate.
Alexandre Oliva [Fri, 10 Jan 2025 12:32:33 +0000 (09:32 -0300)]
[ifcombine] adjust for narrowing converts before shifts [PR118206]
A narrowing conversion and a shift both drop bits from the loaded
value, but we need to take into account which one comes first to get
the right number of bits and mask.
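For example, with an 8-bit narrowing and a right shift by 4, the order decides which bits of the loaded value survive (hypothetical helpers, unsigned types assumed): shifting first keeps bits 11:4, while narrowing first keeps only bits 7:4.

```c
#include <assert.h>
#include <stdint.h>

/* Shift, then narrow: keeps bits [11:4] of X (8 bits).  */
static uint8_t
shift_then_narrow (uint32_t x)
{
  return (uint8_t) (x >> 4);
}

/* Narrow, then shift: keeps only bits [7:4] of X (4 bits).  */
static uint8_t
narrow_then_shift (uint32_t x)
{
  return (uint8_t) ((uint8_t) x >> 4);
}
```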
Fold when applying masks to parts, comparing the parts, and combining
the results, on the off chance either mask happens to be zero.
for gcc/ChangeLog
PR tree-optimization/118206
* gimple-fold.cc (decode_field_reference): Account for upper
bits dropped by narrowing conversions whether before or after
a right shift.
(fold_truth_andor_for_ifcombine): Fold masks, compares, and
combined results.
Alexandre Oliva [Fri, 10 Jan 2025 12:32:27 +0000 (09:32 -0300)]
testsuite: generalized field-merge tests for <32-bit int [PR118025]
Explicitly convert constants to the desired types, so as to not elicit
warnings about implicit truncations, nor execution errors, on targets
whose ints are narrower than 32 bits.
A number of tests that check for specific ifcombine transformations
fail on AVR and PRU targets, whose type sizes and alignments aren't
conducive to the expected transformations. Adjust the expectations.
Most execution tests should run successfully regardless of the
transformations, but a few that could conceivably fail if short and
char have the same bit width now check for that and bypass the tests
that would fail.
Conversely, one test that had such a runtime test, but that would work
regardless, no longer has that runtime test, and its types are
narrowed so that the transformations on 32-bit targets are more likely
to be the same as those that used to take place on 64-bit targets.
This latter change is somewhat obviated by a separate patch, but I've
left it in place anyway.
for gcc/testsuite/ChangeLog
PR testsuite/118025
* gcc.dg/field-merge-1.c: Skip BIT_FIELD_REF counting on AVR and PRU.
* gcc.dg/field-merge-3.c: Bypass the test if short doesn't have the
expected size.
* gcc.dg/field-merge-8.c: Likewise.
* gcc.dg/field-merge-9.c: Likewise. Skip optimization counting on
AVR and PRU.
* gcc.dg/field-merge-13.c: Skip optimization counting on AVR and PRU.
* gcc.dg/field-merge-15.c: Likewise.
* gcc.dg/field-merge-17.c: Likewise.
* gcc.dg/field-merge-16.c: Likewise. Drop runtime bypass. Use
smaller types.
* gcc.dg/field-merge-14.c: Add comments.
Alexandre Oliva [Fri, 10 Jan 2025 12:32:05 +0000 (09:32 -0300)]
ifcombine field-merge: improve handling of dwords
On 32-bit hosts, data types with 64-bit alignment aren't getting
treated as desired by ifcombine field-merging: we limit the choice of
modes at BITS_PER_WORD sizes, but when deciding the boundary for a
split, we'd limit the choice only by the alignment, so we wouldn't
even consider a split at an odd 32-bit boundary. Fix that by limiting
the boundary choice by word choice as well.
Now, this would still leave misaligned 64-bit fields in 64-bit-aligned
data structures unhandled by ifcombine on 32-bit hosts. We already
need to load them as double words, and if they're not byte-aligned,
the code gets really ugly, but ifcombine could improve it if it allows
double-word loads as a last resort. I've added that.
for gcc/ChangeLog
* gimple-fold.cc (fold_truth_andor_for_ifcombine): Limit
boundary choice by word size as well. Try aligned double-word
loads as a last resort.
Martin Jambor [Sat, 4 Jan 2025 19:40:07 +0000 (20:40 +0100)]
ipa-cp: Fold-convert values when necessary (PR 118138)
PR 118138 and quite a few duplicates that it has acquired in a short
time show that even though we are careful to make sure we do not lose
any bits when newly allowing type conversions in jump-functions, we
still need to perform the fold conversions during IPA constant
propagation and not just at the end in order to properly perform
sign-extensions or zero-extensions as appropriate.
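The reason the conversion matters at every propagation step is that the same bit pattern widens to different values depending on the signedness of the source type. A minimal illustration with hypothetical helpers (the conversion to int8_t is implementation-defined before C23; GCC defines it as reduction modulo 2^N):

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend: treat BITS as a signed char first.  */
static int32_t
widen_from_signed (uint8_t bits)
{
  return (int32_t) (int8_t) bits;
}

/* Zero-extend: treat BITS as an unsigned char.  */
static int32_t
widen_from_unsigned (uint8_t bits)
{
  return (int32_t) bits;
}
```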
This patch does just that, changing a safety predicate we already use
at the appropriate places to return the necessary type.
gcc/ChangeLog:
2025-01-03 Martin Jambor <mjambor@suse.cz>
PR ipa/118138
* ipa-cp.cc (ipacp_value_safe_for_type): Return the appropriate
type instead of a bool, accept NULL_TREE VALUEs.
(propagate_vals_across_arith_jfunc): Use the new returned value of
ipacp_value_safe_for_type.
(propagate_vals_across_ancestor): Likewise.
(propagate_scalar_across_jump_function): Likewise.