David Malcolm [Tue, 14 Jan 2025 00:47:25 +0000 (19:47 -0500)]
c: improve UX for -Wincompatible-pointer-types (v3) [PR116871]
PR c/116871 notes that our diagnostics about incompatible function types
could be improved.
In particular, for the case of migrating to C23 I'm seeing a lot of
build failures with signal handlers similar to this (simplified from
alsa-tools-1.2.11, envy24control/profiles.c; see rhbz#2336278):
static void print_fps (int final)
{
signal (42, signal_handler);
}
Before this patch, cc1 emits:
t2.c: In function 'print_fps':
t2.c:22:15: error: passing argument 2 of 'signal' from incompatible pointer type [-Wincompatible-pointer-types]
22 | signal (42, signal_handler);
| ^~~~~~~~~~~~~~
| |
| int (*)(int)
t2.c:11:57: note: expected '__sighandler_t' {aka 'void (*)(int)'} but argument is of type 'int (*)(int)'
11 | extern __sighandler_t signal (int __sig, __sighandler_t __handler)
| ~~~~~~~~~~~~~~~^~~~~~~~~
With this patch cc1 emits:
t2.c: In function 'print_fps':
t2.c:22:15: error: passing argument 2 of 'signal' from incompatible pointer type [-Wincompatible-pointer-types]
22 | signal (42, signal_handler);
| ^~~~~~~~~~~~~~
| |
| int (*)(int)
t2.c:11:57: note: expected '__sighandler_t' {aka 'void (*)(int)'} but argument is of type 'int (*)(int)'
11 | extern __sighandler_t signal (int __sig, __sighandler_t __handler)
| ~~~~~~~~~~~~~~~^~~~~~~~~
t2.c:16:19: note: 'signal_handler' declared here
16 | static RETSIGTYPE signal_handler (int sig)
| ^~~~~~~~~~~~~~
t2.c:9:16: note: '__sighandler_t' declared here
9 | typedef void (*__sighandler_t) (int);
| ^~~~~~~~~~~~~~
showing the location of the pertinent fndecl ("signal_handler"), and,
as before, the pertinent typedef.
The patch also updates the colorization in the messages to visually
link and contrast the different types and typedefs.
My hope is that this make it easier for users to decipher build failures
seen with the new C23 default.
Further improvements could be made to colorization in
convert_for_assignment, and similar improvements to C++, but I'm punting
those to GCC 16.
gcc/c/ChangeLog:
PR c/116871
* c-typeck.cc (pedwarn_permerror_init): Return bool for whether a
warning was emitted. Only call print_spelling if we warned.
(pedwarn_init): Return bool for whether a warning was emitted.
(permerror_init): Likewise.
(warning_init): Return bool for whether a
warning was emitted. Only call print_spelling if we warned.
(class pp_element_quoted_decl): New.
(maybe_inform_typedef_location): New.
(convert_for_assignment): For OPT_Wincompatible_pointer_types,
move auto_diagnostic_group to cover all cases. Use %e and
pp_element rather than %qT and tree to colorize the types.
Capture whether a warning was emitted, and, if it was,
show various notes: for a pointer to a function, show the
function decl, for typedef types, and show the decls.
gcc/testsuite/ChangeLog:
PR c/116871
* gcc.dg/c23-mismatching-fn-ptr-a52dec.c: New test.
* gcc.dg/c23-mismatching-fn-ptr-alsatools.c: New test.
* gcc.dg/c23-mismatching-fn-ptr.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Andrew Pinski [Mon, 13 Jan 2025 18:14:45 +0000 (10:14 -0800)]
c++: Add support for vec_dup to constexpr [PR118445]
With the addition of supporting operations on the SVE scalable vector types,
the vec_duplicate tree will show up in expressions and the constexpr handling
was not done for this tree code.
This is a simple fix to treat VEC_DUPLICATE like any other unary operator and allows
the constexpr-add-1.C testcase to work.
Built and tested for aarch64-linux-gnu.
PR c++/118445
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_constant_expression): Handle VEC_DUPLICATE like
a "normal" unary operator.
(potential_constant_expression_1): Likewise.
gcc/testsuite/ChangeLog:
* g++.target/aarch64/sve/constexpr-add-1.C: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Jiufu Guo [Tue, 14 Jan 2025 00:16:16 +0000 (18:16 -0600)]
rs6000: Add clobber and guard for vsx_stxvd2x4_le_const [PR116030]
Previously, vsx_stxvd2x4_le_const_<mode> was introduced for 'split1' pass,
so it is guarded by "can_create_pseudo_p ()". While it would be possible
to match the pattern of this insn during/after RA, this insn could be
updated to make it work for split pass after RA.
And this insn would not be the best choice if the address has alignment like
"&(-16)", so "!altivec_indexed_or_indirect_operand" is added to guard this insn.
2025-01-13 Jiufu Guo <guojiufu@linux.ibm.com>
gcc/
PR target/116030
* config/rs6000/vsx.md (vsx_stxvd2x4_le_const_<mode>): Add clobber
and guard with !altivec_indexed_or_indirect_operand.
gcc/testsuite/
PR target/116030
* gcc.target/powerpc/pr116030.c: New test.
with negative step expecting wraparound semantics due to -fwrapv.
For building interleaved patterns we have an optimization that
does e.g.
{1, 209, ...} = { 1, 0, 209, 0, ...}
and
{201, 25, ...} >> 8 = { 0, 201, 0, 25, ...}
and IORs those.
The optimization only works if the lowpart bits are zero. When
overflowing e.g. with a negative step we cannot guarantee this.
This patch makes us fall back to the generic merge handling for negative
steps.
I'm not 100% certain we're good even for positive steps. If the
step or the vector length is large enough we'd still overflow and
have non-zero lower bits. I haven't seen this happen during my
testing, though and the patch doesn't make things worse, so...
Regtested on rv64gcv_zvl512b. Let's see what the CI says.
Regards
Robin
PR target/117682
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_const_vector): Fall back to
merging if either step is negative.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr117682.c: New test.
Robin Dapp [Mon, 13 Jan 2025 23:26:24 +0000 (16:26 -0700)]
RISC-V: testsuite: Skip test with -flto
Hi,
the zbb-rol-ror and stack_save_restore tests use the -fno-lto option and
scan the final assembly. For an invocation like -flto ... -fno-lto the
output file we scan is still something like
zbb-rol-ror-09.ltrans0.ltrans.s.
Therefore skip the tests when "-flto" is present. This gets rid
of a few UNRESOLVED tests.
Regtested on rv64gcv_zvl512b. Going to push if the CI agrees.
Xi Ruoyao [Mon, 13 Jan 2025 20:11:38 +0000 (13:11 -0700)]
RISC-V: Remove zba check in bitwise and ashift reassociation [PR 115921]
The test case
long
test (long x, long y)
{
return ((x | 0x1ff) << 3) + y;
}
is now compiled (-O2 -march=rv64g_zba) to
li a4,4096
slliw a5,a0,3
addi a4,a4,-8
or a5,a5,a4
addw a0,a5,a1
ret
Despite this check was originally intended to use zba better, now
removing it actually enables the use of zba for this test case (thanks
to late combine):
ori a5,a0,511
sh3add a0,a5,a1
ret
Obviously, bitmanip.md does not cover
(any_or (ashift (reg) (imm123)) imm) at all, and even for and it just
seems more natural splitting to (ashift (and (reg) (imm')) (imm123))
first, then let late combine to combine the outer ashift and the plus.
I've not found any test case regressed by the removal.
And "make check-gcc RUNTESTFLAGS=riscv.exp='zba-*.c'" also reports no
failure.
gcc/ChangeLog:
PR target/115921
* config/riscv/riscv.md (<optab>_shift_reverse): Remove
check for TARGET_ZBA.
gcc/testsuite/ChangeLog:
PR target/115921
* gcc.target/riscv/zba-shNadd-08.c: New test.
Fix build for STORE_FLAG_VALUE<0 targets [PR118418]
In g:06c4cf398947b53b4bfc65752f9f879bb2d07924 I mishandled signed
comparisons of comparison results on STORE_FLAG_VALUE < 0 targets
(despite specifically referencing STORE_FLAG_VALUE in the commit
message). There, (lt TRUE FALSE) is true, although (ltu FALSE TRUE)
still holds.
Things get messy with vector modes, and since those weren't the focus
of the original commit, it seemed better to punt on them for now.
However, punting means that this optimisation no longer feels like
a natural tail-call operation. The patch therefore converts
"return simplify..." to the usual call-and-conditional-return pattern.
gcc/
PR target/118418
* simplify-rtx.cc (simplify_context::simplify_relational_operation_1):
Take STORE_FLAG_VALUE into account when handling signed comparisons
of comparison results.
Jin Ma [Mon, 13 Jan 2025 18:15:55 +0000 (11:15 -0700)]
RISC-V: Fix the result error caused by not updating ratio when using "use_max_sew" to merge vsetvl
When the vsetvl instructions of the two RVV instructions are merged
using "use_max_sew", it is possible to update the sew of prev if
prev.sew < next.sew, but keep the original ratio, which is obviously
wrong. when the subsequent instructions are equal to the wrong ratio,
it is possible to generate the wrong "vsetvli zero,zero" instruction,
which will lead to unknown avl.
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (demand_system::use_max_sew): Also
set the ratio for PREV.
Vineet Gupta [Sat, 11 Jan 2025 19:13:19 +0000 (11:13 -0800)]
RISC-V: fix thinko in riscv_register_move_cost ()
This seeming benign mistake caused a massive SPEC2017 Cactu regression
(2.1 trillion insn to 2.5 trillion) wiping out all the gains from my
recent sched1 improvement. Thankfully the issue was trivial to fix even
if hard to isolate.
Accept commas between clauses in OpenMP declare variant
Add support to the Fortran parser for the OpenMP syntax that allows a comma
after the directive name and between clauses of declare variant. The C and C++
parsers already support this syntax so only a new test is added.
gcc/fortran/ChangeLog:
* openmp.cc (gfc_match_omp_declare_variant): Match comma after directive
name and between clauses. Emit more useful diagnostics.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/declare-variant-2.f90: Remove error test for a comma
after the directive name. Add tests for other invalid syntaxes (extra
comma and invalid clause).
* c-c++-common/gomp/adjust-args-5.c: New test.
* gfortran.dg/gomp/adjust-args-11.f90: New test.
Gaius Mulley [Mon, 13 Jan 2025 14:40:43 +0000 (14:40 +0000)]
PR modula2/118453: Subranges types do not use virtual tokens during construction
P2SymBuild.mod.BuildSubrange does not use a virtual token and therefore
any error message containing a subrange type produces poor location carots.
This patch rewrites BuildSubrange and the buildError4 procedure in
M2Check.mod (which is only called when there is a formal/actual parameter
mismatch). buildError4 now issues a sub error for the formal and actual
type declaration highlighing the type mismatch.
gcc/m2/ChangeLog:
PR modula2/118453
* gm2-compiler/M2Check.mod (buildError4): Call MetaError1
for the actual and formal parameter type.
* gm2-compiler/P2Build.bnf (SubrangeType): Construct a virtual
token containing the subrange type declaration.
(PrefixedSubrangeType): Ditto.
* gm2-compiler/P2SymBuild.def (BuildSubrange): Add tok parameter.
* gm2-compiler/P2SymBuild.mod (BuildSubrange): Use tok parameter,
rather than the token at the start of the subrange.
gcc/testsuite/ChangeLog:
PR modula2/118453
* gm2/pim/fail/badbecomes2.mod: New test.
* gm2/pim/fail/badparamset1.mod: New test.
* gm2/pim/fail/badparamset2.mod: New test.
* gm2/pim/fail/badsyntaxset1.mod: New test.
This resurrects a patch from a bit over 2 years ago that I never wrapped up.
IIRC, I ended up up catching covid, then in the hospital for an unrelated issue
and it just got dropped on the floor in the insanity.
The basic idea here is to help postreload-cse eliminate more const/copies by
recording a small set of conditional equivalences (as Richi said in 2022,
"Ick").
It was originally to help eliminate an unnecessary constant load I saw in
coremark, but as seen in BZ107455 the same issues show up in real code as well.
Bootstrapped and regression tested on x86-64, also been through multiple spins
in my tester.
Changes since v2:
- Simplified logic for blocks to examine
- Remove redundant tests when filtering blocks to examine
- Remove bogus check which only allowed reg->reg copies
Changes since v1:
Richard B and Richard S both had good comments last time around and their
requests are reflected in this update:
- Use rtx_equal_p rather than pointer equality
- Restrict to register "destinations"
- Restrict to integer modes
- Adjust entry block handling
My own wider scale testing resulted in a few more changes.
- Robustify extracting the (set (pc) ... ), which then required ...
- Handle if src/dst are clobbered by the conditional branch
- Fix logic error causing too many equivalences to be recorded
PR rtl-optimization/107455
gcc/
* postreload.cc (reload_cse_regs_1): Take advantage of conditional
equivalences.
gcc/testsuite
* gcc.target/riscv/pr107455-1.c: New test.
* gcc.target/riscv/pr107455-2.c: New test.
Alexandre Oliva [Mon, 13 Jan 2025 13:49:51 +0000 (10:49 -0300)]
[ifcombine] propagate signbit mask to XOR right-hand operand
If a single-bit bitfield takes up the sign bit of a storage unit,
comparing the corresponding bitfield between two objects loads the
storage units, XORs them, converts the result to signed char, and
compares it with zero: ((signed char)(a.<byte> ^ c.<byte>) >= 0).
fold_truth_andor_for_ifcombine recognizes the compare with zero as a
sign bit test, then it decomposes the XOR into an equality test.
The problem is that, after this decomposition, that figures out the
width of the accessed fields, we apply the sign bit mask to the
left-hand operand of the compare, but we failed to also apply it to
the right-hand operand when both were taken from the same XOR.
This patch fixes that.
for gcc/ChangeLog
PR tree-optimization/118409
* gimple-fold.cc (fold_truth_andor_for_ifcombine): Apply the
signbit mask to the right-hand XOR operand too.
Jakub Jelinek [Mon, 13 Jan 2025 12:57:18 +0000 (13:57 +0100)]
expr: Fix up the divmod cost debugging note [PR115910]
Something I've noticed during working on the crc wrong-code fix.
My first version of the patch failed because of no longer matching some
expected strings in the assembly, so I had to add TDF_DETAILS debugging
into the -fdump-rtl-expand-details dump which the crc tests can use.
For PR115910 Andrew has added similar note for the division/modulo case
if it is positive and we can choose either unsigned or signed
division. The problem is that unlike most other TDF_DETAILS diagnostics,
this is not done before emitting the IL for the function, but during it.
Other messages there are prefixed with ;;, both details on what it is doing
and the GIMPLE IL for which it expands RTL, so the
;; Generating RTL for gimple basic block 4
Martin Jambor [Mon, 13 Jan 2025 12:47:27 +0000 (13:47 +0100)]
MAINTAINERS: Make contrib/check-MAINTAINERS.py happy
This commit makes the contrib/check-MAINTAINERS.py script happy about
our MAINTAINERS file. I hope that it knows best how things ought to
be and so am committing this as obvious.
ChangeLog:
2025-01-13 Martin Jambor <mjambor@suse.cz>
* MAINTAINERS: Fix the name order of the Write After Approval section.
Javier Miranda [Sat, 11 Jan 2025 17:30:42 +0000 (17:30 +0000)]
ada: Cleanup preanalysis of static expressions (part 5)
Partially revert the fix for sem_ch13.adb as it does not comply
with RM 13.14(7.2/5).
gcc/ada/ChangeLog:
* sem_ch13.adb (Check_Aspect_At_End_Of_Declarations): Restore calls
to Preanalyze_Spec_Expression that were replaced by calls to
Preanalyze_And_Resolve. Add documentation.
(Check_Aspect_At_Freeze_Point): Ditto.
Pascal Obry [Fri, 10 Jan 2025 17:56:55 +0000 (18:56 +0100)]
ada: Fix relocatable DLL creation with gnatdll
gcc/ada/ChangeLog:
* mdll.adb: For the created DLL to be relocatable we do not want to use
the base file name when calling gnatdll.
* gnatdll.adb: Removes option -d which is not working anymore. And
when using a truly relocatable DLL the base-address has no real
meaning. Also reword the usage string for -d as we do not want to
specify relocatable as gnatdll can be used to create both
relocatable and non relocatable DLL.
GNAT already emits a style warning when redundant parentheses appear inside
logical and short-circuit operators. A similar warning will be soon emitted for
unary operators as well. This patch removes the redundant parentheses to avoid
build errors.
gcc/ada/ChangeLog:
* libgnat/a-strunb.ads: Remove redundant parentheses inside NOT
operators.
Javier Miranda [Fri, 10 Jan 2025 19:08:39 +0000 (19:08 +0000)]
ada: Cleanup preanalysis of static expressions (part 4)
Fix regression in the SPARK 2014 testsuite.
gcc/ada/ChangeLog:
* sem_util.adb (Build_Actual_Subtype_Of_Component): No action
under preanalysis.
* sem_ch5.adb (Set_Assignment_Type): If the right-hand side contains
target names, expansion has been disabled to prevent expansion that
might move target names out of the context of the assignment statement.
Restore temporarily the current compilation mode so that the actual
subtype can be built.
Piotr Trojanek [Wed, 8 Jan 2025 13:00:50 +0000 (14:00 +0100)]
ada: Warn about redundant parentheses inside unary operators
GNAT already emits a style warning when redundant parentheses appear inside
logical and short-circuit operators. A similar warning is now emitted for
unary operators as well.
gcc/ada/ChangeLog:
* par-ch4.adb (P_Factor): Warn when the operand of a unary operator
doesn't require parentheses.
Piotr Trojanek [Thu, 9 Jan 2025 23:31:11 +0000 (00:31 +0100)]
ada: Remove redundant parentheses inside unary operators in comments
GNAT already emits a style warning when redundant parentheses appear inside
logical and short-circuit operators. A similar warning will be soon emitted for
unary operators as well. This patch removes the redundant parentheses to avoid
future build errors.
gcc/ada/ChangeLog:
* libgnat/s-genbig.adb: Remove redundant parentheses in comments.
GNAT already emits a style warning when redundant parentheses appear inside
logical and short-circuit operators. A similar warning will be soon emitted for
unary operators as well. This patch removes the redundant parentheses to avoid
future build errors.
Piotr Trojanek [Tue, 7 Jan 2025 09:42:35 +0000 (10:42 +0100)]
ada: Fix spurious warning about redundant parentheses in range bound
Use the same logic for warning about redundant parentheses in lower and upper
bounds of a discrete range. This fixes a spurious warning that, if followed,
would render the code illegal.
gcc/ada/ChangeLog:
* par-ch3.adb (P_Discrete_Range): Detect redundant parentheses in the
lower bound like in the upper bound.
Gary Dismukes [Wed, 8 Jan 2025 22:51:41 +0000 (22:51 +0000)]
ada: Unbounded recursion on character aggregates with predicated component subtype
The compiler was recursing endlessly when analyzing an aggregate of
an array type whose component subtype has a static predicate and the
component expressions are static, repeatedly transforming the aggregate
first into a string literal and then back into an aggregate. This is fixed
by suppressing the transformation to a string literal in the case where
the component subtype has predicates.
gcc/ada/ChangeLog:
* sem_aggr.adb (Resolve_Aggregate): Add another condition to prevent rewriting
an aggregate whose type is an array of characters, testing for the presence of
predicates on the component type.
Piotr Trojanek [Mon, 6 Jan 2025 11:06:59 +0000 (12:06 +0100)]
ada: Simplify expansion of negative membership operator
Code cleanup; semantics is unaffected.
gcc/ada/ChangeLog:
* exp_ch4.adb: (Expand_N_Not_In): Preserve Alternatives in expanded
membership operator just like preserving Right_Opnd (though only
one of these fields is present at a time).
* par-ch4.adb (P_Membership_Test): Remove redundant setting of fields
to their default values.
Piotr Trojanek [Fri, 3 Jan 2025 15:02:01 +0000 (16:02 +0100)]
ada: Warn about redundant parentheses in upper range bounds
Fix a glitch in condition that effectively caused detection of redundant
parentheses in upper range bounds to be dead code.
gcc/ada/ChangeLog:
* par-ch3.adb (P_Discrete_Range): Replace N_Subexpr, which was catching
all subexpressions, with kinds that catch nodes that require
parentheses to become "simple expressions".
Piotr Trojanek [Thu, 2 Jan 2025 16:36:54 +0000 (17:36 +0100)]
ada: Fix parsing of raise expressions with no parens
According to Ada grammar, raise expression is an expression, but requires
parens to be a simple_expression. We wrongly classified raise expressions
as expressions, because we mishandled a global state variable in the parser.
This patch causes some illegal code to be rejected.
gcc/ada/ChangeLog:
* par-ch4.adb (P_Relation): Prevent Expr_Form to be overwritten when
parsing the raise expression itself.
(P_Simple_Expression): Fix manipulation of Expr_Form.
The declaration created by gfc_get_extern_function_decl used input_location
as DECL_SOURCE_LOCATION, which gave rather odd results with 'declared here'
diagnostic. - It is much more useful to use the gfc_symbol's declated_at,
which this commit now does.
..., we're no longer using the 'dg-bogus' location informations, as pointed out
for one class of additional notes of
'gfortran.dg/goacc/routine-external-level-of-parallelism-2.f', once added in
commit 03eb779141a29f96600cd46904b88a33c4b49a66 "Add 'dg-note', 'dg-lto-note'".
Therefore, un-XFAILed 'dg-note's rather than XFAILed 'dg-bogus'es.
Michal Jires [Mon, 13 Jan 2025 00:58:41 +0000 (01:58 +0100)]
lto: Fix empty fnctl.h build error with MinGW.
MSYS2+MinGW contains headers without defining expected contents.
This fix checks that the fcntl function is actually defined.
Bootstrapped/regtested on x86_64-linux. Committed as obvious.
gcc/ChangeLog:
* lockfile.cc (LOCKFILE_USE_FCNTL): New.
(lockfile::lock_write): Use LOCKFILE_USE_FCNTL.
(lockfile::try_lock_write): Use LOCKFILE_USE_FCNTL.
(lockfile::lock_read): Use LOCKFILE_USE_FCNTL.
(lockfile::unlock): Use LOCKFILE_USE_FCNTL.
(lockfile::lockfile_supported): Use LOCKFILE_USE_FCNTL.
liuhongt [Thu, 9 Jan 2025 07:11:17 +0000 (23:11 -0800)]
Refactor ix86_expand_vecop_qihi2.
Since there's regression to use vpermq, and it's manually disabled by
!TARGET_AVX512BW. I remove the codes related to vpermq and make
ix86_expand_vecop_qihi2 only handle vpmovbw + op + vpmovwb case.
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
Refactor to avoid redundant TARGET_AVX512BW in many places.
Jakub Jelinek [Mon, 13 Jan 2025 00:24:53 +0000 (17:24 -0700)]
[PATCH] crc: Fix up some crc related wrong code issues [PR117997, PR118415]
Hi!
As mentioned in the second PR, using table names like
crc_table_for_crc_8_polynomial_0x12
in the user namespace is wrong, user could have defined such variables
in their code and as can be seen on the last testcase, then it just
misbehaves.
At minimum such names should start with 2 underscores, moving it into
implementation namespace, and if possible have some dot or dollar in the
name if target supports it.
I think assemble_crc_table right now always emits tables a local variables,
I really don't see what would be setting TREE_PUBLIC flag on
IDENTIFIER_NODEs.
It might be nice to share the tables between TUs in the same binary or
shared library, but it in that case should have hidden visibility if
possible, so that it isn't exported from the libraries or binaries, we don't
want the optimization to affect set of exported symbols from libraries.
And, as can be seen in the first PR, building gen_rtx_SYMBOL_REF by hand
is certainly unexpected on some targets, e.g. those which use
-fsection-anchors, so we should instead use DECL_RTL of the VAR_DECL.
For that we'd need to look it up if we haven't emitted it already, while
IDENTIFIER_NODEs can be looked up easily, I guess for the VAR_DECLs we'd
need custom hash table.
Now, all of the above (except sharing between multiple TUs) is already
implemented in output_constant_def, so I think it is much better to just
use that function.
And, if we want to share it between multiple TUs, we could extend the
SHF_MERGE usage in gcc, currently we only use it for constant pool
entries with same size as alignment, from 1 to 32 bytes, using .rodata.cstN
sections. We could just use say .rodata.cstM.N sections where M would be
alignment and N would be the entity size. We could use that for all
constant pool entries say up to 2048 bytes.
Though, as the current code doesn't share between multiple TUs, I think it
can be done incrementally (either still for GCC 15, or GCC 16+).
Bootstrapped/regtested on {x86_64,i686,aarch64,powerpc64le,s390x}-linux, on
aarch64 it also fixes
-FAIL: crypto/rsa
-FAIL: hash
ok for trunk?
gcc/
PR tree-optimization/117997
PR middle-end/118415
* expr.cc (assemble_crc_table): Make static, remove id argument,
use output_constant_def. Emit note if -fdump-rtl-expand-details
about which table has been emitted.
(generate_crc_table): Make static, adjust assemble_crc_table
caller, call it always.
(calculate_table_based_CRC): Make static.
* internal-fn.cc (expand_crc_optab_fn): Emit note if
-fdump-rtl-expand-details about using optab for crc. Formatting fix.
gcc/testsuite/
* gcc.dg/crc-builtin-target32.c: Add -fdump-rtl-expand-details
as dg-additional-options. Scan expand dump rather than assembly,
adjust the regexps.
* gcc.dg/crc-builtin-target64.c: Likewise.
* gcc.dg/crc-builtin-rev-target32.c: Likewise.
* gcc.dg/crc-builtin-rev-target64.c: Likewise.
* gcc.dg/pr117997.c: New test.
* gcc.dg/pr118415.c: New test.
- Import latest fixes from dmd v2.110.0-beta.1.
- The `align' attribute now allows to specify `default'
explicitly.
- Add primary expression of the form `__rvalue(expression)'
which causes `expression' to be treated as an rvalue, even if
it is an lvalue.
- Shortened method syntax can now be used in constructors.
D runtime changes:
- Import latest fixes from druntime v2.110.0-beta.1.
Phobos changes:
- Import latest fixes from phobos v2.110.0-beta.1.
gcc/d/ChangeLog:
* dmd/MERGE: Merge upstream dmd c57da0cf59.
* d-codegen.cc (can_elide_copy_p): New.
(d_build_call): Use it.
* d-lang.cc (d_post_options): Update for new front-end interface.
David Malcolm [Sun, 12 Jan 2025 18:46:31 +0000 (13:46 -0500)]
c: UX improvements to 'too {few,many} arguments' errors (v5) [PR118112]
Consider this case of a bad call to a callback function (perhaps
due to C23 changing the meaning of () in function decls):
struct p {
int (*bar)();
};
void baz() {
struct p q;
q.bar(1);
}
Before this patch the C frontend emits:
t.c: In function 'baz':
t.c:7:5: error: too many arguments to function 'q.bar'
7 | q.bar(1);
| ^
which doesn't give the user much help in terms of knowing what
was expected, and where the relevant declaration is.
With this patch the C frontend emits:
t.c: In function 'baz':
t.c:7:5: error: too many arguments to function 'q.bar'; expected 0, have 1
7 | q.bar(1);
| ^ ~
t.c:2:15: note: declared here
2 | int (*bar)();
| ^~~
(showing the expected vs actual counts, the pertinent field decl, and
underlining the first extraneous argument at the callsite)
Similarly, the patch also updates the "too few arguments" case to also
show expected vs actual counts. Doing so requires a tweak to the
wording to say "at least" for the case of variadic fns where
previously the C FE emitted e.g.:
s.c: In function 'test':
s.c:5:3: error: too few arguments to function 'callee'
5 | callee ();
| ^~~~~~
s.c:1:6: note: declared here
1 | void callee (const char *, ...);
| ^~~~~~
with this patch it emits:
s.c: In function 'test':
s.c:5:3: error: too few arguments to function 'callee'; expected at least 1, have 0
5 | callee ();
| ^~~~~~
s.c:1:6: note: declared here
1 | void callee (const char *, ...);
| ^~~~~~
gcc/c/ChangeLog:
PR c/118112
* c-typeck.cc (inform_declaration): Add "function_expr" param and
use it for cases where we couldn't show the function decl to show
field decls for callbacks.
(build_function_call_vec): Add missing auto_diagnostic_group.
Update for new param of inform_declaration.
(convert_arguments): Likewise. For the "too many arguments" case
add the expected vs actual counts to the message, and if we have
it, add the location_t of the first surplus param as a secondary
location within the diagnostic. For the "too few arguments" case,
determine the minimum number of arguments required and add the
expected vs actual counts to the message, tweaking it to "at least"
for variadic functions.
gcc/testsuite/ChangeLog:
PR c/118112
* gcc.dg/too-few-arguments.c: New test.
* gcc.dg/too-many-arguments.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Implementation of the Fortran 2018 standard intrinsic OUT_OF_RANGE, with
the GNU Fortran extension to unsigned integers.
Runtime code is fully inline expanded.
PR fortran/115788
gcc/fortran/ChangeLog:
* check.cc (gfc_check_out_of_range): Check arguments to intrinsic.
* expr.cc (free_expr0): Fix a memleak with unsigned literals.
* gfortran.h (enum gfc_isym_id): Define GFC_ISYM_OUT_OF_RANGE.
* gfortran.texi: Add OUT_OF_RANGE to list of intrinsics supporting
UNSIGNED.
* intrinsic.cc (add_functions): Add Fortran prototype. Break some
nearby lines with excessive length.
* intrinsic.h (gfc_check_out_of_range): Add prototypes.
* intrinsic.texi: Fortran documentation of OUT_OF_RANGE.
* simplify.cc (gfc_simplify_out_of_range): Compile-time simplification
of OUT_OF_RANGE.
* trans-intrinsic.cc (gfc_conv_intrinsic_out_of_range): Generate
inline expansion of runtime code for OUT_OF_RANGE.
(gfc_conv_intrinsic_function): Use it.
gcc/testsuite/ChangeLog:
* gfortran.dg/ieee/out_of_range.f90: New test.
* gfortran.dg/out_of_range_1.f90: New test.
* gfortran.dg/out_of_range_2.f90: New test.
* gfortran.dg/out_of_range_3.f90: New test.
Alpha: Fix a block move pessimisation with zero-extension after LDWU
For the BWX case we have a pessimisation in `alpha_expand_block_move'
for HImode loads where we place the data loaded into a HImode register
as well, therefore losing information that indeed the data loaded has
already been zero-extended to the full DImode width of the register.
Later on when we store this data in QImode quantities into an unaligned
destination, we zero-extend it again for the purpose of right-shifting,
such as with the test case included producing code at `-O2' as follows:
The non-BWX case is unaffected, because there we use byte insertion, so
we don't care that data is held in a HImode register.
Address this by making the holding RTX a HImode subreg of the original
DImode register, which the RTL passes can then see through and eliminate
the zero-extension where otherwise required, resulting in this shortened
code:
While at it reformat the enclosing do-while statement according to the
GNU Coding Standards, observing that in this case it does not obfuscate
the change owing to the odd original indentation.
gcc/
* config/alpha/alpha.cc (alpha_expand_block_move): Use a HImode
subreg of a DImode register to hold data from an aligned HImode
load.
Alpha: Optimize block moves coming from longword-aligned source
Now that we have proper alignment determination for block moves in place
the case of copying a block of longword-aligned data has become real, so
implement the merging of loaded data from pairs of SImode registers into
single DImode registers for the purpose of using with unaligned stores
efficiently, as suggested by a comment in `alpha_expand_block_move' and
discard the comment. Provide test cases accordingly.
gcc/
* config/alpha/alpha.cc (alpha_expand_block_move): Merge loaded
data from pairs of SImode registers into single DImode registers
if to be used with unaligned stores.
gcc/testsuite/
* gcc.target/alpha/memcpy-si-aligned.c: New file.
* gcc.target/alpha/memcpy-si-unaligned.c: New file.
* gcc.target/alpha/memcpy-si-unaligned-dst.c: New file.
* gcc.target/alpha/memcpy-si-unaligned-src.c: New file.
* gcc.target/alpha/memcpy-si-unaligned-src-bwx.c: New file.
Alpha: Always respect -mbwx, -mcix, -mfix, -mmax, and their inverse
Contrary to user documentation the `-mbwx', `-mcix', `-mfix', `-mmax'
feature options and their inverse forms are ignored whenever `-mcpu='
option is in effect, either by having been given explicitly or where
configured as the default such as with the `alphaev56-linux-gnu' target.
In the latter case there is no way to change the settings these options
are supposed to tweak other than with `-mcpu=' and the settings cannot
be individually controlled, making all the feature options permanently
inactive.
It seems a regression from commit 7816bea0e23b ("config.gcc: Reorganize
--with-cpu logic.") back in 2003, which replaced the setting of the
default feature mask with the setting of the default CPU across a few
targets, and the complementing logic in the Alpha backend wasn't updated
accordingly.
Fix this by making the individual feature options take precedence over
`-mcpu='. Add test cases to verify this is the case, and to cover the
defaults as well for the boundary cases.
This has a drawback where the order of the options is ignored between
`-mcpu=' and these individual options, so e.g. `-mno-bwx -mcpu=ev6' will
keep the BWX feature disabled even though `-mcpu=ev6' comes later in the
command line. This may affect some scenarios involving user overrides
such as with CFLAGS passed to `configure' and `make' invocations. I do
believe it has been our practice anyway for more finegrained options to
override group options regardless of their relative order on the command
line and in any case using `-mcpu=ev6 -mbwx' as the override will do the
right thing if required, canceling any previous `-mno-bwx'.
This has been spotted with `alphaev56-linux-gnu' target verification and
a recently added test case:
(and similarly for the remaining optimization levels covered) which this
fix has addressed.
gcc/
* config/alpha/alpha.cc (alpha_option_override): Ignore CPU
flags corresponding to features the enabling or disabling of
which has been requested with an individual feature option.
gcc/testsuite/
* gcc.target/alpha/target-bwx-1.c: New file.
* gcc.target/alpha/target-bwx-2.c: New file.
* gcc.target/alpha/target-bwx-3.c: New file.
* gcc.target/alpha/target-bwx-4.c: New file.
* gcc.target/alpha/target-cix-1.c: New file.
* gcc.target/alpha/target-cix-2.c: New file.
* gcc.target/alpha/target-cix-3.c: New file.
* gcc.target/alpha/target-cix-4.c: New file.
* gcc.target/alpha/target-fix-1.c: New file.
* gcc.target/alpha/target-fix-2.c: New file.
* gcc.target/alpha/target-fix-3.c: New file.
* gcc.target/alpha/target-fix-4.c: New file.
* gcc.target/alpha/target-max-1.c: New file.
* gcc.target/alpha/target-max-2.c: New file.
* gcc.target/alpha/target-max-3.c: New file.
* gcc.target/alpha/target-max-4.c: New file.
Alpha: Restore frame pointer last in `builtin_longjmp' [PR64242]
Add similar arrangements to `builtin_longjmp' for Alpha as with commit 71b144289c1c ("re PR middle-end/64242 (Longjmp expansion incorrect)")
and commit 511ed59d0b04 ("Fix PR64242 - Longjmp expansion incorrect"),
so as to restore the frame pointer last, so that accesses to a local
buffer supplied can still be fulfilled with memory accesses via the
original frame pointer, fixing:
FAIL: gcc.c-torture/execute/pr64242.c -O0 execution test
FAIL: gcc.c-torture/execute/pr64242.c -O1 execution test
FAIL: gcc.c-torture/execute/pr64242.c -O2 execution test
FAIL: gcc.c-torture/execute/pr64242.c -O3 -g execution test
FAIL: gcc.c-torture/execute/pr64242.c -Os execution test
FAIL: gcc.c-torture/execute/pr64242.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/pr64242.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
and adding no regressions in `alpha-linux-gnu' testing.
gcc/
PR middle-end/64242
* config/alpha/alpha.md (`builtin_longjmp'): Restore frame
pointer last. Add frame clobber and schedule blockage.
Alpha: Add memory clobbers to `builtin_longjmp' expansion
Add the same memory clobbers to `builtin_longjmp' for Alpha as with
commit 41439bf6a647 ("builtins.c (expand_builtin_longjmp): Added two
memory clobbers."), to prevent instructions that access memory via the
frame or stack pointer from being moved across the write to the frame
pointer.
testsuite: The expect framework might introduce CR in output
When running tests using the "sim" config, the command is launched in
non-readonly mode and the text retrieved from the expect command will
then replace all LF with CRLF. (The problem can be found in sim_load
where it calls remote_spawn without an input file).
libstdc++-v3/ChangeLog:
* testsuite/27_io/print/1.cc: Allow both LF and CRLF in test.
* testsuite/27_io/print/3.cc: Likewise.
c-pretty-print.cc (pp_c_tree_decl_identifier): Strip private name encoding, PR118303
This is a part of PR118303. It fixes
FAIL: gcc.dg/analyzer/CVE-2005-1689-minimal.c (test for excess errors)
FAIL: gcc.dg/analyzer/CVE-2005-1689-minimal.c inbuf.data (test for warnings, line 62)
for targets where the parameter on that line is subject to
TARGET_CALLEE_COPIES being true.
c-family:
PR middle-end/118303
* c-pretty-print.cc (c_pretty_printer::primary_expression) <SSA_NAME>:
Call primary_expression for all SSA_NAME_VAR nodes and instead move the
DECL_ARTIFICIAL private name stripping to...
(pp_c_tree_decl_identifier): ...here.
Andrew Pinski [Sat, 11 Jan 2025 04:04:09 +0000 (20:04 -0800)]
final: Fix get_attr_length for asm goto [PR118411]
The problem is for inline-asm goto, the outer rtl insn type
is a jump_insn and get_attr_length does not handle ASM specially
unlike if the outer rtl insn type was just insn.
This fixes the issue by adding support for both CALL_INSN and JUMP_INSN
with asm.
OK? Bootstrapped and tested on x86_64-linux-gnu.
PR middle-end/118411
gcc/ChangeLog:
* final.cc (get_attr_length_1): Handle asm for CALL_INSN
and JUMP_INSNs.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
- Import latest fixes from dmd v2.110.0-beta.1.
- Added traits `getBitfieldOffset' and `getBitfieldWidth'.
- Added trait `isCOMClass' to detect if a type is a COM class.
- Added `-fpreview=safer` which enables safety checking on
unattributed functions.
D runtime changes:
- Import latest fixes from druntime v2.110.0-beta.1.
Phobos changes:
- Import latest fixes from phobos v2.110.0-beta.1.
- Added `fromHexString' and `fromHexStringAsRange' functions to
`std.digest'.
gcc/d/ChangeLog:
* dmd/MERGE: Merge upstream dmd 82a5d2a7c4.
* d-lang.cc (d_handle_option): Handle new option `-fpreview=safer'.
* expr.cc (ExprVisitor::NewExp): Remove gcc_unreachable for the
generation of `_d_newThrowable'.
* lang.opt: Add -fpreview=safer.
Nathaniel Shead [Thu, 9 Jan 2025 14:06:37 +0000 (01:06 +1100)]
c++/modules: Handle chaining already-imported local types [PR114630]
In the linked testcase, an ICE occurs because when reading the
(duplicate) function definition for _M_do_parse from module Y, the local
type definitions have already been streamed from module X and setup as
regular backreferences, rather than being found with find_duplicate,
causing issues with managing DECL_CHAIN.
It is tempting to just skip setting up the DECL_CHAIN for this case.
However, for the future it would be best to ensure that the block vars
for the duplicate definition are accurate, so that we could implement
ODR checking on function definitions at some point.
So to solve this, this patch creates a copy of the streamed-in local
type and chains that; it will be discarded along with the rest of the
duplicate function after we've finished processing.
A couple of suggested implementations from the discussion on the PR that
don't work:
- Replacing the `DECL_CHAIN` assertion with `(*chain && *chain != decl)`
doesn't handle the case where type definitions are followed by regular
local variables, since those won't have been imported as separate
backreferences and so the chains will diverge.
- Correcting the purviewness of GMF template instantiations to force Y
to emit copies of the local types rather than backreferences into X is
insufficient, as it's still possible that the local types got streamed
in a separate cluster to the function definition, and so will be again
referred to via regular backreferences when importing.
- Likewise, preventing the emission of function definitions where an
import has already provided that same definition also is insufficient,
for much the same reason.
PR c++/114630
gcc/cp/ChangeLog:
* module.cc (trees_in::core_vals) <BLOCK>: Chain a new node if
DECL_CHAIN already is set.
gcc/testsuite/ChangeLog:
* g++.dg/modules/pr114630.h: New test.
* g++.dg/modules/pr114630_a.C: New test.
* g++.dg/modules/pr114630_b.C: New test.
* g++.dg/modules/pr114630_c.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Jason Merrill <jason@redhat.com> Reviewed-by: Patrick Palka <ppalka@redhat.com>
Tobias Burnus [Sat, 11 Jan 2025 11:54:56 +0000 (12:54 +0100)]
Fortran: Fix location_t in gfc_get_extern_function_decl; support 'omp dispatch interop'
The declaration created by gfc_get_extern_function_decl used input_location
as DECL_SOURCE_LOCATION, which gave rather odd results with 'declared here'
diagnostic. - It is much more useful to use the gfc_symbol's declated_at,
which this commit now does.
Additionally, it adds support for the 'interop' clause of OpenMP's
'dispatch' directive. As the argument order matters,
gfc_match_omp_variable_list gained a 'reverse_order' flag to use the
same order as the C/C++ parser.
gcc/fortran/ChangeLog:
* gfortran.h: Add OMP_LIST_INTEROP to the unnamed OMP_LIST_ enum.
* openmp.cc (gfc_match_omp_variable_list): Add reverse_order
boolean argument, defaulting to false.
(enum omp_mask2, OMP_DISPATCH_CLAUSES): Add OMP_CLAUSE_INTEROP.
(gfc_match_omp_clauses, resolve_omp_clauses): Handle dispatch's
'interop' clause.
* trans-decl.cc (gfc_get_extern_function_decl): Use sym->declared_at
instead input_location as DECL_SOURCE_LOCATION.
* trans-openmp.cc (gfc_trans_omp_clauses): Handle OMP_LIST_INTEROP.
gcc/testsuite/ChangeLog:
* gfortran.dg/goacc/routine-external-level-of-parallelism-2.f: Update
xfail'ed 'dg-bogus' for the better 'declared here' location.
* gfortran.dg/gomp/dispatch-11.f90: New test.
* gfortran.dg/gomp/dispatch-12.f90: New test.
Paul Thomas [Sat, 11 Jan 2025 08:23:48 +0000 (08:23 +0000)]
Fortran: Fix error recovery for bad component arrayspecs [PR108434]
2025-01-11 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran/
PR fortran/108434
* class.cc (generate_finalization_wrapper): To avoid memory
leaks from callocs, return immediately if the derived type
error flag is set.
* decl.cc (build_struct): If the declaration of a derived type
or class component does not have a deferred arrayspec, correct,
set the error flag of the derived type and emit an immediate
error.
Jason Merrill [Fri, 10 Jan 2025 23:00:20 +0000 (18:00 -0500)]
c++: modules and function attributes
30_threads/stop_token/stop_source/109339.cc was failing because we weren't
representing attribute access on the METHOD_TYPE for _Stop_state_ref.
The modules code expected attributes to appear on tt_variant_type and not
on tt_derived_type, but that's backwards since build_type_attribute_variant
gives a type with attributes its own TYPE_MAIN_VARIANT.
gcc/cp/ChangeLog:
* module.cc (trees_out::type_node): Write attributes for
tt_derived_type, not tt_variant_type.
(trees_in::tree_node): Likewise for reading.
gcc/testsuite/ChangeLog:
* g++.dg/modules/attrib-2_a.C: New test.
* g++.dg/modules/attrib-2_b.C: New test.
Jason Merrill [Sat, 23 Nov 2024 09:00:18 +0000 (10:00 +0100)]
c++: modules and class attributes
std/time/traits/is_clock.cc was getting a warning about applying the
deprecated attribute to a variant of auto_ptr, which was wrong because it's
on the primary type. This turned out to be because we were ignoring the
attributes on the definition of auto_ptr because the forward declaration in
unique_ptr.h has no attributes. We need to merge attributes as usual in a
redeclaration.
mengqinggang [Fri, 10 Jan 2025 02:27:09 +0000 (10:27 +0800)]
LoongArch: Generate the final immediate for lu12i.w, lu32i.d and lu52i.d
Generate 0x1010 instead of 0x1010000>>12 for lu12i.w. lu32i.d and lu52i.d use
the same processing.
gcc/ChangeLog:
* config/loongarch/lasx.md: Use new loongarch_output_move.
* config/loongarch/loongarch-protos.h (loongarch_output_move):
Change parameters from (rtx, rtx) to (rtx *).
* config/loongarch/loongarch.cc (loongarch_output_move):
Generate final immediate for lu12i.w and lu52i.d.
* config/loongarch/loongarch.md:
Generate final immediate for lu32i.d and lu52i.d.
* config/loongarch/lsx.md: Use new loongarch_output_move.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/imm-load.c: Not generate ">>".
Andrew MacLeod [Fri, 10 Jan 2025 18:33:01 +0000 (13:33 -0500)]
Use relations when simplifying MIN and MAX.
Query for known relations between the operands, and pass that to
fold_range to help simplify MIN and MAX relations.
Make it type agnostic as well.
Adapt testcases from DOM to EVRP (e suffix) and test floats (f suffix).
PR tree-optimization/88575
gcc/
* vr-values.cc (simplify_using_ranges::fold_cond_with_ops): Query
relation between op0 and op1 and utilize it.
(simplify_using_ranges::simplify): Do not eliminate float checks.
- It's now deprecated to declare `auto ref' parameters without
putting those two keywords next to each other.
- An error is now given for case fallthough for multivalued
cases.
- An error is now given for constructors with field destructors
with stricter attributes.
- An error is now issued for `in'/`out' contracts of `nothrow'
functions that may throw.
- `auto ref' can now be applied to local, static, extern, and
global variables.
D runtime changes:
- Import latest fixes from druntime v2.110.0-beta.1.
Phobos changes:
- Import latest fixes from phobos v2.110.0-beta.1.
Alex Coplan [Mon, 24 Jun 2024 12:54:48 +0000 (13:54 +0100)]
vect: Also cost gconds for scalar [PR118211]
Currently we only cost gconds for the vector loop while we omit costing
them when analyzing the scalar loop; this unfairly penalizes the vector
loop in the case of loops with early exits.
This (together with the previous patches) enables us to vectorize
std::find with 64-bit element sizes.
Alex Coplan [Thu, 25 Jul 2024 16:34:05 +0000 (16:34 +0000)]
vect: Ensure we add vector skip guard even when versioning for aliasing [PR118211]
This fixes a latent wrong code issue whereby vect_do_peeling determined
the wrong condition for inserting the vector skip guard. Specifically
in the case where the loop niters are unknown at compile time we used to
check:
!LOOP_REQUIRES_VERSIONING (loop_vinfo)
but LOOP_REQUIRES_VERSIONING is true for loops which we have versioned
for aliasing, and that has nothing to do with prolog peeling. I think
this condition should instead be checking specifically if we aren't
versioning for alignment.
As it stands, when we version for alignment, we don't peel, so the
vector skip guard is indeed redundant in that case.
With the testcase added (reduced from the Fortran frontend) we would
version for aliasing, omit the vector skip guard, and then at runtime we
would peel sufficient iterations for alignment that there wasn't a full
vector iteration left when we entered the vector body, thus overflowing
the output buffer.
gcc/ChangeLog:
PR tree-optimization/118211
PR tree-optimization/116126
* tree-vect-loop-manip.cc (vect_do_peeling): Adjust skip_vector
condition to only omit the edge if we're versioning for
alignment.
gcc/testsuite/ChangeLog:
PR tree-optimization/118211
PR tree-optimization/116126
* gcc.dg/vect/vect-early-break_130.c: New test.
Tamar Christina [Mon, 8 Jul 2024 11:16:11 +0000 (12:16 +0100)]
vect: Fix dominators when adding a guard to skip the vector loop [PR118211]
The alignment peeling changes exposed a latent missing dominator update
with early break vectorization, specifically when inserting the vector
skip edge, since the new edge bypasses the prolog skip block and thus
has the potential to subvert its dominance. This patch fixes that.
gcc/ChangeLog:
PR tree-optimization/118211
PR tree-optimization/116126
* tree-vect-loop-manip.cc (vect_do_peeling): Update immediate
dominators of nodes that were dominated by the prolog skip block
after inserting vector skip edge. Initialize prolog variable to
NULL to avoid bogus -Wmaybe-uninitialized during bootstrap.
gcc/testsuite/ChangeLog:
PR tree-optimization/118211
PR tree-optimization/116126
* g++.dg/vect/vect-early-break_6.cc: New test.
Alex Coplan [Mon, 11 Mar 2024 13:09:10 +0000 (13:09 +0000)]
vect: Force alignment peeling to vectorize more early break loops [PR118211]
This allows us to vectorize more loops with early exits by forcing
peeling for alignment to make sure that we're guaranteed to be able to
safely read an entire vector iteration without crossing a page boundary.
To make this work for VLA architectures we have to allow compile-time
non-constant target alignments. We also have to override the result of
the target's preferred_vector_alignment hook if it isn't a power-of-two
multiple of the TYPE_SIZE of the chosen vector type.
gcc/ChangeLog:
PR tree-optimization/118211
PR tree-optimization/116126
* tree-vect-data-refs.cc (vect_analyze_early_break_dependences):
Set need_peeling_for_alignment flag on read DRs instead of
failing vectorization. Punt on gathers.
(dr_misalignment): Handle non-constant target alignments.
(vect_compute_data_ref_alignment): If need_peeling_for_alignment
flag is set on the DR, then override the target alignment chosen
by the preferred_vector_alignment hook to choose a safe
alignment.
(vect_supportable_dr_alignment): Override
support_vector_misalignment hook if need_peeling_for_alignment
is set on the DR: in this case we must return
dr_unaligned_unsupported in order to force peeling.
* tree-vect-loop-manip.cc (vect_do_peeling): Allow prolog
peeling by a compile-time non-constant amount.
* tree-vectorizer.h (dr_vec_info): Add new flag
need_peeling_for_alignment.
testsuite: arm: Add pattern for armv8-m.base to cmse-15.c test
Since armv8-m.base uses thumb1 that does not suport sibcall/tailcall,
a pattern is needed that uses PUSH/BL/POP sequence instead of a single
B instruction to reuse an already existing function in the compile unit.
gcc/testsuite/ChangeLog:
* gcc.target/arm/cmse/cmse-15.c: Added pattern for armv8-m.base.
Do not call cp_parser_omp_dispatch directly in cp_parser_pragma
This is a followup to ed49709acda OpenMP: C++ front-end support for dispatch + adjust_args.
The call to cp_parser_omp_dispatch only belongs in cp_parser_omp_construct. In
cp_parser_pragma, handle PRAGMA_OMP_DISPATCH by calling cp_parser_omp_construct.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_pragma): Replace call to cp_parser_omp_dispatch
with cp_parser_omp_construct and check context.
Jakub Jelinek [Fri, 10 Jan 2025 17:42:58 +0000 (18:42 +0100)]
c++: Fix ICE with invalid defaulted operator <=> [PR118387]
In the following testcase there are 2 issues, one is that B doesn't
have operator<=> and the other is that A's operator<=> has int return
type, i.e. not the standard comparison category.
Because of the int return type, retcat is cc_last; when we first
try to synthetize it, it is therefore with tentative false and complain
tf_none, we find that B doesn't have operator<=> and because retcat isn't
tc_last, don't try to search for other operators in genericize_spaceship.
And then mark the operator deleted.
When trying to explain the use of the deleted operator, tentative is still
false, but complain is tf_error_or_warning.
do_one_comp will first do:
tree comp = build_new_op (loc, code, flags, lhs, rhs,
NULL_TREE, NULL_TREE, &overload,
tentative ? tf_none : complain);
and because complain isn't tf_none, it will actually diagnose the bug
already, but then (tentative || complain) is true and we call
genericize_spaceship, which has
if (tag == cc_last && is_auto (type))
{
...
}
gcc_checking_assert (tag < cc_last);
and because tag is cc_last and type isn't auto, we just ICE on that
assertion.
The patch fixes it by returning error_mark_node from genericize_spaceship
instead of failing the assertion.
Note, the PR raises another problem.
If on the same testcase the B b; line is removed, we silently synthetize
operator<=> which will crash at runtime due to returning without a return
statement. That is because the standard says that in that case
it should return static_cast<int>(std::strong_ordering::equal);
but I can't find anywhere wording which would say that if that isn't
valid, the function is deleted.
https://eel.is/c++draft/class.compare#class.spaceship-2.2
seems to talk just about cases where there are some members and their
comparison is invalid it is deleted, but here there are none and it
follows
https://eel.is/c++draft/class.compare#class.spaceship-3.sentence-2
So, we synthetize with tf_none, see the static_cast is invalid, don't
add error_mark_node statement silently, but as the function isn't deleted,
we just silently emit it.
Should the standard be amended to say that the operator should be deleted
even if it has no elements and the static cast from
https://eel.is/c++draft/class.compare#class.spaceship-3.sentence-2
?
2025-01-10 Jakub Jelinek <jakub@redhat.com>
PR c++/118387
* method.cc (genericize_spaceship): For tag == cc_last if
type is not auto just return error_mark_node instead of failing
checking assertion.
Richard Biener [Fri, 10 Jan 2025 14:17:58 +0000 (15:17 +0100)]
Fix some memory leaks
The following fixes memory leaks found compiling SPEC CPU 2017 with
valgrind.
* df-core.cc (rest_of_handle_df_finish): Release dflow for
problems without free function (like LR).
* gimple-crc-optimization.cc (crc_optimization::loop_may_calculate_crc):
Release loop_bbs on all exits.
* tree-vectorizer.h (supportable_indirect_convert_operation): Change.
* tree-vect-generic.cc (expand_vector_conversion): Adjust.
* tree-vect-stmts.cc (vectorizable_conversion): Use auto_vec for
converts.
(supportable_indirect_convert_operation): Get a reference to
the output vector of converts.
Richard Biener [Fri, 10 Jan 2025 11:30:29 +0000 (12:30 +0100)]
rtl-optimization/117467 - limit ext-dce memory use
The following puts in a hard limit on ext-dce because it might end
up requiring memory on the order of the number of basic blocks
times the number of pseudo registers. The limiting follows what
GCSE based passes do and thus I re-use --param max-gcse-memory here.
This doesn't in any way address the implementation issues of the pass,
but it reduces the memory-use when compiling the
module_first_rk_step_part1.F90 TU from 521.wrf_r from 25GB to 1GB.
PR rtl-optimization/117467
PR rtl-optimization/117934
* ext-dce.cc (ext_dce_execute): Do nothing if a memory
allocation estimate exceeds what is allowed by
--param max-gcse-memory.
f<int>(int)::<lambda(seq<i ...>)> [with long unsigned int ...i = {0}]
(correct), context is:
f<int>(int)::<lambda(seq<i ...>)>
which is only the partial instantiation.
I think that when tsubst_pack_index gets a partial instantiation, e.g.
{*args#0} as the pack, we should still tsubst it. The args#0's value-expr
can be __closure->__args#0 where the closure's context is the partially
instantiated operator(). So we should let retrieve_local_specialization
find the right args#0.
PR c++/117937
gcc/cp/ChangeLog:
* pt.cc (tsubst_pack_index): tsubst the pack even when it's not
PACK_EXPANSION_P.
gcc/testsuite/ChangeLog:
* g++.dg/cpp26/pack-indexing13.C: New test.
* g++.dg/cpp26/pack-indexing14.C: New test.
* config/s390/s390-protos.h (s390_emit_compare): Add mode
parameter for the resulting RTX.
* config/s390/s390.cc (s390_emit_compare): Dito.
(s390_emit_compare_and_swap): Change.
(s390_expand_vec_strlen): Change.
(s390_expand_cs_hqi): Change.
(s390_expand_split_stack_prologue): Change.
* config/s390/s390.md (*add<mode>3_carry1_cc): Renamed to ...
(add<mode>3_carry1_cc): this and in order to use the
corresponding gen function, encode CC mode into pattern.
(*sub<mode>3_borrow_cc): Renamed to ...
(sub<mode>3_borrow_cc): this and in order to use the
corresponding gen function, encode CC mode into pattern.
(*add<mode>3_alc_carry1_cc): Renamed to ...
(add<mode>3_alc_carry1_cc): this and in order to use the
corresponding gen function, encode CC mode into pattern.
(sub<mode>3_slb_borrow1_cc): New.
(uaddc<mode>5): New.
(usubc<mode>5): New.
gcc/testsuite/ChangeLog:
* gcc.target/s390/uaddc-1.c: New test.
* gcc.target/s390/uaddc-2.c: New test.
* gcc.target/s390/uaddc-3.c: New test.
* gcc.target/s390/usubc-1.c: New test.
* gcc.target/s390/usubc-2.c: New test.
* gcc.target/s390/usubc-3.c: New test.
Andrew Carlotti [Tue, 15 Oct 2024 16:31:28 +0000 (17:31 +0100)]
Add new hardreg PRE pass
This pass is used to optimise assignments to the FPMR register in
aarch64. I chose to implement this as a middle-end pass because it
mostly reuses the existing RTL PRE code within gcse.cc.
Compared to RTL PRE, the key difference in this new pass is that we
insert new writes directly to the destination hardreg, instead of
writing to a new pseudo-register and copying the result later. This
requires changes to the analysis portion of the pass, because sets
cannot be moved before existing instructions that set, use or clobber
the hardreg, and the value becomes unavailable after any uses of
clobbers of the hardreg.
Any uses of the hardreg in debug insns will be deleted. We could do
better than this, but for the aarch64 fpmr I don't think we emit useful
debuginfo for deleted fp8 instructions anyway (and I don't even know if
it's possible to have a debug fpmr use when entering hardreg PRE).
gcc/ChangeLog:
* config/aarch64/aarch64.h (HARDREG_PRE_REGNOS): New macro.
* gcse.cc (doing_hardreg_pre_p): New global variable.
(do_load_motion): New boolean check.
(current_hardreg_regno): New global variable.
(compute_local_properties): Unset transp for hardreg clobbers.
(prune_hardreg_uses): New function.
(want_to_gcse_p): Use different checks for hardreg PRE.
(oprs_unchanged_p): Disable load motion for hardreg PRE pass.
(hash_scan_set): For hardreg PRE, skip non-hardreg sets and
check for hardreg clobbers.
(record_last_mem_set_info): Skip for hardreg PRE.
(compute_pre_data): Prune hardreg uses from transp bitmap.
(pre_expr_reaches_here_p_work): Add sentence to comment.
(insert_insn_start_basic_block): New functions.
(pre_edge_insert): Don't add hardreg sets to predecessor block.
(pre_delete): Use hardreg for the reaching reg.
(reset_hardreg_debug_uses): New function.
(pre_gcse): For hardreg PRE, reset debug uses and don't insert
copies.
(one_pre_gcse_pass): Disable load motion for hardreg PRE.
(execute_hardreg_pre): New.
(class pass_hardreg_pre): New.
(pass_hardreg_pre::gate): New.
(make_pass_hardreg_pre): New.
* passes.def (pass_hardreg_pre): New pass.
* tree-pass.h (make_pass_hardreg_pre): New.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/acle/fpmr-1.c: New test.
* gcc.target/aarch64/acle/fpmr-2.c: New test.
* gcc.target/aarch64/acle/fpmr-3.c: New test.
* gcc.target/aarch64/acle/fpmr-4.c: New test.
Andrew Carlotti [Tue, 7 Jan 2025 18:32:23 +0000 (18:32 +0000)]
Disable a broken multiversioning optimisation
This patch skips redirect_to_specific clone for aarch64 and riscv,
because the optimisation has two flaws:
1. It checks the value of the "target" attribute, even on targets that
don't use this attribute for multiversioning.
2. The algorithm used is too aggressive, and will eliminate the
indirection in some cases where the runtime choice of callee version
can't be determined statically at compile time. A correct would need to
verify that:
- if the current caller version were selected at runtime, then the
chosen callee version would be eligible for selection.
- if any higher priority callee version were selected at runtime, then
a higher priority caller version would have been eligble for
selection (and hence the current caller version wouldn't have been
selected).
The current checks only verify a more restrictive version of the first
condition, and don't check the second condition at all.
Fixing the optimisation properly would require implementing target hooks
to check for implications between version attributes, which is too
complicated for this stage. However, I would like to see this hook
implemented in the future, since it could also help deduplicate other
multiversioning code.
Since this behaviour has existed for x86 and powerpc for a while, I
think it's best to preserve the existing behaviour on those targets,
unless any maintainer for those targets disagrees.
gcc/ChangeLog:
* multiple_target.cc
(redirect_to_specific_clone): Assert that "target" attribute is
used for FMV before checking it.
(ipa_target_clone): Skip redirect_to_specific_clone on some
targets.