Jerry DeLisle [Sat, 23 Nov 2024 03:29:42 +0000 (19:29 -0800)]
Fortran: Reject missing comma in format.
Standards require rejecting formats where descriptors
are not separated by commas. This change allows
the missing comma to be accepted only with
-std=legacy.
PR fortran/88052
libgfortran/ChangeLog:
* io/format.c (parse_format_list): Reject missing comma in
format strings by default or if -std=f95 or higher. This is
a runtime error.
Expand coverage for `__builtin_memcpy', primarily for "cpymemM" block
copy pattern, although with smaller sizes open-coded sequences may be
produced instead.
This verifies block sizes in bytes from 1 to 64, across byte alignments
of 1, 2, 4, 8 and byte misalignments within from 0 up to 7 (there's some
redundancy there for the sake of simplicity of the test cases) both for
the source and the destination, making sure all data is copied and no
data is changed outside the area meant to be written.
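A minimal sketch of this kind of check follows. It is not the code from
memcpy-ax.h: the names check_memcpy and check_all, the guard size and the
fill patterns are all made up for illustration, base alignment is not
modelled, and because the size is a run-time value here the compiler is
free to call the library memcpy rather than expand "cpymemM" (the real
tests use compile-time constants generated by macros).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parameters mirroring the ranges described above.  */
enum { MAX_SIZE = 64, MAX_MISALIGN = 7, GUARD = 8 };

/* Copy SIZE bytes from a source misaligned by SRC_OFF to a destination
   misaligned by DST_OFF, then check that exactly the intended bytes
   were written.  Returns 1 on success, 0 on any mismatch.  */
static int
check_memcpy (size_t size, size_t src_off, size_t dst_off)
{
  unsigned char src[GUARD + MAX_MISALIGN + MAX_SIZE + GUARD];
  unsigned char dst[GUARD + MAX_MISALIGN + MAX_SIZE + GUARD];
  size_t start = GUARD + dst_off;
  size_t i;

  /* Distinct source bytes; a fixed pattern in the destination.  */
  for (i = 0; i < sizeof src; i++)
    {
      src[i] = (unsigned char) (i + 1);
      dst[i] = 0xa5;
    }

  memcpy (dst + start, src + GUARD + src_off, size);

  for (i = 0; i < sizeof dst; i++)
    {
      unsigned char expect = 0xa5;	/* Outside the area: unchanged.  */
      if (i >= start && i < start + size)
	expect = src[GUARD + src_off + (i - start)];
      if (dst[i] != expect)
	return 0;
    }
  return 1;
}

/* Sweep sizes 1..64 and misalignments 0..7 for source and destination.  */
static int
check_all (void)
{
  size_t size, so, d;

  for (size = 1; size <= MAX_SIZE; size++)
    for (so = 0; so <= MAX_MISALIGN; so++)
      for (d = 0; d <= MAX_MISALIGN; d++)
	if (!check_memcpy (size, so, d))
	  return 0;
  return 1;
}
```

The guard bytes on both sides of the destination are what catch writes
outside the area meant to be written.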
The choice of the ranges for the parameters has come from the Alpha
backend, whose "cpymemM" pattern covers copies being made of up to 64
bytes and has various corner cases related to base alignment and the
misalignment within.
The test cases have turned out invaluable in verifying changes to the Alpha
backend, but functionality covered is generic, so I have concluded these
tests qualify for generic verification and do not have to be limited to
the Alpha-specific subset of the testsuite.
On the implementation side the tests turned out to be quite stressful to
GCC and the original simpler version that just expanded all code inline
took a lot of time to complete compilation. Depending on the target and
compilation options elapsed times up to 40 minutes (!) have been seen,
especially with GCC built at `-O0' for debugging purposes.
At the cost of increased complexity where a pair of macros is required
per variant rather than just one I have split the code into individual
functions forced not to be inlined and it improved compilation times
considerably without losing coverage.
Example compilation times with reasonably fast POWER9@2.166GHz at `-O2'
optimization and GCC built at `-O2' for various targets:
I have therefore set the timeout factor accordingly so as to take slower
test hosts into account.
gcc/testsuite/
* gcc.c-torture/execute/memcpy-a1.c: New file.
* gcc.c-torture/execute/memcpy-a2.c: New file.
* gcc.c-torture/execute/memcpy-a4.c: New file.
* gcc.c-torture/execute/memcpy-a8.c: New file.
* gcc.c-torture/execute/memcpy-ax.h: New file.
build: Discard obsolete references to $(GCC_PARTS)
The $(GCC_PARTS) variable was deleted with the Makefile rework in commit fa9585134f6f ("libgcc move to the top level")[1] back in 2007, and yet
the Ada and Modula 2 frontends added references to this variable later
on, with commit e972fd5281b7 ("[Ada] clean ups in Makefiles")[2] back in
2011 and commit 1eee94d35177 ("Merge modula-2 front end onto gcc.") back
in 2022 respectively.
I guess it's because the frontends lived too long externally. Discard
the references then, they serve no purpose nowadays.
Georg-Johann Lay [Sat, 23 Nov 2024 11:51:32 +0000 (12:51 +0100)]
AVR: target/117744 - Fix asm for partial clobber of address reg.
gcc/
PR target/117744
* config/avr/avr.cc (out_movqi_r_mr): Fix code when a load
only partially clobbers an address register due to
changing the address register temporarily to accommodate
faked addressing modes.
Andrew Pinski [Thu, 31 Oct 2024 23:00:18 +0000 (16:00 -0700)]
md-files: Add a note about escaped quotes in braced strings in md files
While looking into PR 33532, it was noted that \" would still be
treated as " for braced strings in the md file. I think that is still
the correct thing to do. So let's just add a note to the documentation
on this behavior and NOT change read-md.cc (read_braced_string).
This behavior has been there for the last 23 years, only one person
has run into it, and it helps with the conversion from quoted strings
to braced strings: you just need to remove the quotes around the
braces rather than change all of the code.
Built the documentation to make sure it looks correct.
gcc/ChangeLog:
* doc/rtl.texi: Add a note about quotes in braced strings.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
There's not much notable here, just gprofng (which is in binutils) being
disabled for musl and a new target which got added on that side too.
The only part which may look interesting is the baseargs->bbaseargs
change which goes back to Arsen's gettext work and a fixup which
landed for that on the binutils side in 9c0aa4c53104b1c4333d55aeaf11b41053307929.
* configure: Regenerate.
* configure.ac: Sync with Binutils.
Andrew Pinski [Fri, 22 Nov 2024 17:31:44 +0000 (09:31 -0800)]
build: Remove INCLUDE_MEMORY [PR117737]
Since diagnostic.h is included in over half of the sources, requiring `#define INCLUDE_MEMORY`
does not make sense. Instead let's unconditionally include memory in system.h.
The majority of this patch is just removing `#define INCLUDE_MEMORY` from the sources which currently
have it.
This should also fix the mingw build issue but I have not tried it.
This is a small improvement to the constant synthesis code to capture a case
appended to PR 109279.
The case in question has the property that the high 32 bits have the value one
less than the low 32 bits and the highest bit in the low 32 bits is on. The
example used in BZ is 0xcccccccccccccccd which comes up computing N/10.
When we construct a constant with bit 31 on, it gets implicitly sign extended.
So something like 0xcccccccd when constructed would generate
0xffffffffcccccccd. The low bits are precisely what we want and the high bits
are a "-1". Both properties are useful.
We left shift that value by 32 positions into a temporary and add that
temporary to the original value. Concretely:
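The arithmetic can be modelled in C as follows. This is only a sketch of
the computation, not the instruction sequence GCC emits, and
synth_from_low32 is a hypothetical name used for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the synthesis: LOW is the 32-bit value that gets constructed
   first and is implicitly sign extended to 64 bits.  */
static uint64_t
synth_from_low32 (uint32_t low)
{
  uint64_t x = low;

  /* Implicit sign extension when bit 31 is on: the high half is -1.  */
  if (low & 0x80000000u)
    x |= 0xffffffff00000000ull;

  /* Left shift by 32 into a temporary, then add it to the original.
     The high half becomes LOW + (-1), i.e. one less than the low half.  */
  uint64_t t = x << 32;
  return x + t;
}
```

With 0xcccccccd as input, x is 0xffffffffcccccccd, the temporary is
0xcccccccd00000000, and their sum modulo 2^64 is 0xcccccccccccccccd.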
Tested in my tester on rv32 and rv64, waiting on the pre-commit tester to do its thing.
PR target/109279
gcc/
* config/riscv/riscv.cc (riscv_build_integer): Handle another 64-bit
synthesis where high half is one less than the low half and the 32-bit
sign bit is on.
Georg-Johann Lay [Thu, 21 Nov 2024 21:59:14 +0000 (22:59 +0100)]
AVR: target/117726 - Tweak ashiftrt:SI and lshiftrt:SI insns.
This patch is similar to r15-5569 (tweak ashift:SI) but for
ashiftrt and lshiftrt codes. It splits constant shift offsets > 16
into a 3-operand byte shift and a 2-operand residual bit shift.
Moreover, some of the constraint alternatives have been promoted
to 3-operand alternatives regardless of options. For example,
ashift:HI and lshiftrt:HI can support 3 operands for offsets 9...12
without any overhead.
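As a rough C model of the split for a logical right shift: the helper
name lshr_split is made up, and the choice of the largest multiple of 8
not exceeding the offset as the byte shift is an assumption for
illustration, not necessarily what avr_split_shift picks in every case.

```c
#include <assert.h>
#include <stdint.h>

/* Shift X right by N (0..31) in two steps: a shift by whole bytes
   followed by a residual shift of fewer than 8 bits.  */
static uint32_t
lshr_split (uint32_t x, unsigned n)
{
  unsigned byte_part = n & ~7u;	/* Multiple of 8: the byte shift.  */
  unsigned bit_part = n & 7u;	/* Remaining 0..7 bits.  */
  uint32_t t = x >> byte_part;	/* 3-operand style: t = x >> byte_part.  */

  t >>= bit_part;		/* 2-operand style: t >>= bit_part.  */
  return t;
}
```

For an offset of 19 this is a 16-bit byte shift followed by a 3-bit
residual shift, and the result equals a single shift by 19.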
Apart from that, it's a bit of code clean up for 2-byte and 4-byte
shift insns: Use one RTL peephole with any_shift code iterator
instead of 3 individual peepholes. It also removes some useless
split insns; presumably introduced during the cc0 -> CCmode work.
PR target/117726
gcc/
* config/avr/avr-passes.cc (avr_split_shift): Also handle
ASHIFTRT and LSHIFTRT codes for 4-byte shifts.
(constr_split_shift4): New code_attr.
(avr_emit_shift): Adjust to new shift capabilities.
* config/avr/predicates.md (scratch_or_d_register_operand):
Rename to scratch_or_dreg_operand.
* config/avr/avr.md: Same.
(define_peephole2): Write the RTL scratch peephole for 2-byte and
4-byte shifts that generates *sh*<mode>3_const insns using code
iterator any_shift.
(*ashlhi3_const_split, *ashrhi3_const_split, *ashrhi3_const_split)
(*lshrsi3_const_split, *lshrhi3_const_split): Remove useless
split insns.
(define_split) [avropt_split_bit_shift]: Add splitters
for 4-byte ASHIFTRT and LSHIFTRT insns using avr_split_shift().
(ashrsi3, *ashrsi3, *ashrsi3_const): Add "r,0,C4a" and "r,r,C4a"
constraint alternatives depending on 2op, 3op.
(lshrsi3, *lshrsi3, *lshrsi3_const): Add "r,0,C4r" and "r,r,C4r"
constraint alternatives depending on 2op, 3op. Add "r,r,C15".
(lshrhi3, *lshrhi3, *lshrhi3_const, ashlhi3, *ashlhi3)
(*ashlhi3_const): Add "r,r,C7c" alternative.
(ashrpsi, *ashrpsi3): Add "r,r,C22" alternative.
(ashlqi, *ashlqi): Turn C06 alternative into "r,r,C06".
* config/avr/constraints.md (C14, C22, C30, C7c): New constraints.
* config/avr/avr.cc (ashlhi3_out, lshrhi3_out)
[case 7, 9, 10, 11, 12]: Support as 3-operand insn.
(lshrsi3_out) [case 15]: Same.
(ashrsi3_out) [case 30]: Same.
(ashrhi3_out) [case 14]: Same.
(ashrqi3_out) [case 6]: Same.
(avr_out_ashrpsi3) [case 22]: Same.
* config/avr/avr.h: Fix comment typo.
* doc/invoke.texi (AVR Options) <-msplit-bit-shift>: Document.
Joseph Myers [Fri, 22 Nov 2024 20:33:10 +0000 (20:33 +0000)]
c: Fix typeof_unqual handling of qualified array types [PR112841]
As reported in bug 112841, typeof_unqual fails to remove qualifiers
from qualified array types. In C23 (unlike in previous standard
versions), array types are considered to have the qualifiers of the
element type, so typeof_unqual should remove such qualifiers (and an
example in the standard shows that is as intended). Fix this by
calling strip_array_types when checking for the presence of
qualifiers. (The reason we check for qualifiers rather than just
using TYPE_MAIN_VARIANT unconditionally is to avoid, as a quality of
implementation matter, unnecessarily losing typedef information in the
case where the type is already unqualified.)
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
PR c/112841
gcc/c/
* c-parser.cc (c_parser_typeof_specifier): Call strip_array_types
when checking for type qualifiers for typeof_unqual.
tree-optimization/117355: object size for PHI nodes with negative offsets
When the object size estimate is returned for a PHI node, it is the
maximum possible value, which is fine in isolation. When combined with
negative offsets however, it may sometimes end up in zero size because
the resultant size was larger than the wholesize, leading
size_for_offset to conclude that there's a potential underflow. Fix
this by allowing a non-strict mode to size_for_offset, which
conservatively returns the size (or wholesize) in case of a negative
offset.
gcc/ChangeLog:
PR tree-optimization/117355
* tree-object-size.cc (size_for_offset): New argument STRICT,
return SZ if it is set to false.
(plus_stmt_object_size): Adjust call to SIZE_FOR_OFFSET.
gcc/testsuite/ChangeLog:
PR tree-optimization/117355
* g++.dg/ext/builtin-object-size2.C (test9): New test.
(main): Call it.
* gcc.dg/builtin-object-size-3.c (test10): Adjust expected size.
Georg-Johann Lay [Thu, 21 Nov 2024 16:41:17 +0000 (17:41 +0100)]
AVR: Use Var(avropt_xxx) for option variables in avr.opt.
This is a no-op refactoring that uses a prefix of avropt_
(formerly: avr_) for variables defined qua Var() directives
in avr.opt. This makes it easier to spot values that come directly
from avr.opt in the rest of the backend.
The following patch adds a new option for optimizations related to
replaceable global operators new/delete.
The option isn't called -fassume-sane-operator-new (which clang++
implements), because
1) the clang++ option means something different; initially it was an
option to add malloc attribute to those declarations (but we have
malloc attribute on all <new> calls already unconditionally);
later it was changed to add noalias attribute rather than malloc,
whatever it means, but it is certainly about the return value
from the operator new (whether it can alias with other pointers);
we already assume malloc-ish behavior that it doesn't alias any
other pointers
2) the option only affects operator new, we want it to affect
operator delete as well
The option basically allows choosing between pre-PR101480 behavior
(now the default, more optimistic) and post-PR101480 behavior (safer
but penalizing most of the code in the wild for rare needs).
I've tried to explain stuff in the documentation too.
2024-11-22 Jakub Jelinek <jakub@redhat.com>
PR c++/110137
PR middle-end/101480
gcc/
* doc/invoke.texi (-fassume-sane-operators-new-delete,
-fno-assume-sane-operators-new-delete): Document.
* gimple.cc (gimple_call_fnspec): Handle
-f{,no-}assume-sane-operators-new-delete.
* ipa-inline-transform.cc (inline_call): Also clear
flag_assume_sane_operators_new_delete on caller when inlining
-fno-assume-sane-operators-new-delete callee into
-fassume-sane-operators-new-delete caller.
gcc/c-family/
* c.opt (fassume-sane-operators-new-delete): New option.
gcc/testsuite/
* g++.dg/tree-ssa/pr110137-1.C: New test.
* g++.dg/tree-ssa/pr110137-2.C: New test.
* g++.dg/tree-ssa/pr110137-3.C: New test.
* g++.dg/tree-ssa/pr110137-4.C: New test.
* g++.dg/torture/pr10148.C: Add -fno-assume-sane-operators-new-delete
as dg-additional-options.
* g++.dg/warn/Warray-bounds-16.C: Revert 2021-11-10 changes.
Jakub Jelinek [Fri, 22 Nov 2024 18:50:22 +0000 (19:50 +0100)]
match.pd: Fix up the new simplifiers using with_possible_nonzero_bits2 [PR117420]
The following testcase shows wrong-code caused by incorrect use
of with_possible_nonzero_bits2.
That matcher is defined as
/* Slightly extended version, do not make it recursive to keep it cheap. */
(match (with_possible_nonzero_bits2 @0)
with_possible_nonzero_bits@0)
(match (with_possible_nonzero_bits2 @0)
(bit_and:c with_possible_nonzero_bits@0 @2))
and because with_possible_nonzero_bits includes the SSA_NAME case with
integral/pointer argument, both forms can actually match when a SSA_NAME
with integral/pointer type has a def stmt which is BIT_AND_EXPR
assignment with say SSA_NAME with integral/pointer type as one of its
operands (or INTEGER_CST, another with_possible_nonzero_bits case).
And in match.pd the latter actually wins if both match and so when using
(with_possible_nonzero_bits2 @0) the @0 will actually be one of the
BIT_AND_EXPR operands if that form is matched.
Now, with_possible_nonzero_bits2 and with_certain_nonzero_bits2 were added
for the
/* X == C (or X & Z == Y | C) is impossible if ~nonzero(X) & C != 0. */
(for cmp (eq ne)
(simplify
(cmp:c (with_possible_nonzero_bits2 @0) (with_certain_nonzero_bits2 @1))
(if (wi::bit_and_not (wi::to_wide (@1), get_nonzero_bits (@0)) != 0)
{ constant_boolean_node (cmp == NE_EXPR, type); })))
simplifier, but even for that one I think they do not do a good job, they
might actually pessimize stuff rather than optimize, but at least they do not
result in wrong-code, because the operands are solely tested with
wi::to_wide or get_nonzero_bits, but not actually used in the
simplification. The reason why it can pessimize stuff is say if we have
# RANGE [irange] int ... MASK 0xb VALUE 0x0
x_1 = ...;
# RANGE [irange] int ... MASK 0x8 VALUE 0x0
_2 = x_1 & 0xc;
_3 = _2 == 2;
then if it used just with_possible_nonzero_bits@0, @0 would have
get_nonzero_bits (@0) 0x8 and (2 & ~8) != 0, so we can fold it into
_3 = 0;
But as it uses (with_possible_nonzero_bits2 @0), @0 is x_1 rather
than _2 and get_nonzero_bits (@0) is unnecessarily conservative,
0xb rather than 0x8 and (2 & ~0xb) == 0, so we don't optimize.
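The test applied by that simplifier can be modelled with plain integers.
The function name eq_impossible is hypothetical; the two mask values are
the ones from the example above.

```c
#include <assert.h>
#include <stdint.h>

/* X == C is impossible if C has a bit set that X can never have,
   i.e. if C & ~possible_nonzero_bits (X) is nonzero.  */
static int
eq_impossible (uint64_t c, uint64_t possible_nonzero)
{
  return (c & ~possible_nonzero) != 0;
}
```

With the mask of _2 (0x8) the comparison against 2 is known to be false,
while with the mask of x_1 (0xb) bit 1 may be set and nothing can be
concluded.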
Now, with_possible_nonzero_bits2 can actually improve stuff as well in that
pattern, if say value ranges aren't fully computed yet or the BIT_AND_EXPR
assignment has been added later and the lhs doesn't have range computed yet,
get_nonzero_range on the BIT_AND_EXPR lhs will be all bits set, while
on the BIT_AND_EXPR operand might actually succeed.
I believe better would be to either modify get_nonzero_bits so that it
special cases the SSA_NAME with BIT_AND_EXPR def_stmt (but one level
deep only like with_possible_nonzero_bits2, no recursion), in that case
return bitwise and of get_nonzero_bits (non-recursive) for the lhs and
both operands, and possibly BIT_AND_EXPR itself e.g. for GENERIC
matching by returning bitwise and of both operands.
Then with_possible_nonzero_bits2 could be needed for the GENERIC case,
perhaps have the second match #if GENERIC, but changed so that the @N
operand always is the whole thing rather than its operand which is
error-prone. Or add get_nonzero_bits wrapper with a different name
which would do that.
with_certain_nonzero_bits2 could be changed similarly, these days
we can test known non-zero bits rather than possible non-zero bits on
SSA_NAMEs too, we record both mask and value, so possible nonzero bits
(aka. get_nonzero_bits) is mask () | value (), while known nonzero bits
is value () & ~mask (), with a new function (get_known_nonzero_bits
or get_certain_nonzero_bits etc.) which handles that.
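In terms of a recorded mask/value pair, the two quantities look like
this. The functions below are stand-ins operating on plain integers for
illustration, not GCC APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Bits that may be nonzero: unknown bits plus bits known to be one.  */
static uint64_t
possible_nonzero_bits (uint64_t mask, uint64_t value)
{
  return mask | value;
}

/* Bits that are certainly nonzero: known-one bits that are not unknown.  */
static uint64_t
known_nonzero_bits (uint64_t mask, uint64_t value)
{
  return value & ~mask;
}
```

For MASK 0xb VALUE 0x0 from the earlier example, the possible nonzero
bits are 0xb and no bit is known to be nonzero.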
Anyway, the following patch doesn't do what I wrote above just yet,
for that single pattern it is just a missed optimization.
But the with_possible_nonzero_bits2 uses in the 3 new simplifiers are
just completely incorrect, because they don't just use the @0 operand
in get_nonzero_bits (pessimizing stuff if value ranges are fully computed),
but also use it in the replacement, then they act as if the BIT_AND_EXPR
wasn't there at all.
While we could use (with_possible_nonzero_bits2@3 @0) and use
get_nonzero_bits (@0) and use @3 in the replacement, that would still
often be a pessimization, so I've just used with_possible_nonzero_bits@0.
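To illustrate the wrong-code with an assumed scenario (this is not the
testcase from the PR, and all names are made up): take the
(X >> C1) << (C1 + C2) -> X << C2 simplifier with C1 = 4 and C2 = 2, where
the shifted value is y = x & 0xf0 and x is known to have its low 4 bits
clear. If @0 is bound to x rather than y, the replacement drops the mask.

```c
#include <assert.h>
#include <stdint.h>

/* The code as written: y has its low 4 bits clear.  */
static uint32_t
original (uint32_t x)
{
  uint32_t y = x & 0xf0u;
  return (y >> 4) << 6;
}

/* Correct fold: @0 is the whole BIT_AND_EXPR result.  */
static uint32_t
good_fold (uint32_t x)
{
  uint32_t y = x & 0xf0u;
  return y << 2;
}

/* Incorrect fold: @0 is the BIT_AND_EXPR operand, so the replacement
   acts as if the masking was not there at all.  */
static uint32_t
bad_fold (uint32_t x)
{
  return x << 2;
}
```

For x = 0x1f0 the original and the correct fold both give 0x3c0, while
the incorrect fold gives 0x7c0 because bit 8 of x is no longer masked
off.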
2024-11-22 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/117420
* match.pd ((X >> C1) << (C1 + C2) -> X << C2,
(X >> C1) * (C2 << C1) -> X * C2, X / (1 << C) -> X /[ex] (1 << C)):
Use with_possible_nonzero_bits@0 rather than
(with_possible_nonzero_bits2 @0).
Jakub Jelinek [Fri, 22 Nov 2024 18:47:52 +0000 (19:47 +0100)]
c-family: Yet another fix for _BitInt & __sync_* builtins [PR117641]
Sorry, the last patch only partially fixed the __sync_* ICEs with
_BitInt(128) on ia32.
Even for !fetch we need to error out and return 0. I was afraid of
APIs like __atomic_exchange/__atomic_compare_exchange, those obviously
need to be supported even on _BitInt(128) on ia32, but they actually never
reach sync_resolve_size, they are handled by adding the size argument and using
the library version much earlier.
For fetch && !orig_format (i.e. __atomic_fetch_* etc.) we need to return -1
so that we handle it with a manual __atomic_load +
__atomic_compare_exchange loop in the caller, all other cases should
be rejected.
2024-11-22 Jakub Jelinek <jakub@redhat.com>
PR c/117641
* c-common.cc (sync_resolve_size): For size 16 with _BitInt
on targets where TImode isn't supported, use goto incompatible if
!fetch.
Andrew Pinski [Fri, 22 Nov 2024 00:55:01 +0000 (16:55 -0800)]
libsanitizer: Move language level from gnu++14 to gnu++17
While compiling libsanitizer for aarch64-linux-gnu, I noticed the new warning:
```
../../../../libsanitizer/asan/asan_interceptors.cpp: In function ‘char* ___interceptor_strcpy(char*, const char*)’:
../../../../libsanitizer/asan/asan_interceptors.cpp:554:6: warning: ‘if constexpr’ only available with ‘-std=c++17’ or ‘-std=gnu++17’ [-Wc++17-extensions]
554 | if constexpr (SANITIZER_APPLE) {
| ^~~~~~~~~
```
So compiler-rt upstream compiles this as gnu++17 (the current default for clang), so let's update it
to be similar.
The DejaGnu routine "riscv_get_arch" fails to infer the correct
architecture string when GCC is built for RV32EC. This causes invalid
architecture string to be produced by "add_options_for_riscv_v":
xgcc: error: '-march=rv32cv': first ISA subset must be 'e', 'i' or 'g'
Fix by adding the E base ISA variant to the list of possible architecture
modifiers.
Also, the V extension is added to the machine string without checking
whether dependent extensions are available. This results in errors when
GCC is built for RV32EC:
Executing on host: .../xgcc ... -march=rv32ecv ...
cc1: error: ILP32E ABI does not support the 'D' extension
cc1: sorry, unimplemented: Currently the 'V' implementation requires the 'M' extension
Fix by disabling vector tests for RISC-V if V extension cannot be added
to current architecture.
Tested riscv32-none-elf for -march=rv32ec using GNU simulator. Most of
the remaining failures are due to explicit addition of vector options,
yet missing "dg-require-effective-target riscv_v_ok":
=== gcc Summary ===
# of expected passes 211958
# of unexpected failures 1826
# of expected failures 1059
# of unresolved testcases 5209
# of unsupported tests 15513
Ensured riscv64-unknown-linux-gnu tested with qemu has no new passing or
failing tests, before and after applying this patch:
# of expected passes 237209
# of unexpected failures 335
# of expected failures 1670
# of unresolved testcases 43
# of unsupported tests 16767
PR target/117603
gcc/testsuite/ChangeLog:
* lib/target-supports.exp (riscv_get_arch): Add comment about
function purpose. Add E ISA to list of possible
modifiers.
(check_vect_support_and_set_flags): Do not advertise vector
support if V extension cannot be enabled.
Add middle end support for the 'interop' directive and the 'init', 'use',
and 'destroy' clauses - but fail with a sorry, unimplemented in gimplify.cc.
For Fortran, generate the tree code, update the internal representation,
add some more diagnostic checks and update for newer specification changes
('fr' only takes a single value, but integer expressions are permitted
again [like with the old syntax] not only constant identifiers).
For C and C++, this patch adds the full parser support for 'interop'.
Still missing is actually handling the directive in the middle end and
in libgomp.
The GOMP_INTEROP_IFR_* internal values have been changed to have space
for vendor specific values that are adjacent to the existing values
but negative, if needed.
gcc/c-family/ChangeLog:
* c-common.h (enum c_omp_region_type): Add C_ORT_INTEROP
and C_ORT_OMP_INTEROP.
(c_omp_interop_t_p): New prototype.
* c-omp.cc (c_omp_interop_t_p): Check whether the type is
omp_interop_t.
(c_omp_directives): Uncomment 'interop'.
* c-pragma.cc (omp_pragmas): Add 'interop'.
* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_INTEROP.
(enum pragma_omp_clause): Add init, use, and destroy clauses.
gcc/c/ChangeLog:
* c-parser.cc (INCLUDE_STRING): Define.
(c_parser_pragma): Handle 'interop' directive.
(c_parser_omp_clause_name): Handle init, use, and destroy clauses.
(c_parser_omp_all_clauses): Likewise; use C_ORT_OMP_INTEROP, if
'use' is permitted, for c_finish_omp_clauses.
(c_parser_omp_clause_destroy, c_parser_omp_modifier_prefer_type,
c_parser_omp_clause_init, c_parser_omp_clause_use,
OMP_INTEROP_CLAUSE_MASK, c_parser_omp_interop): New.
* c-typeck.cc (c_finish_omp_clauses): Add missing OPT_Wopenmp to
a warning; handle new clauses.
gcc/cp/ChangeLog:
* parser.cc (INCLUDE_STRING): Define.
(cp_parser_omp_clause_name): Handle init, use, and destroy clauses.
(cp_parser_omp_all_clauses): Likewise; use C_ORT_OMP_INTEROP, if
'use' is permitted, for c_finish_omp_clauses.
(cp_parser_omp_modifier_prefer_type, cp_parser_omp_clause_init,
OMP_INTEROP_CLAUSE_MASK, cp_parser_omp_interop): New.
(cp_parser_pragma): Handle 'interop' directive.
* pt.cc (tsubst_omp_clauses): Handle init, use, and destroy clauses.
(tsubst_stmt): Handle OMP_INTEROP.
* semantics.cc (cp_omp_init_prefer_type_update): New.
(finish_omp_clauses): Handle init, use, and destroy clauses
and add clause check for 'depend' on 'interop'.
gcc/fortran/ChangeLog:
* gfortran.h (gfc_omp_namelist): Cleanup interop internal
representation.
* dump-parse-tree.cc (show_omp_namelist): Update for changed
internal representation.
* match.cc (gfc_free_omp_namelist): Likewise.
* openmp.cc (gfc_match_omp_prefer_type, gfc_match_omp_init):
Likewise; also handle some corner cases better and update for
newer 6.0 changes related to 'fr'.
(resolve_omp_clauses): Add type-check for interop variables.
* trans-openmp.cc (gfc_trans_omp_clauses): Handle init, use
and destroy clauses.
(gfc_trans_openmp_interop): New.
(gfc_trans_omp_directive): Call it.
gcc/ChangeLog:
* gimplify.cc (gimplify_expr): Handle OMP_INTEROP by printing
"sorry, unimplemented".
* omp-api.h (omp_get_fr_id_from_name): Change return type to
'char'.
* omp-general.cc (omp_get_fr_id_from_name): Likewise; return
GOMP_INTEROP_IFR_UNKNOWN not 0 if not found.
(omp_get_name_from_fr_id): Return "<unknown>" not NULL
if not found (used for dumps).
* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_DESTROY,
OMP_CLAUSE_USE, and OMP_CLAUSE_INIT.
* tree-pretty-print.cc (dump_omp_init_prefer_type): New.
(dump_omp_clause): Handle init, use and destroy clauses.
(dump_generic_node): Handle interop directive.
* tree.cc (omp_clause_num_ops, omp_clause_code_name): Add new
init/use/destroy clauses.
* tree.def (OACC_LOOP): Fix comment.
(OMP_INTEROP): Add.
* tree.h (OMP_INTEROP_CLAUSES, OMP_CLAUSE_INIT_TARGET,
OMP_CLAUSE_INIT_TARGETSYNC, OMP_CLAUSE_INIT_PREFER_TYPE): New.
include/ChangeLog:
* gomp-constants.h (GOMP_INTEROP_IFR_NONE): Rename ...
(GOMP_INTEROP_IFR_UNKNOWN): ... to this. And change value.
(GOMP_INTEROP_IFR_SEPARATOR): Likewise.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/interop-1.f90: Update for parser changes,
spec changes and add new tests.
* gfortran.dg/gomp/interop-2.f90: Likewise.
* gfortran.dg/gomp/interop-3.f90: Likewise.
* c-c++-common/gomp/interop-1.c: New test.
* c-c++-common/gomp/interop-2.c: New test.
* c-c++-common/gomp/interop-3.c: New test.
* c-c++-common/gomp/interop-4.c: New test.
* g++.dg/gomp/interop-5.C: New test.
* gfortran.dg/gomp/interop-4.f90: New test.
Jakub Jelinek [Fri, 22 Nov 2024 10:33:34 +0000 (11:33 +0100)]
i386: Make __builtin_ia32_f{nstenv,ldenv,nstsw,fnclex} builtins internal [PR117165]
As the comment says, these builtins are meant to be internal for the atomic
support and cause various ICEs when using them directly in various
conditions.
So the following patch makes them internal.
We do have also internal-fn.*, but those target specific builtins would
need to be there in generic code, so I've just added space to their name,
which is the old way to hide builtins/attributes etc.
2024-11-22 Jakub Jelinek <jakub@redhat.com>
PR target/117165
* config/i386/i386-builtin.def (IX86_BUILTIN_FNSTENV,
IX86_BUILTIN_FLDENV, IX86_BUILTIN_FNSTSW, IX86_BUILTIN_FNCLEX): Add
space to the end of the builtin name to make it really internal.
Jakub Jelinek [Fri, 22 Nov 2024 09:02:59 +0000 (10:02 +0100)]
testsuite: Fix up vector-{8,9,10}.c tests
On Thu, Nov 21, 2024 at 01:30:39PM +0100, Christoph Müllner wrote:
> > > * gcc.dg/tree-ssa/satd-hadamard.c: New test.
> > > * gcc.dg/tree-ssa/vector-10.c: New test.
> > > * gcc.dg/tree-ssa/vector-8.c: New test.
> > > * gcc.dg/tree-ssa/vector-9.c: New test.
I see FAILs on i686-linux or on x86_64-linux (in the latter
with -m32 testing).
One problem is that vector-10.c doesn't use -Wno-psabi option
and uses a function which returns a vector and takes vector
as first parameter, the other problems are that 3 other
tests don't arrange for at least basic vector ISA support,
plus non-standardly test only on x86_64-*-*, while normally
one would allow both i?86-*-* x86_64-*-* and if it is e.g.
specific to 64-bit, also check for lp64 or int128 or whatever
else is needed. E.g. Solaris I think has i?86-*-* triplet even
for 64-bit code, etc.
The following patch fixes these.
2024-11-22 Jakub Jelinek <jakub@redhat.com>
* gcc.dg/tree-ssa/satd-hadamard.c: Add -msse2 as dg-additional-options
on x86. Also scan-tree-dump on i?86-*-*.
* gcc.dg/tree-ssa/vector-8.c: Likewise.
* gcc.dg/tree-ssa/vector-9.c: Likewise.
* gcc.dg/tree-ssa/vector-10.c: Add -Wno-psabi to dg-additional-options.
Tamar Christina [Thu, 21 Nov 2024 15:10:24 +0000 (15:10 +0000)]
middle-end: For multiplication try swapping operands when matching complex multiply [PR116463]
This commit fixes the failures of complex.exp=fast-math-complex-mls-*.c on the
GCC 14 branch and some of the ones on the master.
The current matching just looks for one order for multiplication and was relying
on canonicalization to always give the right order because of the TWO_OPERANDS.
However when it comes to the multiplication trying only one order is a bit
fragile as they can be flipped.
The failing tests on the branch are:
void fms180snd(_Complex TYPE a[restrict N], _Complex TYPE b[restrict N],
_Complex TYPE c[restrict N]) {
for (int i = 0; i < N; i++)
c[i] -= a[i] * (b[i] * I * I);
}
void fms180fst(_Complex TYPE a[restrict N], _Complex TYPE b[restrict N],
_Complex TYPE c[restrict N]) {
for (int i = 0; i < N; i++)
c[i] -= (a[i] * I * I) * b[i];
}
The issue is just a small difference in commutative operations.
We look for {R,R} * {R,I} but found {R,I} * {R,R}.
Since the DF analysis is cached, we should be able to swap operands and retry
for multiply cheaply.
There is a constraint being checked by vect_validate_multiplication for the data
flow of the operands feeding the multiplications. So e.g.
we require the lanes to come from the same source which
vect_validate_multiplication checks. As such it doesn't make sense to flip them
individually because that would invalidate the earlier linear_loads_p checks
which have validated that the arguments all come from the same datarefs.
This patch thus flips the operands in unison to still maintain this invariant,
but also honor the commutative nature of multiplication.
Xi Ruoyao [Thu, 31 Oct 2024 15:58:23 +0000 (23:58 +0800)]
LoongArch: Make __builtin_lsx_vorn_v and __builtin_lasx_xvorn_v arguments and return values unsigned
Align them with other vector bitwise builtins.
This may break programs directly invoking __builtin_lsx_vorn_v or
__builtin_lasx_xvorn_v, but doing so is not supported (as builtins are
not documented, only intrinsics are documented and users should use them
instead).
gcc/ChangeLog:
* config/loongarch/loongarch-builtins.cc (vorn_v, xvorn_v): Use
unsigned vector modes.
* config/loongarch/lsxintrin.h (__lsx_vorn_v): Cast arguments to
v16u8.
* config/loongarch/lasxintrin.h (__lasx_xvorn_v): Cast arguments
to v32u8.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/vector/lsx/lsx-builtin.c (__lsx_vorn_v):
Change arguments and return value to v16u8.
* gcc.target/loongarch/vector/lasx/lasx-builtin.c
(__lasx_xvorn_v): Change arguments and return value to v32u8.
Jeff Law [Thu, 21 Nov 2024 23:21:07 +0000 (16:21 -0700)]
[RISC-V][PR target/117690] Add missing shift in constant synthesis
As hinted at in the BZ, we were missing a left shift in the constant synthesis
in the case where the upper 32 bits can be synthesized using a shNadd of the
low 32 bits.
This adjusts the synthesis to add the missing left shift and adjusts the cost
to account for the additional instruction.
Regression tested on riscv64-elf in my tester. Waiting for the pre-commit
tester before moving forward.
PR target/117690
gcc/
* config/riscv/riscv.cc (riscv_build_integer): Add missing left
shift when using shNadd to derive upper 32 bits from lower 32 bits.
gcc/testsuite
* gcc.target/riscv/pr117690.c: New test.
* gcc.target/riscv/synthesis-13.c: Adjust expected output.
Arsen Arsenović [Fri, 18 Oct 2024 21:14:58 +0000 (23:14 +0200)]
doc/cpp: Document __has_include_next
While hacking on an unrelated change, I noticed that __has_include_next
hasn't been documented at all. This patch adds it to the __has_include
manual node.
gcc/ChangeLog:
* doc/cpp.texi (__has_include): Document __has_include_next
also.
(Conditional Syntax): Mention __has_include_next in the
description for the __has_include menu entry.
Joseph Myers [Thu, 21 Nov 2024 21:46:00 +0000 (21:46 +0000)]
c: Give errors more consistently for void parameters [PR114816]
Cases of void parameters, other than a parameter list of (void) (or
equivalent with a typedef for void) in its entirety, have been made a
constraint violation in C2Y (N3344 alternative 1 was adopted), as part
of a series of changes to eliminate unnecessary undefined behavior by
turning it into constraint violations, implementation-defined behavior
or something else with stricter bounds on what behavior is allowed.
Previously, these were implicitly undefined behavior (see DR#295),
with only some cases listed in Annex J as undefined (but even those
cases not having wording in the normative text to make them explicitly
undefined).
As discussed in bug 114816, GCC is not entirely consistent about
diagnosing such usages; unnamed void parameters get errors when not
the entire parameter list, while qualified and register void (the
cases listed in Annex J) get errors as a single unnamed parameter, but
named void parameters are accepted with a warning (in a declaration
that's not a definition; it's not possible to define a function with
incomplete parameter types).
Following C2Y, make all these cases into errors. The errors are not
conditional on the standard version, given that this was previously
implicit undefined behavior. Since it wasn't possible anyway to
define such functions, only declare them without defining them (or
otherwise use such parameters in function type names that can't
correspond to any defined function), hopefully the risks of
compatibility issues are small.
Bootstrapped with no regressions for x86-64-pc-linux-gnu.
PR c/114816
gcc/c/
* c-decl.cc (grokparms): Do not warn for void parameter type here.
(get_parm_info): Give errors for void parameters even when named.
David Malcolm [Thu, 21 Nov 2024 19:36:16 +0000 (14:36 -0500)]
testsuite: add print-stack.exp
I wrote this support file to help me debug Tcl issues in the
testsuite.
Adding a call to:
print_stack_backtrace
somewhere in a .exp file (along with "load_lib print-stack.exp") leads
to the interpreter printing a backtrace in a form that e.g. Emacs can
consume, with filename:linenum: lines, and quoting the line of .exp
source code.
For example, adding a print_stack_backtrace to scansarif.exp in
run-sarif-pytest I get this output:
VVV START OF BACKTRACE VVV
/home/david/coding/gcc-newgit/src/gcc/testsuite/lib/scansarif.exp:142: frame 16 in proc print_stack_backtrace
142 | print_stack_backtrace
<proc>: frame 15 in proc run-sarif-pytest
<eval>: frame 14 in proc dg-final-proc
/usr/share/dejagnu/dg.exp:851: frame 13 in proc dg-final-proc
851 | if {[catch "dg-final-proc $prog" errmsg]} {
<eval>: frame 12 in proc saved-dg-test
/home/david/coding/gcc-newgit/src/gcc/testsuite/lib/gcc-dg.exp:1080: frame 11 in proc saved-dg-test
1080 | if { [ catch { eval saved-dg-test $args } errmsg ] } {
/usr/share/dejagnu/dg.exp:559: frame 10 in proc dg-test
559 | dg-test $testcase $options ${default-extra-options}
/home/david/coding/gcc-newgit/src/gcc/testsuite/gcc.dg/sarif-output/sarif-output.exp:28: frame 9
28 | dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.c]] "" ""
<eval>: frame 8
<eval>: frame 7
/usr/share/dejagnu/runtest.exp:1460: frame 6
1460 | if { [catch "uplevel #0 source $test_file_name"] == 1 } {
/usr/share/dejagnu/runtest.exp:1886: frame 5 in proc dg-runtest
1886 | runtest $test_name
/usr/share/dejagnu/runtest.exp:1845: frame 4 in proc dg-runtest
1845 | foreach test_name [lsort [find ${dir} *.exp]] {
/usr/share/dejagnu/runtest.exp:1788: frame 3 in proc dg-runtest
1788 | foreach dir "${test_top_dirs}" {
/usr/share/dejagnu/runtest.exp:1669: frame 2 in proc dg-runtest
1669 | foreach pass $multipass {
/usr/share/dejagnu/runtest.exp:1619: frame 1 in proc dg-runtest
1619 | foreach current_target $target_list {
^^^ END OF BACKTRACE ^^^
and can click on the lines in Emacs's compilation buffer to take
me to the relevant places.
I found this made it *much* easier to debug my .exp files. That
said, I'm uncomfortable with Tcl, and so
(a) there may be a better way of doing this
(b) I may have made mistakes
gcc/testsuite/ChangeLog:
* lib/print-stack.exp: New file.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
testsuite: tree-ssa: Limit targets for vec perm tests
Recently added test cases assume optimized code generation for certain
vectorized code. However, this optimization might not be applied if
the backends don't support the optimized permutation.
The tests are confirmed to work on aarch64 and x86-64, so this
patch restricts the tests accordingly.
Tested on x86-64.
PR117728
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/satd-hadamard.c: Restrict to aarch64 and x86-64.
* gcc.dg/tree-ssa/vector-8.c: Likewise.
* gcc.dg/tree-ssa/vector-9.c: Likewise.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Jason Merrill [Wed, 20 Nov 2024 09:43:30 +0000 (10:43 +0100)]
c++: modules and debug marker stmts
21_strings/basic_string/operations/contains/nonnull.cc was failing because
the module was built with debug markers and the testcase was built not
expecting debug markers, so we crashed in lower_stmt. Let's accommodate
this by discarding debug marker statements we don't want.
gcc/cp/ChangeLog:
* module.cc (trees_in::core_vals) [STATEMENT_LIST]: Skip
DEBUG_BEGIN_STMT if !MAY_HAVE_DEBUG_MARKER_STMTS.
Jason Merrill [Wed, 20 Nov 2024 12:51:10 +0000 (13:51 +0100)]
c++: modules and tsubst_friend_class
In 20_util/function_objects/mem_fn/constexpr.cc we start to instantiate
_Mem_fn_base's friend declaration of _Bind_check_arity before we've loaded
the namespace-scope declaration, so lookup_imported_hidden_friend doesn't
find it. But then we load the namespace-scope declaration in
lookup_template_class during substitution, and so when we get around to
pushing the result of substitution, they conflict. Fixed by calling
lazy_load_pendings in lookup_imported_hidden_friend.
Georg-Johann Lay [Wed, 20 Nov 2024 11:25:18 +0000 (12:25 +0100)]
AVR: target/117726 - Better optimizations of ASHIFT:SI insns.
This patch improves the 4-byte ASHIFT insns.
1) It adds a "r,r,C15" alternative for improved long << 15.
2) It adds 3-operand alternatives (depending on options) and
splits them after peephole2 / before avr-fuse-move into
a 3-operand byte shift and a 2-operand residual bit shift.
For better control, it introduces new option -msplit-bit-shift
that's activated at -O2 and higher per default. 2) is even
performed with -Os, but not with -Oz.
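As an illustration of 2), the split can be modelled on plain integers. This is only a sketch in Python, not the RTL the splitters produce. It assumes the byte part is the largest multiple of 8 not exceeding the shift offset, and the names `ashift_si` and `split_shift` are made up for this example:

```python
MASK32 = 0xFFFFFFFF  # SImode is 4 bytes wide


def ashift_si(x, n):
    """Left shift of a 32-bit value, discarding the bits shifted out."""
    return (x << n) & MASK32


def split_shift(x, n):
    """Shift by n as a byte shift followed by a residual bit shift."""
    byte_offset = 8 * (n // 8)  # whole bytes (assumed choice of byte part)
    bit_offset = n % 8          # residual bit shift, at most 7 positions
    tmp = ashift_si(x, byte_offset)
    return ashift_si(tmp, bit_offset)


value = 0x00012345
results = [(n, split_shift(value, n)) for n in (9, 15, 20)]
```

Under that assumption the two-step form agrees with the plain shift for every offset from 0 to 31.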
PR target/117726
gcc/
* config/avr/avr.opt (-msplit-bit-shift): Add new optimization option.
* common/config/avr/avr-common.cc (avr_option_optimization_table)
[OPT_LEVELS_2_PLUS]: Turn on -msplit-bit-shift.
* config/avr/avr.h (machine_function.n_avr_fuse_add_executed):
New bool component.
* config/avr/avr.md (attr "isa") <2op, 3op>: Add new values.
(attr "enabled"): Handle them.
(ashlsi3, *ashlsi3, *ashlsi3_const): Add "r,r,C15" alternative.
Add "r,0,C4l" and "r,r,C4l" alternatives (depending on 2op / 3op).
(define_split) [avr_split_bit_shift]: Add 2 new ashift:ALL4 splitters.
(define_peephole2) [ashift:ALL4]: Add (match_dup 3) so that the scratch
won't overlap with the output operand of the matched insn.
(*ashl<mode>3_const_split): Remove unused ashift:ALL4 splitter.
* config/avr/avr-passes.cc (emit_valid_insn)
(emit_valid_move_clobbercc): Move out of anonymous namespace.
(make_avr_pass_fuse_add) <gate>: Don't override.
<execute>: Set n_avr_fuse_add_executed according to
func->machine->n_avr_fuse_add_executed.
(pass_data avr_pass_data_split_after_peephole2): New object.
(avr_pass_split_after_peephole2): New rtl_opt_pass.
(avr_emit_shift): New static function.
(avr_shift_is_3op, avr_split_shift_p, avr_split_shift)
(make_avr_pass_split_after_peephole2): New functions.
* config/avr/avr-passes.def (avr_pass_split_after_peephole2):
Insert new pass after pass_peephole2.
* config/avr/avr-protos.h
(n_avr_fuse_add_executed, avr_shift_is_3op, avr_split_shift_p)
(avr_split_shift, avr_optimize_size_level)
(make_avr_pass_split_after_peephole2): New prototypes.
* config/avr/avr.cc (n_avr_fuse_add_executed): New global variable.
(avr_optimize_size_level): New function.
(avr_set_current_function): Set n_avr_fuse_add_executed
according to cfun->machine->n_avr_fuse_add_executed.
(ashlsi3_out) [case 15]: Output optimized code for this offset.
(avr_rtx_costs_1) [ASHIFT, SImode]: Adjust costs of offsets 15, 16.
* config/avr/constraints.md (C4a, C4r, C4l): New constraints.
* pass_manager.h (pass_manager): Adjust comments.
Jeff Law [Thu, 21 Nov 2024 15:24:10 +0000 (08:24 -0700)]
[RISC-V][PR target/116590] Avoid emitting multiple instructions from fmacc patterns
So much like my patch from last week, this removes alternatives that
create multiple instructions that we really should have never needed.
In this case it fixes one of two bugs in pr116590. In particular we
don't want vmvNr instructions for thead-vector. Those instructions were
emitted as part of those two instruction sequences.
I've tested this in my tester and assuming the pre-commit tester is
happy, I'll push it to the trunk.
Pan Li [Mon, 11 Nov 2024 08:44:24 +0000 (16:44 +0800)]
Match: Refactor the unsigned SAT_ADD match pattern [NFC]
This patch refactors the unsigned SAT_ADD pattern by:
* Extract type check outside.
* Extract common sub pattern.
* Re-arrange the related match pattern forms together.
* Remove unnecessary helper pattern matches.
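For readers unfamiliar with the operation being matched, here is a small Python model of an unsigned saturating add. It is a sketch only: the branchless form `(X + Y) | -((X + Y) < X)` is assumed for illustration, this message does not restate which forms match.pd recognizes, and `sat_add_u` is a made-up name.

```python
def sat_add_u(x, y, bits=8):
    """Unsigned saturating add of two `bits`-wide values, branchless form."""
    mask = (1 << bits) - 1
    wrapped = (x + y) & mask        # X + Y in modular unsigned arithmetic
    overflowed = wrapped < x        # true exactly when the add wrapped around
    all_ones = -int(overflowed) & mask
    return wrapped | all_ones       # all ones on overflow, else the plain sum


examples = [sat_add_u(1, 2), sat_add_u(200, 100), sat_add_u(255, 255)]
```

The result equals `min(x + y, 2**bits - 1)` for every pair of in-range inputs.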
The test suites below pass with this patch.
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.
gcc/ChangeLog:
* match.pd: Refactor sorts of unsigned SAT_ADD match pattern.
Signed-off-by: Pan Li <pan2.li@intel.com>
Tamar Christina [Thu, 21 Nov 2024 12:49:35 +0000 (12:49 +0000)]
middle-end: Pass along SLP node when costing vector loads/stores
With the support to SLP only we now pass the VMAT through the SLP node, however
the majority of the costing calls inside vectorizable_load and
vectorizable_store do not pass the SLP node along. Due to this the backend costing
never sees the VMAT for these cases anymore.
Additionally the helper around record_stmt_cost when both SLP and stmt_vinfo are
passed would only pass the SLP node along. However the SLP node doesn't contain
all the info available in the stmt_vinfo and we'd have to go through the
SLP_TREE_REPRESENTATIVE anyway. As such I changed the function to just always
pass both along. Unlike the VMAT changes, I don't believe there to be a
correctness issue here, but I would like to minimize the amount of churn in the backend
costing until vectorizer costing as a whole is revisited in GCC 16.
These changes re-enable the cost model on AArch64 and also correctly find the
VMATs on loads and stores fixing testcases such as sve_iters_low_2.c.
gcc/ChangeLog:
* tree-vect-data-refs.cc (vect_get_data_access_cost): Pass NULL for SLP
node.
* tree-vect-stmts.cc (record_stmt_cost): Expose.
(vect_get_store_cost, vect_get_load_cost): Extend with SLP node.
(vectorizable_store, vectorizable_load): Pass SLP node to all costing.
* tree-vectorizer.h (record_stmt_cost): Always pass both SLP node and
stmt_vinfo to costing.
(vect_get_load_cost, vect_get_store_cost): Extend with SLP node.
elfos.h (ASM_DECLARE_OBJECT_NAME): Use decl size instead of type size.
was applied, those were missed. At the same time, the testcase was
restricted to Linux though there's nothing Linux-specific in there, so
the error remained undetected.
This patch fixes the definitions to match elfos.h and enables the test
on Solaris, too.
Bootstrapped without regressions on i386-pc-solaris2.11 and
sparc-sun-solaris2.11.
forwprop: Try to blend two isomorphic VEC_PERM sequences
This extends forwprop by yet another VEC_PERM optimization:
It attempts to blend two isomorphic vector sequences by using the
redundancy in the lane utilization in these sequences.
This redundancy in lane utilization comes from the way specific
scalar statements end up vectorized: two VEC_PERMs on top, binary operations
on both of them, and a final VEC_PERM to create the result.
Here is an example of this sequence:
To remove the redundancy, lanes 2 and 3 can be freed, which allows changing
the last statement into:
v_out' = VEC_PERM <v_x, v_y, {0, 1, 4, 5}>
// v_out' = {e0+e1, e2+e3, e0-e1, e2-e3}
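The lane redundancy can be checked with a small Python model of VEC_PERM. This is a sketch under assumptions: the two top selectors {0, 2, 0, 2} and {1, 3, 1, 3} and the original final selector {0, 1, 6, 7} are chosen for illustration so that the result matches the v_out' shown above. Only the narrowed selector {0, 1, 4, 5} is taken from this message.

```python
def vec_perm(a, b, sel):
    """Model of VEC_PERM <a, b, sel>: index i picks lane i of a followed by b."""
    both = list(a) + list(b)
    return [both[i] for i in sel]


def run_sequence(v_in, final_sel):
    # Assumed selectors for the two top VEC_PERMs (illustrative only).
    v_1 = vec_perm(v_in, v_in, [0, 2, 0, 2])
    v_2 = vec_perm(v_in, v_in, [1, 3, 1, 3])
    v_x = [p + q for p, q in zip(v_1, v_2)]  # lanes 2, 3 repeat lanes 0, 1
    v_y = [p - q for p, q in zip(v_1, v_2)]  # lanes 2, 3 repeat lanes 0, 1
    return vec_perm(v_x, v_y, final_sel)


v_in = [10, 3, 7, 2]
wide = run_sequence(v_in, [0, 1, 6, 7])    # reads lanes 2 and 3 of v_y
narrow = run_sequence(v_in, [0, 1, 4, 5])  # reads only lanes 0 and 1
```

Both selectors yield the same vector, so lanes 2 and 3 of v_x and v_y carry no information that the final VEC_PERM needs.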
The cost of eliminating the redundancy in the lane utilization is that
lowering the VEC PERM expression could get more expensive because of
tighter packing of the lanes. Therefore this optimization is not done
alone, but only in case we identify two such sequences that can be
blended.
Once all candidate sequences have been identified, we try to blend them,
so that we can use the freed lanes for the second sequence.
On success we convert 2x (2x VEC_PERM + 2x BINOP + 1x VEC_PERM) to
2x VEC_PERM + 2x BINOP + 2x VEC_PERM, i.e. 6x VEC_PERM + 4x BINOP are
traded for 4x VEC_PERM + 2x BINOP.
The implemented transformation reuses (rewrites) the statements
of the first sequence and the last VEC_PERM of the second sequence.
The remaining four statements of the second sequence are left untouched
and will be eliminated by DCE later.
This targets x264_pixel_satd_8x4, which calculates the sum of absolute
transformed differences (SATD) using Hadamard transformation.
We have seen 8% speedup on SPEC's x264 on a 5950X (x86-64) and 7%
speedup on an AArch64 machine.
Bootstrapped and reg-tested on x86-64 and AArch64 (all languages).
gcc/ChangeLog:
* tree-ssa-forwprop.cc (struct _vec_perm_simplify_seq): New data
structure to store analysis results of a vec perm simplify sequence.
(get_vect_selector_index_map): Helper to get an index map from the
provided vector permute selector.
(recognise_vec_perm_simplify_seq): Helper to recognise a
vec perm simplify sequence.
(narrow_vec_perm_simplify_seq): Helper to pack the lanes more
tightly.
(can_blend_vec_perm_simplify_seqs_p): Test if two vec perm
sequences can be blended.
(calc_perm_vec_perm_simplify_seqs): Helper to calculate the new
permutation indices.
(blend_vec_perm_simplify_seqs): Helper to blend two vec perm
simplify sequences.
(process_vec_perm_simplify_seq_list): Helper to process a list
of vec perm simplify sequences.
(append_vec_perm_simplify_seq_list): Helper to add a vec perm
simplify sequence to the list.
(pass_forwprop::execute): Integrate new functionality.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/satd-hadamard.c: New test.
* gcc.dg/tree-ssa/vector-10.c: New test.
* gcc.dg/tree-ssa/vector-8.c: New test.
* gcc.dg/tree-ssa/vector-9.c: New test.
* gcc.target/aarch64/sve/satd-hadamard.c: New test.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
H.J. Lu [Thu, 21 Nov 2024 11:08:03 +0000 (19:08 +0800)]
apx-ndd-tls-1[ab].c: Add -std=gnu17
Since GCC 15 defaults to -std=gnu23, add -std=gnu17 to apx-ndd-tls-1[ab].c
to avoid:
gcc.target/i386/apx-ndd-tls-1a.c: In function ‘k’:
gcc.target/i386/apx-ndd-tls-1a.c:29:7: error: too many arguments to function ‘l’
gcc.target/i386/apx-ndd-tls-1a.c:25:5: note: declared here
Rainer Orth [Thu, 21 Nov 2024 10:46:36 +0000 (11:46 +0100)]
libgomp: testsuite: Fix libgomp.c/alloc-pinned-3.c etc. for C23 on non-Linux
Since the switch to a C23 default, three libgomp tests FAIL on Solaris:
FAIL: libgomp.c/alloc-pinned-3.c (test for excess errors)
UNRESOLVED: libgomp.c/alloc-pinned-3.c compilation failed to produce executable
FAIL: libgomp.c/alloc-pinned-4.c (test for excess errors)
UNRESOLVED: libgomp.c/alloc-pinned-4.c compilation failed to produce executable
FAIL: libgomp.c/alloc-pinned-6.c (test for excess errors)
UNRESOLVED: libgomp.c/alloc-pinned-6.c compilation failed to produce executable
Excess errors:
/vol/gcc/src/hg/master/local/libgomp/testsuite/libgomp.c/alloc-pinned-3.c:104:3: error: too many arguments to function 'set_pin_limit'
Fixed by adding the missing size argument to the stub functions.
Tested on i386-pc-solaris2.11 and sparc-sun-solaris2.11.
Jakub Jelinek [Thu, 21 Nov 2024 09:17:03 +0000 (10:17 +0100)]
include: Add new post-DWARF 5 DW_LANG_* enumerators
DWARF changed the language code assignment to be on a web page, and
since DWARF 5 was published 27 codes have already been assigned.
We have some of those already in the header, but most of them were missing,
including one added just yesterday (DW_LANG_C23).
Note, this is really post-DWARF 5 stuff rather than DWARF 6, because
DWARF 6 plans to switch from DW_AT_language to DW_AT_language_{name,version}
pair where we'll say DW_LNAME_C with 202311 version instead of this.
2024-11-21 Jakub Jelinek <jakub@redhat.com>
* dwarf2.h (enum dwarf_source_language): Add comment where
the post DWARF 5 additions start. Refresh list from
https://dwarfstd.org/languages.html.
Richard Biener [Thu, 21 Nov 2024 08:14:53 +0000 (09:14 +0100)]
tree-optimization/117720 - check alignment for VMAT_STRIDED_SLP
While vectorizable_store was already checking the alignment requirement
of the stores and falling back to elementwise accesses if not honored,
the vectorizable_load path wasn't doing this. After the previous
change to disregard alignment checking for VMAT_STRIDED_SLP in
get_group_load_store_type this now tripped on power.
PR tree-optimization/117720
* tree-vect-stmts.cc (vectorizable_load): For VMAT_STRIDED_SLP
verify the chosen load type is OK with regard to alignment.
Jakub Jelinek [Thu, 21 Nov 2024 08:40:37 +0000 (09:40 +0100)]
c-family, docs: Adjust descriptions/documentation for C23 publication
As C23 has been published already https://www.iso.org/standard/82075.html
we don't need to say that it is expected to be published etc.
Furthermore, standards.texi was still documenting that -std=gnu17
is the default.
2024-11-21 Jakub Jelinek <jakub@redhat.com>
gcc/
* doc/invoke.texi (-std=c23): Adjust documentation for
publication of the ISO/IEC 9899:2024 standard.
* doc/standards.texi: Likewise. Document -std=gnu17 and
-std=gnu23 options. Mention that -std=gnu23 rather than
-std=gnu17 is now the default for C.
gcc/c-family/
* c.opt (std=c23, std=gnu23, std=iso9899:2024): Adjust description
for publication of the ISO/IEC 9899:2024 standard.
Jakub Jelinek [Thu, 21 Nov 2024 08:39:06 +0000 (09:39 +0100)]
phiopt: Improve spaceship_replacement for HONOR_NANS [PR117612]
The following patch optimizes spaceship followed by comparisons of the
spaceship value even for floating point spaceship when NaNs can appear.
operator<=> for this emits roughly
signed char c;
if (i == j) c = 0;
else if (i < j) c = -1;
else if (i > j) c = 1;
else c = 2;
and I believe the
/* The optimization may be unsafe due to NaNs. */
comment just isn't true.
Sure, the i == j comparison doesn't raise exceptions on qNaNs, but if
one of the operands is qNaN, then i == j is false and i < j or i > j
is then executed and raises exceptions even on qNaNs.
And we can safely optimize say
c == -1 comparison after the above into i < j, that also raises
exceptions like before and handles NaNs the same way as the original.
The only unsafe transformation would be c == 0 or c != 0; turning it
into i == j or i != j wouldn't raise an exception, so I'm not doing that
optimization (but other parts of the compiler optimize the i < j comparison
away anyway).
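The value equivalences claimed above can be checked with a short Python model of the emitted code. This is a sketch only: Python floats follow IEEE comparison results for NaN but do not model floating-point exception flags, so only the resulting values are verified here, not the exception behavior. The function name `spaceship` is made up for the example.

```python
import math


def spaceship(i, j):
    """Model of the code emitted for floating-point operator<=>."""
    if i == j:
        return 0
    elif i < j:
        return -1
    elif i > j:
        return 1
    else:
        return 2  # unordered: at least one operand is NaN


samples = [-1.5, 0.0, 2.0, math.inf, math.nan]
pairs = [(i, j) for i in samples for j in samples]

# c == -1 can be replaced by i < j, and c == 1 by i > j, NaNs included.
less_matches = all((spaceship(i, j) == -1) == (i < j) for i, j in pairs)
greater_matches = all((spaceship(i, j) == 1) == (i > j) for i, j in pairs)
# c == 2 is exactly the unordered case.
unordered_matches = all(
    (spaceship(i, j) == 2) == (math.isnan(i) or math.isnan(j)) for i, j in pairs
)
```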
Anyway, to match the HONOR_NANS case, we need to verify that the
second comparison has true edge to the phi_bb (yielding there -1 or 1),
it can't be the false edge because when NaNs are honored, the false
edge is for both the case where the inverted comparison is true or when
one of the operands is NaN. Similarly we need to ensure that the two
non-equality comparisons are the opposite; while for -ffast-math we can in
some cases get one comparison x >= 5.0 and the other x > 5.0 and it is fine
because NaN is UB, when NaNs are honored they must be different to leave
the unordered case with value 2 as the last one remaining.
The patch also punts if HONOR_NANS and the phi has just 3 arguments instead
of 4.
When NaNs are honored, we also in some cases need to perform some comparison
and then invert its result (so that exceptions are properly thrown and we
get the correct result).
2024-11-21 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/94589
PR tree-optimization/117612
* tree-ssa-phiopt.cc (spaceship_replacement): Handle
HONOR_NANS (TREE_TYPE (lhs1)) case when possible.
* gcc.dg/pr94589-5.c: New test.
* gcc.dg/pr94589-6.c: New test.
* g++.dg/opt/pr94589-5.C: New test.
* g++.dg/opt/pr94589-6.C: New test.
Jakub Jelinek [Thu, 21 Nov 2024 08:38:01 +0000 (09:38 +0100)]
phiopt: Fix a pasto in spaceship_replacement [PR117612]
When working on the PR117612 fix, I've noticed a pasto in
tree-ssa-phiopt.cc (spaceship_replacement).
The code is
  if (absu_hwi (tree_to_shwi (arg2)) != 1)
    return false;
  if (e1->flags & EDGE_TRUE_VALUE)
    {
      if (tree_to_shwi (arg0) != 2
          || absu_hwi (tree_to_shwi (arg1)) != 1
          || wi::to_widest (arg1) == wi::to_widest (arg2))
        return false;
    }
  else if (tree_to_shwi (arg1) != 2
           || absu_hwi (tree_to_shwi (arg0)) != 1
           || wi::to_widest (arg0) == wi::to_widest (arg1))
    return false;
where arg{0,1,2,3} are PHI args; the code wants to ensure that if e1 is a
true edge, then arg0 is 2 and one of arg{1,2} is -1 and one is 1,
otherwise arg1 is 2 and one of arg{0,2} is -1 and one is 1.
But due to a pasto, the latter case doesn't verify that arg0
is different from arg2, it could be both -1 or both 1 and we wouldn't
punt. The wi::to_widest (arg0) == wi::to_widest (arg1) test
is always false when we've made sure in the earlier conditions that
arg1 is 2 and arg0 is -1 or 1, so never 2.
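The effect of the pasto can be reproduced with plain integers. The following Python sketch models only the branch for the case where e1 is not a true edge. The fixed variant assumes the correction is to compare arg0 against arg2, as the description above implies, and both function names are made up for the example.

```python
def buggy_false_edge_ok(arg0, arg1, arg2):
    """The check as quoted above for the case where e1 is not a true edge."""
    if abs(arg2) != 1:
        return False
    if arg1 != 2 or abs(arg0) != 1 or arg0 == arg1:
        return False
    return True


def fixed_false_edge_ok(arg0, arg1, arg2):
    """Same check, but comparing arg0 against arg2 (assumed form of the fix)."""
    if abs(arg2) != 1:
        return False
    if arg1 != 2 or abs(arg0) != 1 or arg0 == arg2:
        return False
    return True


# arg0 and arg2 are both 1: this should be rejected, but the buggy check
# accepts it because arg0 == arg1 can never hold once arg1 is known to be 2.
bad_args = (1, 2, 1)
good_args = (-1, 2, 1)
```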
2024-11-21 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/94589
PR tree-optimization/117612
* tree-ssa-phiopt.cc (spaceship_replacement): Fix up
a pasto in check when arg1 is 2.
Kewen Lin [Thu, 21 Nov 2024 07:41:34 +0000 (07:41 +0000)]
rs6000: Adjust FLOAT128 signbit2 expander for P8 LE [PR114567]
As the associated test case shows, signbit generated assembly
is sub-optimal for _Float128 argument from memory on P8 LE.
On P8 LE, p8swap pass puts an explicit AND -16 on the memory,
which causes mode_dependent_address_p to consider it invalid
to change its mode, so combine fails to make use of the
existing pattern signbit<SIGNBIT:mode>2_dm_mem. Considering
it's always more efficient to make use of 8 bytes load and
shift on P8 LE, this patch is to adjust the current expander
and treat it specially.
Kewen Lin [Thu, 21 Nov 2024 07:41:33 +0000 (07:41 +0000)]
rs6000: Use standard name {add,sub}v1ti3 for altivec_v{add,sub}uqm
This patch renames define_insn altivec_v{add,sub}uqm to the
standard names. As the associated test case shows, without
this patch we end up with scalar {add,subf}c/{add,subf}e;
the standard names help to exploit v{add,sub}uqm.
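To make the difference concrete, here is a Python sketch of the two ways to add 128-bit values: a pair of 64-bit additions linked by a carry (the shape of the scalar carry-out/carry-in sequence) versus one modulo-2^128 addition (what a quadword add computes). It models values only, not the instructions themselves, and the function names are made up for the example.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def add128_scalar(a, b):
    """Add two 128-bit values as two 64-bit halves linked by a carry."""
    low_sum = (a & MASK64) + (b & MASK64)    # low halves, producing a carry
    carry = low_sum >> 64
    high_sum = ((a >> 64) + (b >> 64) + carry) & MASK64  # high halves + carry
    return (high_sum << 64) | (low_sum & MASK64)


def add128_quadword(a, b):
    """Add two 128-bit values in one step, modulo 2**128."""
    return (a + b) & MASK128


cases = [(MASK64, 1), (MASK128, 1), (0x1234 << 64 | 0xFFFF, 0x1 << 64 | 0x1)]
agree = all(add128_scalar(a, b) == add128_quadword(a, b) for a, b in cases)
```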
gcc/ChangeLog:
* config/rs6000/altivec.md (altivec_vadduqm): Rename to ...
(addv1ti3): ... this.
(altivec_vsubuqm): Rename to ...
(subv1ti3): ... this.
* config/rs6000/rs6000-builtins.def (__builtin_altivec_vadduqm):
Replace bif expander altivec_vadduqm with addv1ti3.
(__builtin_altivec_vsubuqm): Replace bif expander altivec_vsubuqm with
subv1ti3.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/p8vector-int128-3.c: New test.
Kewen Lin [Thu, 21 Nov 2024 07:41:33 +0000 (07:41 +0000)]
rs6000: Remove entry for V1TImode from VI_unit
When making a patch to adjust VECTOR_P8_VECTOR rs6000_vector
enum, I noticed that V1TImode's mode attribute in VI_unit,
VECTOR_UNIT_ALTIVEC_P (V1TImode), is never true, since
VECTOR_UNIT_ALTIVEC_P checks if vector_unit[V1TImode] is
equal to VECTOR_ALTIVEC, but vector_unit[V1TImode] can only
be VECTOR_NONE or VECTOR_P8_VECTOR, there is no chance to be
VECTOR_ALTIVEC:
rs6000_vector_unit[V1TImode]
= (TARGET_P8_VECTOR) ? VECTOR_P8_VECTOR : VECTOR_NONE;
By checking all uses of VI_unit, the used mode iterator is
one of VI2, VI, VP_small and VP, none of them has V1TImode,
so the entry for V1TImode is useless. I guessed it was
designed to have one mode attribute to cover all integer
vector modes, but later we separated V1TI handlings to its
own patterns (those guarded with TARGET_VADDUQM). Anyway,
this patch is to remove this useless and confusing entry.
gcc/ChangeLog:
* config/rs6000/altivec.md (mode attr for V1TI in VI_unit): Remove.
Kewen Lin [Thu, 21 Nov 2024 07:41:33 +0000 (07:41 +0000)]
rs6000: Add veqv support to *eqv<mode>3_internal1
When making a patch to replace TARGET_P8_VECTOR, I noticed that
for *eqv<BOOL_128:mode>3_internal1, unlike the other logical
operations, we only exploited the vsx version. I think it
is an oversight, so this patch considers veqv as well.
gcc/ChangeLog:
* config/rs6000/rs6000.md (*eqv<BOOL_128:mode>3_internal1): Generate
insn veqv if TARGET_ALTIVEC and operands are altivec_register_operand.
Kewen Lin [Thu, 21 Nov 2024 07:41:33 +0000 (07:41 +0000)]
rs6000: Remove ISA_3_0_MASKS_IEEE and check P9_VECTOR instead
When working to get rid of mask bit OPTION_MASK_P8_VECTOR,
I noticed that the check on ISA_3_0_MASKS_IEEE is actually
to check TARGET_P9_VECTOR, since we check all three mask
bits together and p9 vector guarantees p8 vector and vsx
should be enabled. So this patch is to adjust this first
as preparatory patch for the following patch to change
all uses of OPTION_MASK_P8_VECTOR and TARGET_P8_VECTOR.
Kewen Lin [Thu, 21 Nov 2024 07:41:33 +0000 (07:41 +0000)]
rs6000: Simplify some conditions or code related to TARGET_DIRECT_MOVE
When I was making a patch to rework TARGET_P8_VECTOR, I
noticed that there are some redundant checks and dead code
related to TARGET_DIRECT_MOVE, so I made this patch as a
separate preparatory patch. It consists of:
- Check either TARGET_DIRECT_MOVE or TARGET_P8_VECTOR only
according to the context, rather than checking both of
them since they are actually the same (TARGET_DIRECT_MOVE
is defined as TARGET_P8_VECTOR).
- Simplify TARGET_VSX && TARGET_DIRECT_MOVE as
TARGET_DIRECT_MOVE since direct move ensures VSX enabled.
- Replace some TARGET_POWERPC64 && TARGET_DIRECT_MOVE as
TARGET_DIRECT_MOVE_64BIT to simplify it.
- Remove some dead code guarded with TARGET_DIRECT_MOVE
but the condition never holds here.
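The simplifications in the list above are plain boolean identities once the relationships between the macros are taken into account. The Python sketch below checks them exhaustively. It assumes that P8 vector support implies VSX, and that TARGET_DIRECT_MOVE_64BIT is TARGET_DIRECT_MOVE && TARGET_POWERPC64. The lowercase variable names merely stand in for the macros.

```python
from itertools import product


def check_simplifications():
    """Exhaustively verify the rewrites under the stated assumptions."""
    for vsx, p8_vector, powerpc64 in product([False, True], repeat=3):
        if p8_vector and not vsx:
            continue  # assumption: P8 vector support implies VSX
        direct_move = p8_vector  # TARGET_DIRECT_MOVE is defined as TARGET_P8_VECTOR
        direct_move_64bit = direct_move and powerpc64  # assumed definition

        assert (p8_vector and direct_move) == p8_vector
        assert (direct_move or p8_vector) == p8_vector
        assert (vsx and direct_move) == direct_move
        assert (powerpc64 and direct_move) == direct_move_64bit
        assert (not direct_move or not powerpc64) == (not direct_move_64bit)
    return True


all_hold = check_simplifications()
```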
gcc/ChangeLog:
* config/rs6000/rs6000.cc (rs6000_option_override_internal): Simplify
TARGET_P8_VECTOR && TARGET_DIRECT_MOVE as TARGET_P8_VECTOR.
(rs6000_output_move_128bit): Simplify TARGET_VSX && TARGET_DIRECT_MOVE
as TARGET_DIRECT_MOVE.
* config/rs6000/rs6000.h (TARGET_XSCVDPSPN): Simplify conditions
TARGET_DIRECT_MOVE || TARGET_P8_VECTOR as TARGET_P8_VECTOR.
(TARGET_XSCVSPDPN): Likewise.
(TARGET_DIRECT_MOVE_128): Simplify TARGET_DIRECT_MOVE &&
TARGET_POWERPC64 as TARGET_DIRECT_MOVE_64BIT.
(TARGET_VEXTRACTUB): Likewise.
(TARGET_DIRECT_MOVE_64BIT): Simplify TARGET_P8_VECTOR &&
TARGET_DIRECT_MOVE as TARGET_DIRECT_MOVE.
* config/rs6000/rs6000.md (signbit<mode>2, @signbit<mode>2_dm,
*signbit<mode>2_dm_mem, floatsi<mode>2_lfiwax,
floatsi<SFDF:mode>2_lfiwax_<QHI:mode>_mem_zext,
floatunssi<mode>2_lfiwzx, float<QHI:mode><SFDF:mode>2,
*float<QHI:mode><SFDF:mode>2_internal, floatuns<QHI:mode><SFDF:mode>2,
*floatuns<QHI:mode><SFDF:mode>2_internal, p8_mtvsrd_v16qidi2,
p8_mtvsrd_df, p8_xxpermdi_<mode>, reload_vsx_from_gpr<mode>,
p8_mtvsrd_sf, reload_vsx_from_gprsf, p8_mfvsrd_3_<mode>,
reload_gpr_from_vsx<mode>, reload_gpr_from_vsxsf, unpack<mode>_dm):
Simplify TARGET_DIRECT_MOVE && TARGET_POWERPC64 as
TARGET_DIRECT_MOVE_64BIT.
(unpack<mode>_nodm): Simplify !TARGET_DIRECT_MOVE || !TARGET_POWERPC64
as !TARGET_DIRECT_MOVE_64BIT.
(fix_trunc<mode>si2, fix_trunc<mode>si2_stfiwx,
fix_trunc<mode>si2_internal): Simplify TARGET_P8_VECTOR &&
TARGET_DIRECT_MOVE as TARGET_DIRECT_MOVE.
(fix_trunc<mode>si2_stfiwx, fixuns_trunc<mode>si2_stfiwx): Remove some
dead code as the guard TARGET_DIRECT_MOVE there never holds.
(fixuns_trunc<mode>si2_stfiwx): Change TARGET_P8_VECTOR with
TARGET_DIRECT_MOVE which is a better fit.
* config/rs6000/vsx.md (define_peephole2 for SFmode in GPR): Simplify
TARGET_DIRECT_MOVE && TARGET_POWERPC64 as TARGET_DIRECT_MOVE_64BIT.
Lewis Hyatt [Fri, 25 Oct 2024 18:55:09 +0000 (14:55 -0400)]
tree-cfg: Fix call to next_discriminator_for_locus()
While testing future 64-bit location_t support, I ran into an
-fcompare-debug issue that was traced back here. Despite the name,
next_discriminator_for_locus() is meant to take an integer line number
argument, not a location_t. There is one call site which has been passing a
location_t instead. For the most part that is harmless, although in case
there are two CALL stmts on the same line with different location_t, it may
fail to generate a unique discriminator where it should. If/when location_t
changes to be 64-bit, however, it will produce an -fcompare-debug
failure. Fix it by passing the line number rather than the location_t.
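The uniqueness problem can be sketched with a per-key counter in Python. This is a simplified model, not the GCC implementation: `DiscriminatorTable` is a made-up class, and a location is represented as a hypothetical (line, column) pair rather than a real location_t.

```python
from collections import defaultdict


class DiscriminatorTable:
    """Hands out 1, 2, 3, ... separately for each key it is asked about."""

    def __init__(self):
        self._last = defaultdict(int)

    def next_for(self, key):
        self._last[key] += 1
        return self._last[key]


# Two CALL stmts on line 10 at different columns.
call_locations = [(10, 3), (10, 20)]

# Keyed by the full location: each call looks like the first of its kind.
by_location = DiscriminatorTable()
keyed_by_location = [by_location.next_for(loc) for loc in call_locations]

# Keyed by the line number: the second call gets a fresh discriminator.
by_line = DiscriminatorTable()
keyed_by_line = [by_line.next_for(line) for line, _col in call_locations]
```

With the full location as the key both calls receive the same discriminator; with the line number as the key they are told apart.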
I am not aware of a testcase that demonstrates any observable wrong
behavior, but the file debug/pr53466.C is an example where the discriminator
assignment is indeed different before and after this change.
gcc/ChangeLog:
* tree-cfg.cc (assign_discriminators): Fix incorrect value passed to
next_discriminator_for_locus().
Harald Anlauf [Wed, 20 Nov 2024 20:59:22 +0000 (21:59 +0100)]
Fortran: fix checking of protected variables in submodules [PR83135]
When a symbol was use-associated in the ancestor of a submodule, a
PROTECTED attribute was ignored in the submodule or its descendants.
Find the real ancestor of symbols when used in a variable definition
context in a submodule.
PR fortran/83135
gcc/fortran/ChangeLog:
* expr.cc (sym_is_from_ancestor): New helper function.
(gfc_check_vardef_context): Refine checking of PROTECTED attribute
of symbols that are indirectly use-associated in a submodule.
Joseph Myers [Wed, 20 Nov 2024 21:29:48 +0000 (21:29 +0000)]
c: Diagnose compound literal for empty array [PR114266]
As reported in bug 114266, GCC fails to pedwarn for a compound
literal, whose type is an array of unknown size, initialized with an
empty initializer. This case is disallowed by C23 (which doesn't have
zero-size objects); the case of a named object is diagnosed as
expected, but not that for compound literals. (Before C23, the
pedwarn for empty initializers sufficed.) Add a check for this
specific case with a pedwarn.
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
PR c/114266
gcc/c/
* c-decl.cc (build_compound_literal): Diagnose array of unknown
size with empty initializer for C23.
gcc/testsuite/
* gcc.dg/c23-empty-init-4.c: New test.
Antoni Boucher [Thu, 18 Jan 2024 22:54:59 +0000 (17:54 -0500)]
libgccjit: Add support for creating temporary variables
gcc/jit/ChangeLog:
* docs/topics/compatibility.rst (LIBGCCJIT_ABI_33): New ABI tag.
* docs/topics/functions.rst: Document gcc_jit_function_new_temp.
* jit-playback.cc (new_local): Add support for temporary
variables.
* jit-recording.cc (recording::function::new_temp): New method.
(recording::local::write_reproducer): Support temporary
variables.
* jit-recording.h (new_temp): New method.
* libgccjit.cc (gcc_jit_function_new_temp): New function.
* libgccjit.h (gcc_jit_function_new_temp): New function.
* libgccjit.map: New function.
gcc/testsuite/ChangeLog:
* jit.dg/all-non-failing-tests.h: Mention test-temp.c.
* jit.dg/test-temp.c: New test.
[PR116587][LRA]: Fix last chance reload pseudo allocation
On i686 PR116587 test compilation resulted in LRA failure to find
registers for a reload insn pseudo. The insn requires 6 regs for 4
reload insn pseudos where two of them require 2 regs each. But we
have only 5 free regs as sp is a fixed reg, bp is fixed because of
-fno-omit-frame-pointer, bx is assigned to pic_offset_table_pseudo
because of -fPIC. LRA spills pic_offset_table_pseudo as the last
chance approach to allocate registers to the reload pseudo. Although
it makes 2 free registers for the unallocated reload pseudo requiring
also 2 regs, the pseudo still can not be allocated as the 2 free regs
are disjoint. The patch spills all pseudos conflicting with the
unallocated reload pseudo including already allocated reload insn
pseudos, then standard LRA code allocates spilled pseudos requiring
more than one register first and avoids the situation of disjoint regs for
reload pseudos requiring more than one reg.
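The disjoint-register situation can be illustrated with an abstract Python model. This is a sketch, not LRA code: hard registers are numbered 0 to 5, a pseudo needing two registers must get two consecutive numbers, and the pseudo names and helper functions are made up for the example.

```python
def find_consecutive(free, nregs):
    """Return the first register r with r .. r+nregs-1 all free, else None."""
    for r in sorted(free):
        if all(r + k in free for k in range(nregs)):
            return r
    return None


def allocate(pseudos, free):
    """Greedy first-fit allocation in the given order; None on failure."""
    free = set(free)
    assignment = {}
    for name, nregs in pseudos:
        r = find_consecutive(free, nregs)
        if r is None:
            return None
        assignment[name] = r
        free -= set(range(r, r + nregs))
    return assignment


# Stuck state: single-register pseudos sit in regs 1 and 4, a pair in 2-3.
# Two registers are free, but they are not adjacent.
stuck = find_consecutive({0, 5}, 2)

# After spilling all conflicting pseudos, allocate the widest ones first.
pseudos = [("p1", 1), ("p2", 1), ("p3", 2), ("p4", 2)]
widest_first = sorted(pseudos, key=lambda p: -p[1])
assignment = allocate(widest_first, range(6))
```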
gcc/ChangeLog:
PR target/116587
* lra-assigns.cc (find_all_spills_for): Consider all pseudos whose
classes intersect given pseudo class.
gcc/testsuite/ChangeLog:
PR target/116587
* gcc.target/i386/pr116587.c: New test.
Antoni Boucher [Mon, 23 Jan 2023 22:21:15 +0000 (17:21 -0500)]
libgccjit: Add support for machine-dependent builtins
gcc/jit/ChangeLog:
PR jit/108762
* docs/topics/compatibility.rst (LIBGCCJIT_ABI_32): New ABI tag.
* docs/topics/functions.rst: Add documentation for the function
gcc_jit_context_get_target_builtin_function.
* dummy-frontend.cc: Include headers target.h, jit-recording.h,
print-tree.h, unordered_map and string, new variables (target_builtins,
target_function_types, and target_builtins_ctxt), new function
(tree_type_to_jit_type).
* jit-builtins.cc: Specify that the function types are not from
target builtins.
* jit-playback.cc: New argument is_target_builtin to new_function.
* jit-playback.h: New argument is_target_builtin to
new_function.
* jit-recording.cc: New argument is_target_builtin to
new_function_type, function_type constructor and function
constructor, new function
(get_target_builtin_function).
* jit-recording.h: Include headers string and unordered_map, new
variable target_function_types, new argument is_target_builtin
to new_function_type, function_type and function, new functions
(get_target_builtin_function, copy).
* libgccjit.cc: New function
(gcc_jit_context_get_target_builtin_function).
* libgccjit.h: New function
(gcc_jit_context_get_target_builtin_function).
* libgccjit.map: New functions
(gcc_jit_context_get_target_builtin_function).
gcc/testsuite:
PR jit/108762
* jit.dg/all-non-failing-tests.h: New test test-target-builtins.c.
* jit.dg/test-target-builtins.c: New test.
Andrew Pinski [Wed, 20 Nov 2024 03:49:38 +0000 (19:49 -0800)]
aarch64: Fix aarch64 after moving to C23
This fixes a few aarch64-specific testcases after the move to default to GNU C23.
For the SME testcases, GNU C23 changes `()` to mean `(void)` instead
of a non-prototype declaration; the non-prototype declaration merging was confusing
some of the time, so the updated behavior is the expected one even there.
For pic-*.c, `-Wno-old-style-definition` was added so as not to warn about old-style definitions.
For pr113573.c, I added `-std=gnu17` since I was not sure whether `(...)` with C23 would trigger
the same issue.
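As a minimal sketch of the `()` change mentioned above (the function names are
made up for illustration and are not taken from the testcases): the
declaration below is a non-prototype declaration before C23 and is equivalent
to `(void)` in C23. The version check is only a rough indicator, since
compilers that predate the final standard may report a different
`__STDC_VERSION__` value in their C23 mode.

```c
/* Before C23: declares a function without a prototype.
   In C23: equivalent to `int get_answer(void)`.  */
int get_answer();

/* This definition is compatible with the declaration in both modes.  */
int get_answer(void)
{
  return 42;
}

/* Hypothetical helper: report whether the translation unit is being
   compiled as C23 or later, where `()` means `(void)`.  */
int empty_parens_mean_void(void)
{
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
  return 1;
#else
  return 0;
#endif
}
```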
Andrew Pinski [Wed, 20 Nov 2024 07:45:20 +0000 (23:45 -0800)]
rtl-reader: Disable reuse_rtx support for generator building
Neither reuse_rtx nor the format for using it is documented,
so it should not be supported for the .md files.
This also fixes the problem where an invalid index supplied for reuse_rtx
caused an ICE; a real error message is now emitted instead. Note that since
this code still uses atoi, an invalid index can still be used in some cases,
but that is recorded as part of PR 44574.
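To illustrate why atoi lets some invalid indexes through, here is a sketch
that is not part of the patch. The helper name parse_index is hypothetical; it
uses strtol to reject input that atoi would silently accept, such as trailing
garbage or an empty string.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse TEXT as a non-negative decimal index.
   Return the index, or -1 if TEXT is empty, has trailing characters,
   is negative, or does not fit in an int.  */
int parse_index(const char *text)
{
  char *end;
  long value;

  errno = 0;
  value = strtol(text, &end, 10);

  /* No digits consumed, or something left over after the number.  */
  if (end == text || *end != '\0')
    return -1;

  /* Out of range for long, negative, or too large for int.  */
  if (errno == ERANGE || value < 0 || value > INT_MAX)
    return -1;

  return (int) value;
}
```

For comparison, atoi("12abc") returns 12 with no indication that the input
was malformed, whereas parse_index("12abc") reports failure.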
Note I did a grep of the sources to make sure that this was only used when
reading RTL in GCC itself rather than while reading in .md files.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* read-md.h (class rtx_reader): Don't include m_reuse_rtx_by_id
when GENERATOR_FILE is defined.
* read-rtl.cc (rtx_reader::read_rtx_code): Disable reuse_rtx
support when GENERATOR_FILE is defined.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Edwin Lu [Tue, 19 Nov 2024 20:55:15 +0000 (12:55 -0800)]
RISC-V: testsuite: restrict big endian test to non vector
RISC-V vector currently does not support big endian, so the postcommit
was getting the "sorry, not implemented" error on vector targets. Restrict
the testcase to non-vector targets.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr117595.c: Restrict to non vector targets.
Richard Biener [Wed, 20 Nov 2024 15:47:08 +0000 (16:47 +0100)]
tree-optimization/117709 - bogus offset for gather load
When diverting to VMAT_GATHER_SCATTER we fail to zero *poffset
which was previously set if a load was classified as
VMAT_CONTIGUOUS_REVERSE. The following refactors
get_group_load_store_type a bit to avoid this but this all needs
some serious TLC.
PR tree-optimization/117709
* tree-vect-stmts.cc (get_group_load_store_type): Only
set *poffset when we end up with VMAT_CONTIGUOUS_DOWN
or VMAT_CONTIGUOUS_REVERSE.
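The class of bug fixed here, an out-parameter left holding a stale value after
the classification changes, can be sketched as follows. This is a simplified
model and not the vectorizer code: the enum, the must_gather condition, the
offset formula and all function names are assumptions for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the memory access types.  */
enum access_kind
{
  ACCESS_CONTIGUOUS,
  ACCESS_CONTIGUOUS_REVERSE,
  ACCESS_GATHER_SCATTER
};

/* Stale variant: *POFFSET is set for the reverse case and is not reset
   when the access is later diverted to gather/scatter.  */
enum access_kind classify_stale(bool reverse, bool must_gather,
                                int nunits, int *poffset)
{
  enum access_kind kind = ACCESS_CONTIGUOUS;
  *poffset = 0;
  if (reverse)
    {
      kind = ACCESS_CONTIGUOUS_REVERSE;
      *poffset = -(nunits - 1);
    }
  if (must_gather)
    kind = ACCESS_GATHER_SCATTER;   /* *poffset keeps the reverse value */
  return kind;
}

/* Fixed variant: decide the final kind first, then set *POFFSET only
   from that final kind.  */
enum access_kind classify_fixed(bool reverse, bool must_gather,
                                int nunits, int *poffset)
{
  enum access_kind kind = reverse ? ACCESS_CONTIGUOUS_REVERSE
                                  : ACCESS_CONTIGUOUS;
  if (must_gather)
    kind = ACCESS_GATHER_SCATTER;
  *poffset = (kind == ACCESS_CONTIGUOUS_REVERSE) ? -(nunits - 1) : 0;
  return kind;
}

/* Small wrappers that return the resulting offset, for easy checking.  */
int offset_from_stale(bool reverse, bool must_gather, int nunits)
{
  int offset;
  classify_stale(reverse, must_gather, nunits, &offset);
  return offset;
}

int offset_from_fixed(bool reverse, bool must_gather, int nunits)
{
  int offset;
  classify_fixed(reverse, must_gather, nunits, &offset);
  return offset;
}
```

In this model a reversed access with 4 units that ends up as gather/scatter
keeps an offset of -3 in the stale variant and gets 0 in the fixed one.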
Richard Biener [Wed, 20 Nov 2024 12:32:48 +0000 (13:32 +0100)]
tree-optimization/117698 - SLP vectorization and alignment
When SLP vectorizing we fail to mark the general alignment check
as irrelevant when using VMAT_STRIDED_SLP (the implementation checks
for itself), and with VMAT_INVARIANT the override isn't effective.
This results in extra FAILs on sparc, which the following fixes.
PR tree-optimization/117698
* tree-vect-stmts.cc (get_group_load_store_type): Properly
disregard alignment for VMAT_STRIDED_SLP and VMAT_INVARIANT.
(vectorizable_load): Adjust guard for dumping whether we
vectorize an unaligned access.
(vectorizable_store): Likewise.
Antoni Boucher [Thu, 15 Feb 2024 22:03:22 +0000 (17:03 -0500)]
libgccjit: Add option to allow special characters in function names
gcc/jit/ChangeLog:
* docs/topics/contexts.rst: Add documentation for new option.
* jit-recording.cc (recording::context::get_str_option): New
method.
* jit-recording.h (get_str_option): New method.
* libgccjit.cc (gcc_jit_context_new_function): Allow special
characters in function names.
* libgccjit.h (enum gcc_jit_str_option): New option.
OpenMP: common C/C++ testcases for dispatch + adjust_args
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/declare-variant-2.c: Adjust dg-error directives.
* c-c++-common/gomp/adjust-args-1.c: New test.
* c-c++-common/gomp/adjust-args-2.c: New test.
* c-c++-common/gomp/declare-variant-dup-match-clause.c: New test.
* c-c++-common/gomp/dispatch-1.c: New test.
* c-c++-common/gomp/dispatch-2.c: New test.
* c-c++-common/gomp/dispatch-3.c: New test.
* c-c++-common/gomp/dispatch-4.c: New test.
* c-c++-common/gomp/dispatch-5.c: New test.
* c-c++-common/gomp/dispatch-6.c: New test.
* c-c++-common/gomp/dispatch-7.c: New test.
* c-c++-common/gomp/dispatch-8.c: New test.
* c-c++-common/gomp/dispatch-9.c: New test.
* c-c++-common/gomp/dispatch-10.c: New test.
libgomp/ChangeLog:
* testsuite/libgomp.c-c++-common/dispatch-1.c: New test.
* testsuite/libgomp.c-c++-common/dispatch-2.c: New test.
OpenMP: C++ front-end support for dispatch + adjust_args
This patch adds C++ support for the `dispatch` construct and the `adjust_args`
clause. It relies on the c-family bits included in the corresponding C front
end patch for pragmas and attributes.
Additional C/C++ common testcases are provided in a subsequent patch in the
series.
gcc/cp/ChangeLog:
* decl.cc (omp_declare_variant_finalize_one): Set adjust_args
need_device_ptr attribute.
* parser.cc (cp_parser_direct_declarator): Update call to
cp_parser_late_return_type_opt.
(cp_parser_late_return_type_opt): Add 'tree parms' parameter. Update
call to cp_parser_late_parsing_omp_declare_simd.
(cp_parser_omp_clause_name): Handle nocontext and novariants clauses.
(cp_parser_omp_clause_novariants): New function.
(cp_parser_omp_clause_nocontext): Likewise.
(cp_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_NOVARIANTS and
PRAGMA_OMP_CLAUSE_NOCONTEXT.
(cp_parser_omp_dispatch_body): New function, inspired by
cp_parser_assignment_expression and cp_parser_postfix_expression.
(OMP_DISPATCH_CLAUSE_MASK): Define.
(cp_parser_omp_dispatch): New function.
(cp_finish_omp_declare_variant): Add parameter. Handle adjust_args
clause.
(cp_parser_late_parsing_omp_declare_simd): Add parameter. Update calls
to cp_finish_omp_declare_variant and cp_finish_omp_declare_variant.
(cp_parser_omp_construct): Handle PRAGMA_OMP_DISPATCH.
(cp_parser_pragma): Likewise.
* semantics.cc (finish_omp_clauses): Handle OMP_CLAUSE_NOCONTEXT and
OMP_CLAUSE_NOVARIANTS.
* pt.cc (tsubst_omp_clauses): Handle OMP_CLAUSE_NOCONTEXT and
OMP_CLAUSE_NOVARIANTS.
(tsubst_stmt): Handle OMP_DISPATCH.
(tsubst_expr): Handle IFN_GOMP_DISPATCH.
gcc/testsuite/ChangeLog:
* g++.dg/gomp/adjust-args-1.C: New test.
* g++.dg/gomp/adjust-args-2.C: New test.
* g++.dg/gomp/adjust-args-3.C: New test.
* g++.dg/gomp/dispatch-1.C: New test.
* g++.dg/gomp/dispatch-2.C: New test.
* g++.dg/gomp/dispatch-3.C: New test.
* g++.dg/gomp/dispatch-4.C: New test.
* g++.dg/gomp/dispatch-5.C: New test.
* g++.dg/gomp/dispatch-6.C: New test.
* g++.dg/gomp/dispatch-7.C: New test.
OpenMP: C front-end support for dispatch + adjust_args
This patch adds support to the C front-end to parse the `dispatch` construct and
the `adjust_args` clause. It also includes some common C/C++ bits for pragmas
and attributes.
Additional common C/C++ testcases are in a later patch in the series.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_dispatch): New function.
(c_parser_omp_clause_name): Handle nocontext and novariants clauses.
(c_parser_omp_clause_novariants): New function.
(c_parser_omp_clause_nocontext): Likewise.
(c_parser_omp_all_clauses): Handle nocontext and novariants clauses.
(c_parser_omp_dispatch_body): New function adapted from
c_parser_expr_no_commas.
(OMP_DISPATCH_CLAUSE_MASK): Define.
(c_parser_omp_dispatch): New function.
(c_finish_omp_declare_variant): Parse adjust_args.
(c_parser_omp_construct): Handle PRAGMA_OMP_DISPATCH.
* c-typeck.cc (c_finish_omp_clauses): Handle OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.
gcc/testsuite/ChangeLog:
* gcc.dg/gomp/adjust-args-1.c: New test.
* gcc.dg/gomp/dispatch-1.c: New test.
* gcc.dg/gomp/dispatch-2.c: New test.
* gcc.dg/gomp/dispatch-3.c: New test.
* gcc.dg/gomp/dispatch-4.c: New test.
* gcc.dg/gomp/dispatch-5.c: New test.
OpenMP: middle-end support for dispatch + adjust_args
This patch adds middle-end support for the `dispatch` construct and the
`adjust_args` clause. The heavy lifting is done in `gimplify_omp_dispatch` and
`gimplify_call_expr` respectively. For `adjust_args`, this mostly consists of
emitting a call to `omp_get_mapped_ptr` for the appropriate device.
For dispatch, the following steps are performed:
* Handle the device clause, if any: set the default-device ICV at the top of the
dispatch region and restore its previous value at the end.
* Handle novariants and nocontext clauses, if any. Evaluate compile-time
constants and select a variant, if possible. Otherwise, emit code to handle all
possible cases at run time.
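From the user's side, the `dispatch` construct with a `novariants` clause can
be sketched as below. This is an illustrative example and not one of the
testcases: the function names are hypothetical, and both functions compute the
same result so the outcome does not depend on which one is selected. When the
file is compiled without OpenMP support, the pragmas are ignored and the base
function is called directly.

```c
#include <stddef.h>

/* Hypothetical variant, selected for calls made inside a dispatch
   construct.  */
int sum_variant(const int *v, size_t n)
{
  int s = 0;
  for (size_t i = n; i > 0; i--)
    s += v[i - 1];
  return s;
}

/* Base function; the variant above replaces it in a dispatch context.  */
#pragma omp declare variant(sum_variant) match(construct={dispatch})
int sum_base(const int *v, size_t n)
{
  int s = 0;
  for (size_t i = 0; i < n; i++)
    s += v[i];
  return s;
}

/* Call the base function through dispatch.  When USE_BASE is nonzero,
   the novariants clause asks for the base function to be called even
   inside the dispatch region; this is decided at run time here.  */
int sum_with_dispatch(const int *v, size_t n, int use_base)
{
  int r;
  #pragma omp dispatch novariants(use_base)
  r = sum_base(v, n);
  return r;
}

/* Small driver over a fixed array, for easy checking.  */
int demo_sum(int use_base)
{
  static const int data[4] = { 1, 2, 3, 4 };
  return sum_with_dispatch(data, 4, use_base);
}
```

Because `use_base` is not a compile-time constant, this corresponds to the
case above where code handling both possibilities is emitted.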
OpenMP: dispatch + adjust_args tree data structures and front-end interfaces
This patch introduces the OMP_DISPATCH tree node, as well as two new clauses
`nocontext` and `novariants`. It defines/exposes interfaces that will be
used in subsequent patches that add front-end and middle-end support, but
nothing generates these nodes yet.
gcc/ChangeLog:
* builtin-types.def (BT_FN_PTR_CONST_PTR_INT): New.
* omp-selectors.h (enum omp_ts_code): Add OMP_TRAIT_CONSTRUCT_DISPATCH.
* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.
* tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_NOVARIANTS
and OMP_CLAUSE_NOCONTEXT.
(dump_generic_node): Handle OMP_DISPATCH.
* tree.cc (omp_clause_num_ops): Add OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.
(omp_clause_code_name): Add "novariants" and "nocontext".
* tree.def (OMP_DISPATCH): New.
* tree.h (OMP_DISPATCH_BODY): New macro.
(OMP_DISPATCH_CLAUSES): New macro.
(OMP_CLAUSE_NOVARIANTS_EXPR): New macro.
(OMP_CLAUSE_NOCONTEXT_EXPR): New macro.