bpf: Always emit .BTF.ext section if generating BTF
BPF applications, when generating BTF information should always create a
.BTF.ext section.
Current implementation was only creating it when -mco-re option was used.
This patch makes .BTF.ext always be generated for BPF target objects.
The patch also adds conditions around btf_finalize function call
such that BTF deallocation happens later for BPF target.
For BPF, btf_finalize is only called after .BTF.ext is generated.
gcc/ChangeLog:
* config/bpf/bpf.cc (bpf_option_override): Make .BTF.ext
enabled by default for BPF.
(bpf_file_end): Call BTF deallocation.
(bpf_asm_init_sections): Correct condition.
* dwarf2ctf.cc (ctf_debug_finalize): Conditionally execute BTF
deallocation.
(ctf_debuf_finish): Correct condition for calling
ctf_debug_finalize.
This patch corrects the addition of +1 on the type id, which originally
was done in the wrong location and led to func_dtd->dtd_type for
BTF_KIND_FUNC struct data to contain the type id of the previous entry.
gcc/ChangeLog:
* btfout.cc (btf_collect_dataset): Corrects BTF type id.
Richard Biener [Wed, 28 Feb 2024 11:37:07 +0000 (12:37 +0100)]
tree-optimization/114121 - wrong VN with context sensitive range info
When VN ends up exploiting range-info specifying the ao_ref offset
and max_size we have to make sure to reflect this in the hashtable
entry for the recorded expression. The PR113831 fix handled the
case where we can encode this in the operands themselves but this
bug shows the issue is more widespread.
So instead of altering the operands the following instead records
this extra info that's possibly used, only throwing it away when
the value-numbering didn't come up with a non-VARYING value which
is an important detail to preserve CSE as opposed to constant
folding which is where all cases currently known popped up.
With this the original PR113831 fix can be reverted.
PR tree-optimization/114121
* tree-ssa-sccvn.h (vn_reference_s::offset,
vn_reference_s::max_size): New fields.
(vn_reference_insert_pieces): Adjust prototype.
* tree-ssa-pre.cc (phi_translate_1): Preserve offset/max_size.
* tree-ssa-sccvn.cc (vn_reference_eq): Compare offset and
size, allow using "don't know" state.
(vn_walk_cb_data::finish): Pass along offset/max_size.
(vn_reference_lookup_or_insert_for_pieces): Take offset and
max_size as argument and use it.
(vn_reference_lookup_3): Properly adjust offset and max_size
according to the adjusted ao_ref.
(vn_reference_lookup_pieces): Initialize offset and max_size.
(vn_reference_lookup): Likewise.
(vn_reference_lookup_call): Likewise.
(vn_reference_insert): Likewise.
(visit_reference_op_call): Likewise.
(vn_reference_insert_pieces): Take offset and max_size
as argument and use it.
Jonathan Wakely [Thu, 22 Feb 2024 13:06:59 +0000 (13:06 +0000)]
libstdc++: Test error handling in std::print
The standard requires an exception if std::print fails to write to a
FILE*. When writing to a std::ostream, failure to format the arguments
doesn't affect the stream state, but failure to write to the streadm
sets badbit.
libstdc++-v3/ChangeLog:
* testsuite/27_io/basic_ostream/print/1.cc: Check error
handling.
* testsuite/27_io/print/1.cc: Likewise.
Jakub Jelinek [Wed, 28 Feb 2024 11:09:04 +0000 (12:09 +0100)]
testsuite: XFAIL ssa-sink-18.c also on powerpc64 [PR111462]
powerpc64-linux apparently (not very surprisingly) behaves the same
way as powerpc64le-linux and has 4 sunk statements rather than 5,
so we should xfail it on powerpc64*-*-* rather than just powerpc64le-*-*.
powerpc-linux has 3 sunk statements, but the scan pattern is done for
lp64 only as the comment explains.
2024-02-28 Jakub Jelinek <jakub@redhat.com>
PR testsuite/111462
* gcc.dg/tree-ssa/ssa-sink-18.c: XFAIL also on powerpc64.
Juergen Christ [Mon, 19 Feb 2024 09:10:35 +0000 (10:10 +0100)]
Only emulate integral vectors.
The emulation via word mode tries to perform integer arithmetic on floating
point values instead of floating point arithmetic. This leads to
mis-compilations.
Failure occured on s390x on these existing test cases:
gcc.dg/vect/tsvc/vect-tsvc-s112.c
gcc.dg/vect/tsvc/vect-tsvc-s113.c
gcc.dg/vect/tsvc/vect-tsvc-s119.c
gcc.dg/vect/tsvc/vect-tsvc-s121.c
gcc.dg/vect/tsvc/vect-tsvc-s131.c
gcc.dg/vect/tsvc/vect-tsvc-s132.c
gcc.dg/vect/tsvc/vect-tsvc-s2233.c
gcc.dg/vect/tsvc/vect-tsvc-s421.c
gcc.dg/vect/vect-alias-check-14.c
gcc.target/s390/vector/partial/s390-vec-length-epil-run-1.c
gcc.target/s390/vector/partial/s390-vec-length-epil-run-3.c
gcc.target/s390/vector/partial/s390-vec-length-full-run-3.c
gcc/ChangeLog:
PR tree-optimization/114075
* tree-vect-stmts.cc (vectorizable_operation): Don't emulate floating
point vectors
Jakub Jelinek [Wed, 28 Feb 2024 08:59:45 +0000 (09:59 +0100)]
graphite: Fix non-INTEGER_TYPE integral comparison handling [PR114041]
The following testcases are miscompiled, because graphite ignores boolean,
enumerated or _BitInt comparisons, rewrites the code as if the comparisons
were always true or always false.
The INTEGER_TYPE checks were initially added in r6-2239 but at that point
it was both in add_conditions_to_domain and in parameter_index_in_region.
Later on the check was also added to stmt_simple_for_scop_p, and finally
r8-3931 changed the stmt_simple_for_scop_p check to INTEGRAL_TYPE_P
and turned the parameter_index_in_region -> assign_parameter_index_in_region
into INTEGRAL_TYPE_P assertion, but the add_conditions_to_domain check
for INTEGER_TYPE remained.
The following patch uses INTEGRAL_TYPE_P to complete the change.
2024-02-28 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/114041
* graphite-sese-to-poly.cc (add_conditions_to_domain): Check for
INTEGRAL_TYPE_P check rather than INTEGER_TYPE.
* gcc.dg/graphite/run-id-pr114041-1.c: New test.
* gcc.dg/graphite/run-id-pr114041-2.c: New test.
Jakub Jelinek [Wed, 28 Feb 2024 08:40:15 +0000 (09:40 +0100)]
gimple-fold: Use bitwise vector types rather than barely supported huge integral types in memcpy etc. folding [PR113988]
The following patch changes the memcpy etc. folding to use bitwise vector
types rather than huge INTEGER_TYPEs for copying of > MAX_FIXED_MODE_SIZE
lengths. The problem with the huge INTEGER_TYPEs is that they aren't
supported very much, usually there are just optabs to handle moves of them,
perhaps misaligned moves and that is it, so they pose problems e.g. to
BITINT_TYPE lowering.
2024-02-28 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113988
* stor-layout.h (bitwise_mode_for_size): Declare.
* stor-layout.cc (bitwise_mode_for_size): New function.
* gimple-fold.cc (gimple_fold_builtin_memory_op): Use it.
Use bitwise_type_for_mode instead of build_nonstandard_integer_type.
Use BITS_PER_UNIT instead of 8.
Jakub Jelinek [Wed, 28 Feb 2024 08:26:51 +0000 (09:26 +0100)]
testsuite: Add c23-stdarg-4.c test variant where all functions return large struct
I think we have no coverage for the case where structure_value_addr_parm and
TYPE_NO_NAMED_ARGS_STDARG_P are both true. The
if (type_arg_types != 0)
n_named_args
= (list_length (type_arg_types)
/* Count the struct value address, if it is passed as a parm. */
+ structure_value_addr_parm);
else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
n_named_args = 0;
else
/* If we know nothing, treat all args as named. */
n_named_args = num_actuals;
code should probably have n_named_args = structure_value_addr_parm;
instead of n_named_args = 0;, this testcase is an attempt to see if
it is broken on any target.
Nathaniel Shead [Wed, 28 Feb 2024 00:20:53 +0000 (11:20 +1100)]
c++: Revert deferring emission of inline variables [PR114013]
This is a (partial) reversion of r14-8987-gdd9d14f7d53 to return to
eagerly emitting inline variables to the middle-end when they are
declared. 'import_export_decl' will still continue to accept them, as
allowing this is a pure extension and doesn't seem to cause issues with
modules, but otherwise deferring the emission of inline variables
appears to cause issues on some targets and prevents some code using
inline variable templates from correctly linking.
There might be a more targetted way to support this, but due to the
complexity of handling linkage and emission I'd prefer to wait till
GCC 15 to explore our options.
David Malcolm [Tue, 27 Feb 2024 19:49:42 +0000 (14:49 -0500)]
analyzer: use correct format code for string literal indices [PR110483,PR111802]
On e.g. gcc211 the use of "%li" with unsigned HOST_WIDE_INT led to this warning:
../../src/gcc/analyzer/access-diagram.cc: In member function ‘void ana::string_literal_spatial_item::add_column_for_byte(text_art::table&, const ana::bit_to_table_map&, text_art::style_manager&, ana::byte_offset_t, ana::byte_offset_t, int, int) const’:
../../src/gcc/analyzer/access-diagram.cc:1909:40: warning: format ‘%li’ expects argument of type ‘long int’, but argument 3 has type ‘long long unsigned int’ [-Wformat=]
byte_idx_within_string.ulow ()));
^
and to all values being erroneously printed as "0".
Fixed thusly.
gcc/analyzer/ChangeLog:
PR analyzer/110483
PR analyzer/111802
* access-diagram.cc
(string_literal_spatial_item::add_column_for_byte): Use %wu for
printing unsigned HOST_WIDE_INT.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Eric Botcazou [Tue, 27 Feb 2024 17:01:00 +0000 (18:01 +0100)]
Fix internal error on non-byte-aligned reference in GIMPLE DSE
This is a regression present on the mainline, 13 and 12 branches. For the
attached Ada case, it's a tree checking failure on the mainline at -O:
+===========================GNAT BUG DETECTED==============================+
| 14.0.1 20240226 (experimental) [master r14-9171-g4972f97a265] GCC error:|
| tree check: expected tree that contains 'decl common' structure, |
| have 'component_ref' in tree_could_trap_p, at tree-eh.cc:2733 |
| Error detected around /home/eric/cvs/gcc/gcc/testsuite/gnat.dg/opt104.adb:
Time is a 10-byte record and Packed_Rec.T is placed at bit-offset 65 because
of the packing. so tree-ssa-dse.cc:setup_live_bytes_from_ref has computed a
const_size of 88 from ref->offset of 65 and ref->max_size of 80.
Then in tree-ssa-dse.cc:compute_trims:
411 int last_live = bitmap_last_set_bit (live);
(gdb) next
412 if (ref->size.is_constant (&const_size))
(gdb)
414 int last_orig = (const_size / BITS_PER_UNIT) - 1;
(gdb)
418 *trim_tail = last_orig - last_live;
(gdb) call debug_bitmap (live)
n_bits = 256, set = {0 1 2 3 4 5 6 7 8 9 10 }
(gdb) p last_live
$33 = 10
(gdb) p const_size
$34 = 80
(gdb) p last_orig
$35 = 9
(gdb) p *trim_tail
$36 = -1
In other words, compute_trims is overlooking the alignment adjustments that
setup_live_bytes_from_ref applied earlier. Moveover it reads:
/* We use sbitmaps biased such that ref->offset is bit zero and the bitmap
extends through ref->size. So we know that in the original bitmap
bits 0..ref->size were true. We don't actually need the bitmap, just
the REF to compute the trims. */
but setup_live_bytes_from_ref used ref->max_size instead of ref->size.
It appears that all the callers of compute_trims assume that ref->offset is
byte aligned and that the trimmed bytes are relative to ref->size, so the
patch simply adds an early return if either condition is not fulfilled.
gcc/
* tree-ssa-dse.cc (compute_trims): Fix description. Return early
if either ref->offset is not byte aligned or ref->size is not known
to be equal to ref->max_size.
(maybe_trim_complex_store): Fix description.
(maybe_trim_constructor_store): Likewise.
(maybe_trim_partially_dead_store): Likewise.
gcc/testsuite/
* gnat.dg/opt104.ads, gnat.dg/opt104.adb: New test.
These routines map simply to the C counterpart and are meanwhile
defined in OpenACC 3.3. (There are additional routine changes,
including the Fortran addition of acc_attach/acc_detach, that
require more work than a simple addition of an interface and
are therefore excluded.)
libgomp/ChangeLog:
* libgomp.texi (OpenACC Runtime Library Routines): Document new 3.3
routines that simply map to their C counterpart.
* openacc.f90 (openacc): Add them.
* openacc_lib.h: Likewise.
* testsuite/libgomp.oacc-fortran/acc_host_device_ptr.f90: New test.
* testsuite/libgomp.oacc-fortran/acc-memcpy.f90: New test.
* testsuite/libgomp.oacc-fortran/acc-memcpy-2.f90: New test.
* testsuite/libgomp.oacc-c-c++-common/lib-59.c: Crossref to f90 test.
* testsuite/libgomp.oacc-c-c++-common/lib-60.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-95.c: Likewise.
David Malcolm [Tue, 27 Feb 2024 13:36:58 +0000 (08:36 -0500)]
analyzer: fix ICE on floating-point bounds [PR111881]
gcc/analyzer/ChangeLog:
PR analyzer/111881
* constraint-manager.cc (bound::ensure_closed): Assert that
m_constant has integral type.
(range::add_bound): Bail out on floating point constants.
gcc/testsuite/ChangeLog:
PR analyzer/111881
* c-c++-common/analyzer/conditionals-pr111881.c: New test.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Richard Earnshaw [Mon, 26 Feb 2024 17:20:58 +0000 (17:20 +0000)]
arm: warn about deprecation of iwmmx in mmintrin.h
GCC 13's changes file documents that iwmmx is deprecated. Raise the bar
by warning when the mmintrin.h header is included by users, but provide
a way to suppress the warning.
gcc:
* config/arm/mmintrin.h: Warn if this header is included without
defining __ENABLE_DEPRECATED_IWMMXT.
Richard Biener [Mon, 26 Feb 2024 12:33:21 +0000 (13:33 +0100)]
tree-optimization/114074 - CHREC multiplication and undefined overflow
When folding a multiply CHRECs are handled like {a, +, b} * c
is {a*c, +, b*c} but that isn't generally correct when overflow
invokes undefined behavior. The following uses unsigned arithmetic
unless either a is zero or a and b have the same sign.
I've used simple early outs for INTEGER_CSTs and otherwise use
a range-query since we lack a tree_expr_nonpositive_p and
get_range_pos_neg isn't a good fit.
PR tree-optimization/114074
* tree-chrec.h (chrec_convert_rhs): Default at_stmt arg to NULL.
* tree-chrec.cc (chrec_fold_multiply): Canonicalize inputs.
Handle poly vs. non-poly multiplication correctly with respect
to undefined behavior on overflow.
* gcc.dg/torture/pr114074.c: New testcase.
* gcc.dg/pr68317.c: Adjust expected location of diagnostic.
* gcc.dg/vect/vect-early-break_119-pr114068.c: Do not expect
loop to be vectorized.
Jakub Jelinek [Tue, 27 Feb 2024 08:52:07 +0000 (09:52 +0100)]
expand: Add trivial folding for bit query builtins at expansion time [PR114044]
While it seems a lot of places in various optimization passes fold
bit query internal functions with INTEGER_CST arguments to INTEGER_CST
when there is a lhs, when lhs is missing, all the removals of such dead
stmts are guarded with -ftree-dce, so with -fno-tree-dce those unfolded
ifn calls remain in the IL until expansion. If they have large/huge
BITINT_TYPE arguments, there is no BLKmode optab and so expansion ICEs,
and bitint lowering doesn't touch such calls because it doesn't know they
need touching, functions only containing those will not even be further
processed by the pass because there are no non-small BITINT_TYPE SSA_NAMEs
+ the 2 exceptions (stores of BITINT_TYPE INTEGER_CSTs and conversions
from BITINT_TYPE INTEGER_CSTs to floating point SSA_NAMEs) and when walking
there is no special case for calls with BITINT_TYPE INTEGER_CSTs either,
those are for normal calls normally handled at expansion time.
So, the following patch adjust the expansion of these 6 ifns, by doing
nothing if there is no lhs, and also just in case and user disabled all
possible passes that would fold this handles the case of setting lhs
to ifn call with INTEGER_CST argument.
2024-02-27 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/114044
* internal-fn.def (CLRSB, CLZ, CTZ, FFS, PARITY): Use
DEF_INTERNAL_INT_EXT_FN macro rather than DEF_INTERNAL_INT_FN.
* internal-fn.h (expand_CLRSB, expand_CLZ, expand_CTZ, expand_FFS,
expand_PARITY): Declare.
* internal-fn.cc (expand_bitquery, expand_CLRSB, expand_CLZ,
expand_CTZ, expand_FFS, expand_PARITY): New functions.
(expand_POPCOUNT): Use expand_bitquery.
Jakub Jelinek [Mon, 26 Feb 2024 16:55:07 +0000 (17:55 +0100)]
varasm: Handle private COMDAT function symbol reference in readonly data section [PR113617]
If default_elf_select_rtx_section is called to put a reference to some
local symbol defined in a comdat section into memory, which happens more often
since the r14-4944 RA change, linking might fail.
default_elf_select_rtx_section puts such constants into .data.rel.ro.local
etc. sections and if linker chooses comdat sections from some other TU
and discards the one to which a relocation in .data.rel.ro.local remains,
linker diagnoses error. References to private comdat symbols can only appear
from functions or data objects in the same comdat group, so the following
patch arranges using .data.rel.ro.local.pool.<comdat_name> and similar sections.
2024-02-26 Jakub Jelinek <jakub@redhat.com>
H.J. Lu <hjl.tools@gmail.com>
PR rtl-optimization/113617
* varasm.cc (default_elf_select_rtx_section): For
references to private symbols in comdat sections
use .data.relro.local.pool.<comdat>, .data.relro.pool.<comdat>
or .rodata.<comdat> comdat sections.
* g++.dg/other/pr113617.C: New test.
* g++.dg/other/pr113617.h: New test.
* g++.dg/other/pr113617-aux.cc: New test.
Jakub Jelinek [Mon, 26 Feb 2024 15:30:16 +0000 (16:30 +0100)]
c: Improve some diagnostics for __builtin_stdc_bit_* [PR114042]
The PR complains that for the __builtin_stdc_bit_* "builtins" the
diagnostics doesn't mention the name of the builtin the user used, but
instead __builtin_{clz,ctz,popcount}g instead (which is what the FE
immediately lowers it to).
The following patch repeats the checks from check_builtin_function_arguments
which are there done on BUILT_IN_{CLZ,CTZ,POPCOUNT}G, such that they
diagnose it with the name of the "builtin" user actually used before it
is gone.
2024-02-26 Jakub Jelinek <jakub@redhat.com>
PR c/114042
* c-parser.cc (c_parser_postfix_expression): Diagnose
__builtin_stdc_bit_* argument with ENUMERAL_TYPE or BOOLEAN_TYPE
type or if signed here rather than on the replacement builtins
in check_builtin_function_arguments.
* gcc.dg/builtin-stdc-bit-2.c: Adjust testcase for actual builtin
names rather than names of builtin replacements.
Richard Biener [Mon, 26 Feb 2024 11:27:42 +0000 (12:27 +0100)]
tree-optimization/114099 - virtual LC PHIs and early exit vect
In some cases exits can lack LC PHI nodes for the virtual operand.
We have to create them when the epilog loop requires them which also
allows us to remove some only halfway correct fixups. This is the
variant triggering for alternate exits.
PR tree-optimization/114099
* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
Create and fill in a needed virtual LC PHI for the alternate
exits. Remove code dealing with that missing.
* gcc.dg/vect/vect-early-break_120-pr114099.c: New testcase.
Richard Biener [Mon, 26 Feb 2024 10:25:50 +0000 (11:25 +0100)]
tree-optimization/114068 - missed virtual LC PHI after vect peeling
When we choose the IV exit to be one leading to no virtual use we
fail to have a virtual LC PHI even though we need it for the epilog
entry. The following makes sure to create it so that later updating
works.
PR tree-optimization/114068
* tree-vect-loop-manip.cc (get_live_virtual_operand_on_edge):
New function.
(slpeel_tree_duplicate_loop_to_edge_cfg): Add a virtual LC PHI
on the main exit if needed. Remove band-aid for the case
it was missing.
* gcc.dg/vect/vect-early-break_118-pr114068.c: New testcase.
* gcc.dg/vect/vect-early-break_119-pr114068.c: Likewise.
Eric Botcazou [Mon, 26 Feb 2024 12:13:34 +0000 (13:13 +0100)]
Finalization of object allocated by anonymous access designating local type
The finalization of objects dynamically allocated through an anonymous
access type is deferred to the enclosing library unit in the current
implementation and a warning is given on each of them.
However this cannot be done if the designated type is local, because this
would generate dangling references to the local finalization routine, so
the finalization needs to be dropped in this case and the warning adjusted.
gcc/ada/
PR ada/113893
* exp_ch7.adb (Build_Anonymous_Master): Do not build the master
for a local designated type.
* exp_util.adb (Build_Allocate_Deallocate_Proc): Force Needs_Fin
to false if no finalization master is attached to an access type
and assert that it is anonymous in this case.
* sem_res.adb (Resolve_Allocator): Mention that the object might
not be finalized at all in the warning given when the type is an
anonymous access-to-controlled type.
H.J. Lu [Sun, 25 Feb 2024 21:14:39 +0000 (13:14 -0800)]
x86: Check interrupt instead of noreturn attribute
ix86_set_func_type checks noreturn attribute to avoid incompatible
attribute error in LTO1 on interrupt functions. Since TREE_THIS_VOLATILE
is set also for _Noreturn without noreturn attribute, check interrupt
attribute for interrupt functions instead.
Jakub Jelinek [Mon, 26 Feb 2024 10:12:39 +0000 (11:12 +0100)]
i386: Enable _BitInt support on ia32
Given the https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113837#c9
comment, the following patch just attempts to implement what I think
is best for ia32.
Compared to https://gitlab.com/x86-psABIs/i386-ABI/-/issues/5 ,
like that patch for _BitInt(64) or smaller it uses the smallest containing
{,un}signed {char,short,int,long long} for passing/returning and
layout of variables including in structures for alignment/size, with any
extra bits unspecified.
Unlike the above proposal, for larger _BitInt (i.e. _BitInt(65)+), it uses
passing/returning/layout/alignment of structure containing minimum needed
number of 32-bit limbs, again with the extra bits unspecified.
This is because most operations (except copy or bitwise ops) on _BitInts
aren't really vectorizable and will be under the hood implemented in loops
over 32-bit limbs anyway (using 64-bit limbs under the hood would mean
often using library implementation for the basic operations) and because
ia32 doesn't align even long long/double in structures to 64-bit I think
it is better to just use 32-bit alignment for that. And I don't see
a reason to waste 32-bit bits say for _BitInt(224) or _BitInt(288) on ia32.
So, effectively it is like the x86-64 _BitInt ABI with everything divided by
2, the only exception is that in x86-64 psABI _BitInt(128) is said to be
already a structure of 2 limbs, which happens to be passed mostly the same
as __int128 (except for alignment).
2024-02-26 Jakub Jelinek <jakub@redhat.com>
* config/i386/i386.cc (ix86_bitint_type_info): Add support for
!TARGET_64BIT.
Jakub Jelinek [Mon, 26 Feb 2024 09:08:45 +0000 (10:08 +0100)]
match.pd: Guard 2 simplifications on integral TYPE_OVERFLOW_UNDEFINED [PR114090]
These 2 patterns are incorrect on floating point, or for -fwrapv, or
for -ftrapv, or the first one for unsigned types (the second one is
mathematically correct, but we ought to just fold that to 0 instead).
So, the following patch properly guards this.
I think we don't need && !TYPE_OVERFLOW_SANITIZED (type) because
in both simplifications there would be UB before and after on
signed integer minimum.
Jakub Jelinek [Mon, 26 Feb 2024 09:07:39 +0000 (10:07 +0100)]
fold-const: Avoid infinite recursion in +-*&|^minmax reassociation [PR114084]
In the following testcase we infinitely recurse during BIT_IOR_EXPR
reassociation.
One operand is (unsigned _BitInt(31)) a << 4 and another operand 2147483647 >> 1 | 80 where both the right shift and the | 80
trees have TREE_CONSTANT set, but weren't folded because of delayed
folding, where some foldings are apparently done even in that case
unfortunately.
Now, the fold_binary_loc reassocation code splits both operands into
variable part, minus variable part, constant part, minus constant part,
literal part and minus literal parts, to prevent infinite recursion
punts if there are just 2 parts altogether from the 2 operands and then goes
on with reassociation, merges first the corresponding parts from both
operands and then some further merges.
The problem with the above expressions is that we get 3 different objects,
var0 (the left shift), con1 (the right shift) and lit1 (80), so the infinite
recursion prevention doesn't trigger, and we eventually merge con1 with
lit1, which effectively reconstructs the original op1 and then associate
that with var0 which is original op0, and associate_trees for that case
calls fold_binary. There are some casts involved there too (the T typedef
type and the underlying _BitInt type which are stripped with STRIP_NOPS).
The following patch attempts to prevent this infinite recursion by tracking
the origin (if certain var comes from nothing - 0, op0 - 1, op1 - 2 or both - 3)
and propagates it through all the associate_tree calls which merge the vars.
If near the end we'd try to merge what comes solely from op0 with what comes
solely from op1 (or vice versa), the patch punts, because then it isn't any
kind of reassociation between the two operands, if anything it should be
handled when folding the suboperands.
2024-02-26 Jakub Jelinek <jakub@redhat.com>
PR middle-end/114084
* fold-const.cc (fold_binary_loc): Avoid the final associate_trees
if all subtrees of var0 come from one of the op0 or op1 operands
and all subtrees of con0 come from the other one. Don't clear
variables which are never used afterwards.
The following properly guards the simplifications that move
operations into VEC_CONDs, in particular when that changes the
type constraints on this operation.
This needed a genmatch fix which was recording spurious implicit fors
when tcc_comparison is used in a C expression.
PR middle-end/114070
* genmatch.cc (parser::parse_c_expr): Do not record operand
lists but only mark operators used.
* match.pd ((c ? a : b) op (c ? d : e) --> c ? (a op d) : (b op e)):
Properly guard the case of tcc_comparison changing the VEC_COND
value operand type.
Jakub Jelinek [Mon, 26 Feb 2024 06:30:05 +0000 (07:30 +0100)]
i386: Fix up x86_function_profiler -masm=intel support [PR114094]
In my r14-8214 changes I apparently forgot one \n at the end of an instruction.
The corresponding AT&T line looks like:
"1:\tcall\t*%s@GOTPCREL(%%rip)\n"
but the Intel variant was
"1:\tcall\t[QWORD PTR %s@GOTPCREL[rip]]"
and the memory operand size is 1 byte. As the result, the rest of 511
bytes is ignored by GCC. Implement ldtilecfg and sttilecfg intrinsics
with a pointer to XImode to honor the 512-byte memory block.
gcc/ChangeLog:
PR target/114098
* config/i386/amxtileintrin.h (_tile_loadconfig): Use
__builtin_ia32_ldtilecfg.
(_tile_storeconfig): Use __builtin_ia32_sttilecfg.
* config/i386/i386-builtin.def (BDESC): Add
__builtin_ia32_ldtilecfg and __builtin_ia32_sttilecfg.
* config/i386/i386-expand.cc (ix86_expand_builtin): Handle
IX86_BUILTIN_LDTILECFG and IX86_BUILTIN_STTILECFG.
* config/i386/i386.md (ldtilecfg): New pattern.
(sttilecfg): Likewise.
gcc/testsuite/ChangeLog:
PR target/114098
* gcc.target/i386/amxtile-4.c: New test.
Jerry DeLisle [Sun, 25 Feb 2024 22:50:07 +0000 (14:50 -0800)]
libgfortran: Propagate user defined iostat and iomsg.
PR libfortran/105456
libgfortran/ChangeLog:
* io/list_read.c (list_formatted_read_scalar): Add checks
for the case where a user defines their own error codes
and error messages and generate the runtime error.
* io/transfer.c (st_read_done): Whitespace.
Gaius Mulley [Sun, 25 Feb 2024 11:08:37 +0000 (11:08 +0000)]
PR modula2/113749 m2 enabled build times out on i686-gnu-hurd
The bug fix changes the FIO module to use the target O_RDONLY,
O_WRONLY, SEEK_SET and SEEK_END (now available from the module wrapc).
Also rebuilt are the bootstrap tools mc and pge as they have their
own wrapc and C translations of FIO.
vect: Tighten check for impossible SLP layouts [PR113205]
During its forward pass, the SLP layout code tries to calculate
the cost of a layout change on an incoming edge. This is taken
as the minimum of two costs: one in which the source partition
keeps its current layout (chosen earlier during the pass) and
one in which the source partition switches to the new layout.
The latter can sometimes be arranged by the backward pass.
If only one of the costs is valid, the other cost was ignored.
But the PR shows that this is not safe. If the source partition
has layout 0 (the normal layout), we have to be prepared to handle
the case in which that ends up being the only valid layout.
Other code already accounts for this restriction, e.g. see
the code starting with:
/* Reject the layout if it would make layout 0 impossible
for later partitions. This amounts to testing that the
target supports reversing the layout change on edges
to later partitions.
gcc/
PR tree-optimization/113205
* tree-vect-slp.cc (vect_optimize_slp_pass::forward_cost): Reject
the proposed layout if it does not allow a source partition with
layout 2 to keep that layout.
gcc/testsuite/
PR tree-optimization/113205
* gcc.dg/torture/pr113205.c: New test.
Jakub Jelinek [Sat, 24 Feb 2024 11:45:40 +0000 (12:45 +0100)]
Use HOST_WIDE_INT_{C,UC,0,0U,1,1U} macros some more
I've searched for some uses of (HOST_WIDE_INT) constant or (unsigned
HOST_WIDE_INT) constant and turned them into uses of the appropriate
macros.
THere are quite a few cases in non-i386 backends but I've left that out
for now.
The only behavior change is in build_replicated_int_cst where the
left shift was done in HOST_WIDE_INT type but assigned to unsigned
HOST_WIDE_INT, which I've changed into unsigned HOST_WIDE_INT shift.
2024-02-24 Jakub Jelinek <jakub@redhat.com>
gcc/
* builtins.cc (fold_builtin_isascii): Use HOST_WIDE_INT_UC macro.
* combine.cc (make_field_assignment): Use HOST_WIDE_INT_1U macro.
* double-int.cc (double_int::mask): Use HOST_WIDE_INT_UC macros.
* genattrtab.cc (attr_alt_complement): Use HOST_WIDE_INT_1 macro.
(mk_attr_alt): Use HOST_WIDE_INT_0 macro.
* genautomata.cc (bitmap_set_bit, CLEAR_BIT): Use HOST_WIDE_INT_1
macros.
* ipa-strub.cc (can_strub_internally_p): Use HOST_WIDE_INT_1 macro.
* loop-iv.cc (implies_p): Use HOST_WIDE_INT_1U macro.
* pretty-print.cc (test_pp_format): Use HOST_WIDE_INT_C and
HOST_WIDE_INT_UC macros.
* rtlanal.cc (nonzero_bits1): Use HOST_WIDE_INT_UC macro.
* tree.cc (build_replicated_int_cst): Use HOST_WIDE_INT_1U macro.
* tree.h (DECL_OFFSET_ALIGN): Use HOST_WIDE_INT_1U macro.
* tree-ssa-structalias.cc (dump_varinfo): Use ~HOST_WIDE_INT_0U
macros.
* wide-int.cc (divmod_internal_2): Use HOST_WIDE_INT_1U macro.
* config/i386/constraints.md (define_constraint "L"): Use
HOST_WIDE_INT_C macro.
* config/i386/i386.md (movabsq split peephole2): Use HOST_WIDE_INT_C
macro.
(movl + movb peephole2): Likewise.
* config/i386/predicates.md (x86_64_zext_immediate_operand): Likewise.
(const_32bit_mask): Likewise.
gcc/objc/
* objc-encoding.cc (encode_array): Use HOST_WIDE_INT_0 macros.
Jakub Jelinek [Sat, 24 Feb 2024 11:44:34 +0000 (12:44 +0100)]
bitint: Handle VIEW_CONVERT_EXPRs between large/huge BITINT_TYPEs and VECTOR/COMPLEX_TYPE etc. [PR114073]
The following patch implements support for VIEW_CONVERT_EXPRs from/to
large/huge _BitInt to/from vector or complex types or anything else but
integral/pointer types which doesn't need to live in memory.
2024-02-24 Jakub Jelinek <jakub@redhat.com>
PR middle-end/114073
* gimple-lower-bitint.cc (bitint_large_huge::lower_stmt): Handle
VIEW_CONVERT_EXPRs between large/huge _BitInt and non-integer/pointer
types like vector or complex types.
(gimple_lower_bitint): Don't merge VIEW_CONVERT_EXPRs to non-integral
types. Fix up VIEW_CONVERT_EXPR handling. Allow merging
VIEW_CONVERT_EXPR from non-integral/pointer types with a store.
Steve Kargl [Fri, 23 Feb 2024 21:05:04 +0000 (22:05 +0100)]
Fortran: ALLOCATE statement, SOURCE/MOLD expressions with subrefs [PR114024]
PR fortran/114024
gcc/fortran/ChangeLog:
* trans-stmt.cc (gfc_trans_allocate): When a source expression has
substring references, part-refs, or %re/%im inquiries, wrap the
entity in parentheses to force evaluation of the expression.
gcc/testsuite/ChangeLog:
* gfortran.dg/allocate_with_source_27.f90: New test.
* gfortran.dg/allocate_with_source_28.f90: New test.
Robin Dapp [Thu, 22 Feb 2024 12:40:55 +0000 (13:40 +0100)]
RISC-V: Fix vec_init for simple sequences [PR114028].
For a vec_init (_a, _a, _a, _a) with _a of mode DImode we try to
construct a "superword" of two "_a"s. This only works for modes < Pmode
when we can "shift and or" both halves into one Pmode register.
This patch disallows the optimization for inner_mode == Pmode and emits
a simple broadcast in such a case.
gcc/ChangeLog:
PR target/114028
* config/riscv/riscv-v.cc (rvv_builder::can_duplicate_repeating_sequence_p):
Return false if inner mode is already Pmode.
(rvv_builder::is_all_same_sequence): New function.
(expand_vec_init): Emit broadcast if sequence is all same.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr114028.c: New test.
Jakub Jelinek [Fri, 23 Feb 2024 17:55:12 +0000 (18:55 +0100)]
c++: Fix ICE due to folding a call to constructor on cdtor_returns_this arches (aka arm32) [PR113083]
When targetm.cxx.cdtor_returns_this () (aka on arm32 TARGET_AAPCS_BASED)
constructor is supposed to return this pointer, but when we cp_fold such
a call, we don't take that into account and just INIT_EXPR the object,
so we can later ICE during gimplification, because the expression doesn't
have the right type.
2024-02-23 Jakub Jelinek <jakub@redhat.com>
PR c++/113083
* cp-gimplify.cc (cp_fold): For targetm.cxx.cdtor_returns_this ()
wrap r into a COMPOUND_EXPR and return folded CALL_EXPR_ARG (x, 0).
aarch64: Spread out FPR usage between RA regions [PR113613]
early-ra already had code to do regrename-style "broadening"
of the allocation, to promote scheduling freedom. However,
the pass divides the function into allocation regions
and this broadening only worked within a single region.
This meant that if a basic block contained one subblock
of FPR use, followed by a point at which no FPRs were live,
followed by another subblock of FPR use, the two subblocks
would tend to reuse the same registers. This in turn meant
that it wasn't possible to form LDP/STP pairs between them.
The failure to form LDPs and STPs in the testcase was a
regression from GCC 13.
The patch adds a simple heuristic to prefer less recently
used registers in the event of a tie.
gcc/
PR target/113613
* config/aarch64/aarch64-early-ra.cc
(early_ra::m_current_region): New member variable.
(early_ra::m_fpr_recency): Likewise.
(early_ra::start_new_region): Bump m_current_region.
(early_ra::allocate_colors): Prefer less recently used registers
in the event of a tie. Add a comment to explain why we prefer(ed)
higher-numbered registers.
(early_ra::find_oldest_color): Prefer less recently used registers
here too.
(early_ra::finalize_allocation): Update recency information for
allocated registers.
(early_ra::process_blocks): Initialize m_current_region and
m_fpr_recency.
gcc/testsuite/
PR target/113613
* gcc.target/aarch64/pr113613.c: New test.
aarch64: Tighten early-ra chain test for wide registers [PR113295]
Most code in early-ra used is_chain_candidate to check whether we
should chain two allocnos. This included both tests that matter
for correctness and tests for certain heuristics.
Once that test passes for one pair of allocnos, we test whether
it's safe to chain the containing groups (which might contain
multiple allocnos for x2, x3 and x4 modes). This test used an
inline test for correctness only, deliberately skipping the
heuristics. However, this instance of the test was missing
some handling of equivalent allocnos.
This patch fixes things by making is_chain_candidate take a
strictness parameter: correctness only, or correctness + heuristics.
It then makes the group-chaining test use the correctness version
rather than trying to replicate it inline.
gcc/
PR target/113295
* config/aarch64/aarch64-early-ra.cc
(early_ra::test_strictness): New enum.
(early_ra::is_chain_candidate): Add a strictness parameter to
control whether only correctness matters, or whether both correctness
and heuristics should be used. Handle multiple levels of equivalence.
(early_ra::find_related_start): Update call accordingly.
(early_ra::strided_polarity_pref): Likewise.
(early_ra::form_chains): Likewise.
(early_ra::try_to_chain_allocnos): Use is_chain_candidate in
correctness mode rather than trying to inline the test.
gcc/testsuite/
PR target/113295
* gcc.target/aarch64/pr113295-2.c: New test.
416.gamess showed up two wrong-code bugs in early-ra. This patch
fixes the first of them. It was difficult to reduce the source code
to something that would meaningfully show the situation, so the
testcase uses a direct RTL sequence instead.
In the sequence:
(a) register <2> is set more than once
(b) register <2> is copied to a temporary (<4>)
(c) register <2> is the destination of an FCSEL between <4> and
another value (<5>)
(d) <4> and <2> are equivalent for <4>'s live range
(e) <5>'s and <2>'s live ranges do not intersect, and there is
a pseudo-copy between <5> and <2>
On its own, (d) implies that <4> can be treated as equivalent to <2>.
And on its own, (e) implies that <5> can share <2>'s register. But
<4>'s and <5>'s live ranges conflict, meaning that they cannot both
share the register together. A bit of missing bookkeeping meant that
the mechanism for detecting this didn't fire. We therefore ended up
with an FCSEL in which both inputs were the same register.
gcc/
PR target/113295
* config/aarch64/aarch64-early-ra.cc
(early_ra::find_related_start): Account for definitions by shared
registers when testing for a single register definition.
(early_ra::accumulate_defs): New function.
(early_ra::record_copy): If A shares B's register, fold A's
definition information into B's. Fold A's use information into B's.
gcc/testsuite/
PR target/113295
* gcc.dg/rtl/aarch64/pr113295-1.c: New test.
H.J. Lu [Sun, 4 Feb 2024 15:46:35 +0000 (07:46 -0800)]
x86-64: Check R_X86_64_CODE_6_GOTTPOFF support
If assembler and linker supports
add %reg1, name@gottpoff(%rip), %reg2
with R_X86_64_CODE_6_GOTTPOFF, we can generate it instead of
mov name@gottpoff(%rip), %reg2
add %reg1, %reg2
gcc/
* configure.ac (HAVE_AS_R_X86_64_CODE_6_GOTTPOFF): Defined as 1
if R_X86_64_CODE_6_GOTTPOFF is supported.
* config.in: Regenerated.
* configure: Likewise.
* config/i386/predicates.md (apx_ndd_add_memory_operand): Allow
UNSPEC_GOTNTPOFF if R_X86_64_CODE_6_GOTTPOFF is supported.
gcc/testsuite/
* gcc.target/i386/apx-ndd-tls-1b.c: New test.
* lib/target-supports.exp
(check_effective_target_code_6_gottpoff_reloc): New.
Richard Earnshaw [Thu, 22 Feb 2024 16:47:20 +0000 (16:47 +0000)]
arm: fix ICE with vectorized reciprocal division [PR108120]
The expand pattern for reciprocal division was enabled for all math
optimization modes, but the patterns it was generating were not
enabled unless -funsafe-math-optimizations were enabled, this leads to
an ICE when the pattern we generate cannot be recognized.
Fixed by only enabling vector division when doing unsafe math.
gcc:
PR target/108120
* config/arm/neon.md (div<VCVTF:mode>3): Rename from div<mode>3.
Gate with ARM_HAVE_NEON_<MODE>_ARITH.
gcc/testsuite:
PR target/108120
* gcc.target/arm/neon-recip-div-1.c: New file.
Jakub Jelinek [Fri, 23 Feb 2024 10:38:18 +0000 (11:38 +0100)]
expr: Fix REDUCE_BIT_FIELD in multiplication expansion [PR114054]
The following testcase ICEs, because the REDUCE_BIT_FIELD macro uses
the target variable implicitly:
#define REDUCE_BIT_FIELD(expr) (reduce_bit_field \
? reduce_to_bit_field_precision ((expr), \
target, \
type) \
: (expr))
and so when the code below reuses the target variable, documented to be
The value may be stored in TARGET if TARGET is nonzero.
TARGET is just a suggestion; callers must assume that
the rtx returned may not be the same as TARGET.
for something unrelated (the value that should be returned), this misbehaves
(in the testcase target is set to a CONST_INT, which has VOIDmode and
reduce_to_bit_field_precision assert checking doesn't like that).
Needed to say that
If TARGET is CONST0_RTX, it means that the value will be ignored.
but in expand_expr_real_2 does at the start:
ignore = (target == const0_rtx
|| ((CONVERT_EXPR_CODE_P (code)
|| code == COND_EXPR || code == VIEW_CONVERT_EXPR)
&& TREE_CODE (type) == VOID_TYPE));
/* We should be called only if we need the result. */
gcc_assert (!ignore);
- so such target is mainly meant for calls and the like in other routines.
Certainly doesn't expect that target changes from not being ignored
initially to ignore later on and other CONST_INT results as well as anything
which is not an object into which anything can be stored.
So, the following patch fixes that by using a more appripriate temporary
for the result, which other code is using.
2024-02-23 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/114054
* expr.cc (expand_expr_real_2) <case MULT_EXPR>: Use
temp variable instead of target parameter for result.
The following testcases show 2 bugs in the .{ADD,SUB}_OVERFLOW lowering,
both related to storing of the REALPART_EXPR part of the result.
On the first testcase prec is 255, prec_limbs is 4 and for the second limb
in the loop the store of the REALPART_EXPR of .USUBC (_30) is stored through:
if (_27 <= 3)
goto <bb 12>; [80.00%]
else
goto <bb 15>; [20.00%]
<bb 15> [local count: 1073741824]:
The first check is right, as prec_limbs is 4, we don't want to store
bitint.3[4] or above at all, those limbs are just computed for the overflow
checking and nothing else, so _27 > 4 leads to no store.
But the other condition is exact opposite of what should be done, if
the current index of the second limb (_27) is < 3, then it should
bitint.3[_27] = _30;
and if it is == 3, it should
MEM[(unsigned long *)&bitint.3 + 24B] = _30;
and (especially important for the targets which would bitinfo.extended = 1)
should actually in this case zero extend it from the 63 bits to 64, that is
the handling of the partial limb. The if_then_if_then_else helper if
there are 2 conditions sets m_gsi to be at the start of the
edge_true_false->dest bb, i.e. when the first condition is true and second
false, and that is where we store the SSA_NAME indexed limb store, so the
condition needs to be reversed.
The following patch does that and adds the cast as well, the usual
assumption that already handle_operand has the partial limb type doesn't
have to be true here, because the source operand could have much larger
precision than the REALPART_EXPR of the lhs.
2024-02-23 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/114040
* gimple-lower-bitint.cc (bitint_large_huge::lower_addsub_overflow):
Use EQ_EXPR rather than LT_EXPR for g2 condition and change its
probability from likely to unlikely. When handling the true true
store, first cast to limb_access_type and then to l's type.
* gcc.dg/torture/bitint-60.c: New test.
* gcc.dg/torture/bitint-61.c: New test.
Xi Ruoyao [Wed, 21 Feb 2024 15:54:53 +0000 (23:54 +0800)]
LoongArch: Don't falsely claim gold supported in toplevel configure
The gold linker has never been ported to LoongArch (and it seems
unlikely to be ported in the future as the new architectures are
focusing on lld and/or mold for fast linkers).
Richard Biener [Fri, 23 Feb 2024 07:59:12 +0000 (08:59 +0100)]
Add ia64*-*-* to the list of obsolete targets
The following deprecates ia64*-*-* for GCC 14. Since we plan to
force LRA for GCC 15 and the target only has slim chances of getting
updated this notifies people in advance. Given both Linux and
glibc have axed the target further development is also made difficult.
There is no listed maintainer for ia64 either.
PR target/90785
gcc/
* config.gcc: Add ia64*-*-* to the list of obsoleted targets.
contrib/
* config-list.mk (LIST): --enable-obsolete for ia64*-*-*.
Palmer Dabbelt [Fri, 9 Feb 2024 16:53:24 +0000 (08:53 -0800)]
RISC-V: Point our Python scripts at python3
This builds for me, and I frequently have python-is-python3 type
packages installed so I think I've been implicitly testing it for a
while. Looks like Kito's tested similar configurations, and the
bugzilla indicates we should be moving over.
gcc/ChangeLog:
PR other/109668
* config/riscv/arch-canonicalize: Move to python3
* config/riscv/multilib-generator: Likewise
Palmer Dabbelt [Tue, 20 Feb 2024 15:45:38 +0000 (07:45 -0800)]
doc: RISC-V: Document that -mcpu doesn't override -march or -mtune
This came up recently as Edwin was looking through the test suite. A
few of us were talking about this during the patchwork meeting and were
surprised. Looks like this is the desired behavior, so let's at least
document it.
Lulu Cheng [Wed, 21 Feb 2024 03:17:14 +0000 (11:17 +0800)]
LoongArch: When checking whether the assembler supports conditional branch relaxation, add compilation parameter "--fatal-warnings" to the assembler.
In binutils 2.40 and earlier versions, only a warning will be reported
when a relocation immediate value is out of bounds. As a result,
the value of the macro HAVE_AS_COND_BRANCH_RELAXATION will also be
defined as 1 when the assembler does not support conditional branch
relaxation. Therefore, add the compilation option "--fatal-warnings"
to avoid this problem.
gcc/ChangeLog:
* configure: Regenerate.
* configure.ac: Add parameter "--fatal-warnings" to assemble
when checking whether the assemble support conditional branch
relaxation.
Jakub Jelinek [Thu, 22 Feb 2024 18:32:02 +0000 (19:32 +0100)]
c: Handle scoped attributes in __has*attribute and scoped attribute parsing changes in -std=c11 etc. modes [PR114007]
We aren't able to parse __has_attribute (vendor::attr) (and __has_c_attribute
and __has_cpp_attribute) in strict C < C23 modes. While in -std=gnu* modes
or in -std=c23 there is CPP_SCOPE token, in -std=c* (except for -std=c23)
there are is just a pair of CPP_COLON tokens.
The c-lex.cc hunk adds support for that.
That leads to a question if we should return 1 or 0 from
__has_attribute (gnu::unused) or not, because while
[[gnu::unused]] is parsed fine in -std=gnu*/-std=c23 modes (sure, with
pedwarn for < C23), we do not parse it at all in -std=c* (except for
-std=c23), we only parse [[__extension__ gnu::unused]] there. While
the __extension__ in there helps to avoid the pedwarn, I think it is
better to be consistent between GNU and strict C < C23 modes and
parse [[gnu::unused]] too; on the other side, I think parsing
[[__extension__ gnu : : unused]] is too weird and undesirable.
So, the following patch adds a flag during preprocessing at the point
where we normally create CPP_SCOPE tokens out of 2 consecutive colons
on the first CPP_COLON to mark the consecutive case (as we are tight
on the bits, I've reused the PURE_ZERO flag, which is used just by the
C++ FE and only ever set (both C and C++) on CPP_NUMBER tokens, this
new flag has the same value and is only ever used on CPP_COLON tokens)
and instead of checking loose_scope_p argument (i.e. whether it is
[[__extension__ ...]] or not), it just parses CPP_SCOPE or CPP_COLON
with CLONE_SCOPE flag followed by another CPP_COLON the same.
The latter will never appear in >= C23 or -std=gnu* modes, though
guarding its use say with flag_iso && !flag_isoc23 && doesn't really
work because the __extension__ case temporarily clears flag_iso flag.
This makes the -std=c11 etc. behavior more similar to -std=gnu11 or
-std=c23, the only difference I'm aware of are the
#define JOIN2(A, B) A##B
[[vendor JOIN2(:,:) attr]]
[[__extension__ vendor JOIN2(:,:) attr]]
cases, which are accepted in the latter modes, but results in error
in -std=c11; but the error is during preprocessing that :: doesn't
form a valid preprocessing token, which is true, so just don't do that if
you try to have __STRICT_ANSI__ && __STDC_VERSION__ <= 201710L
compatibility.
2024-02-22 Jakub Jelinek <jakub@redhat.com>
PR c/114007
gcc/
* doc/extend.texi: (__extension__): Remove comments about scope
tokens vs. two colons.
gcc/c-family/
* c-lex.cc (c_common_has_attribute): Parse 2 CPP_COLONs with
the first one with COLON_SCOPE flag the same as CPP_SCOPE.
gcc/c/
* c-parser.cc (c_parser_std_attribute): Remove loose_scope_p argument.
Instead of checking it, parse 2 CPP_COLONs with the first one with
COLON_SCOPE flag the same as CPP_SCOPE.
(c_parser_std_attribute_list): Remove loose_scope_p argument, don't
pass it to c_parser_std_attribute.
(c_parser_std_attribute_specifier): Adjust c_parser_std_attribute_list
caller.
gcc/testsuite/
* gcc.dg/c23-attr-syntax-6.c: Adjust testcase for :: being valid
even in -std=c11 even without __extension__ and : : etc. not being
valid anymore even with __extension__.
* gcc.dg/c23-attr-syntax-7.c: Likewise.
* gcc.dg/c23-attr-syntax-8.c: New test.
libcpp/
* include/cpplib.h (COLON_SCOPE): Define to PURE_ZERO.
* lex.cc (_cpp_lex_direct): When lexing CPP_COLON with another
colon after it, if !CPP_OPTION (pfile, scope) set COLON_SCOPE
flag on the first CPP_COLON token.
Andrew Pinski [Thu, 22 Feb 2024 04:12:21 +0000 (20:12 -0800)]
warn-access: Fix handling of unnamed types [PR109804]
This looks like an oversight of handling DEMANGLE_COMPONENT_UNNAMED_TYPE.
DEMANGLE_COMPONENT_UNNAMED_TYPE only has the u.s_number.number set while
the code expected newc.u.s_binary.left would be valid.
So this treats DEMANGLE_COMPONENT_UNNAMED_TYPE like we treat function paramaters
(DEMANGLE_COMPONENT_FUNCTION_PARAM) and template paramaters (DEMANGLE_COMPONENT_TEMPLATE_PARAM).
Note the code in the demangler does this when it sets DEMANGLE_COMPONENT_UNNAMED_TYPE:
ret->type = DEMANGLE_COMPONENT_UNNAMED_TYPE;
ret->u.s_number.number = num;
Committed as obvious after bootstrap/test on x86_64-linux-gnu
This is because the non-Q variant for indices 0 and 1 are just shuffling values.
There is no perf difference between INS SIMD to SIMD and ZIP on Arm uArches but
preferring the INS alternative has a drawback on all uArches as ZIP being a three
operand instruction can be used to tie the result to the return register whereas
INS would require an fmov.
As such just update the test file for now.
gcc/testsuite/ChangeLog:
PR target/112375
* gcc.target/aarch64/vget_set_lane_1.c: Update test output.
Gaius Mulley [Thu, 22 Feb 2024 15:02:19 +0000 (15:02 +0000)]
PR modula2/114055 improve error message when checking the BY constant
The fix marks a constant created during the default BY clause of the
FOR loop as internal. The type checker will always return true if
checking against an internal const.
gcc/m2/ChangeLog:
PR modula2/114055
* gm2-compiler/M2Check.mod (Import): IsConstLitInternal and
IsConstLit.
(isInternal): New procedure function.
(doCheck): Test for isInternal in either operand and early
return true.
* gm2-compiler/M2Quads.mod (PushOne): Rewrite with extra
parameter internal.
(BuildPseudoBy): Add TRUE parameter to PushOne call.
(BuildIncProcedure): Add FALSE parameter to PushOne call.
(BuildDecProcedure): Add FALSE parameter to PushOne call.
* gm2-compiler/M2Range.mod (ForLoopBeginTypeCompatible):
Uncomment code and tidy up error string.
* gm2-compiler/SymbolTable.def (PutConstLitInternal):
New procedure.
(IsConstLitInternal): New procedure function.
* gm2-compiler/SymbolTable.mod (PutConstLitInternal):
New procedure.
(IsConstLitInternal): New procedure function.
(SymConstLit): New field IsInternal.
(CreateConstLit): Initialize IsInternal to FALSE.
gcc/testsuite/ChangeLog:
PR modula2/114055
* gm2/pim/fail/forloopby.mod: New test.
* gm2/pim/pass/forloopby2.mod: New test.
When we classify a conditional reduction chain as CONST_COND_REDUCTION
we fail to verify all involved conditionals have the same constant.
That's a quite unlikely situation so the following simply disables
such classification when there's more than one reduction statement.
PR tree-optimization/114027
* tree-vect-loop.cc (vecctorizable_reduction): Use optimized
condition reduction classification only for single-element
chains.
Jakub Jelinek [Thu, 22 Feb 2024 12:07:25 +0000 (13:07 +0100)]
profile-count: Don't dump through a temporary buffer [PR111960]
The profile_count::dump (char *, struct function * = NULL) const;
method has a single caller, the
profile_count::dump (FILE *f, struct function *fun) const;
method and for that going through a temporary buffer is just slower
and opens doors for buffer overflows, which is exactly why this P1
was filed.
The buffer size is 64 bytes, the previous maximum
"%" PRId64 " (%s)"
would print up to 61 bytes in there (19 bytes for arbitrary uint64_t:61
bitfield printed as signed, "estimated locally, globally 0 adjusted"
i.e. 38 bytes longest %s and 4 other characters).
Now, after the r14-2389 changes, it can be
19 + 38 plus 11 other characters + %.4f, which is worst case
309 chars before decimal point, decimal point and 4 digits after it,
so total 382 bytes.
So, either we could bump the buffer[64] to buffer[400], or the following
patch just drops the indirection through buffer and prints it directly to
stream. After all, having APIs which fill in some buffer without passing
down the size of the buffer is just asking for buffer overflows over time.
2024-02-22 Jakub Jelinek <jakub@redhat.com>
PR ipa/111960
* profile-count.h (profile_count::dump): Remove overload with
char * first argument.
* profile-count.cc (profile_count::dump): Change overload with char *
first argument which uses sprintf into the overfload with FILE *
first argument and use fprintf instead. Remove overload which wrapped
it.
Jakub Jelinek [Thu, 22 Feb 2024 09:19:15 +0000 (10:19 +0100)]
call-cdce: Add missing BUILT_IN_*F{32,64}X handling and improve BUILT_IN_*L [PR113993]
The following testcase ICEs, because can_test_argument_range
returns true for BUILT_IN_{COSH,SINH,EXP{,M1,2}}{F32X,F64X}
among many other builtins, but get_no_error_domain doesn't handle
those.
float32x_type_node when supported in GCC always has DFmode, so that
case is easy (and call-cdce assumes that SFmode is IEEE float and DFmode
is IEEE double). So *F32X is simply handled by adding those cases
next to *F64.
float64x_type_node when supported in GCC by definition has a mode
with larger precision and exponent range than DFmode, so it can be XFmode,
TFmode or KFmode. I went through all the l/f128 suffixed builtins and
verified that the float128x_type_node no error domain range is actually
identical to the Intel extended long double no error domain range; it isn't
that surprising, both IEEE quad and Intel/Motorola extended have the same
exponent range [-16381, 16384] (well, Motorola -16382 probably because of
different behavior for denormals, but that has nothing to do with
get_no_error_domain which is about large inputs overflowing into +-Inf
or triggering NaN, denormals could in theory do something solely for sqrt
and even that is fine). In theory some target could have different larger
type, so for *F64X the code verifies that
REAL_MODE_FORMAT (TYPE_MODE (float64x_type_node))->emax == 16384
and if so, uses the *F128 domains, otherwise falls back to the non-suffixed
ones (aka *F64), that is certainly the conservative minimum.
While at it, the patch also changes the *L suffixed cases to do pretty much
the same, the comment said that the function just assumes for *L
the *F64 ranges, but that is unnecessarily conservative.
All we currently have for long double is:
1) IEEE quad (emax 16384, *F128 ranges)
2) XFmode Intel/Motorola extended (emax 16384, same as *F128 ranges)
3) IBM extended (double double, emax 1024, the extra precision doesn't
really help and the domains are the same as for *F64)
4) same as double (*F64 again)
So, the patch uses also for *L
REAL_MODE_FORMAT (TYPE_MODE (long_double_type_node))->emax == 16384
checks and either tail recurses into the *F128 case for that or to
non-suffixed (aka *F64) case otherwise.
BUILT_IN_*F128X not handled because no target has those and it doesn't
seem something is on the horizon and who knows what would be used for that.
Thus, all we get this wrong for are probably VAX floats or something
similar, no intent from me to look at that, that is preexisting issue.
BTW, I'm surprised we don't have BUILT_IN_EXP10F{16,32,64,128,32X,64X,128X}
builtins, seems glibc has those (sure, I think except *16 and *128x).
2024-02-22 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113993
* tree-call-cdce.cc (get_no_error_domain): Handle
BUILT_IN_{COSH,SINH,EXP{,M1,2}}{F32X,F64X}. Handle
BUILT_IN_{COSH,SINH,EXP{,M1,2}}L for
REAL_MODE_FORMAT (TYPE_MODE (long_double_type_node))->emax == 16384
the as the F128 suffixed cases, otherwise as non-suffixed ones.
Handle BUILT_IN_{EXP,POW}10L for
REAL_MODE_FORMAT (TYPE_MODE (long_double_type_node))->emax == 16384
as (-inf, 4932).
Currently, bitint_large_huge::lower_mul_overflow uses cnt 1 only if
startlimb == endlimb and in that case doesn't use a loop and handles
everything in a special if:
unsigned cnt;
bool use_loop = false;
if (startlimb == endlimb)
cnt = 1;
else if (startlimb + 1 == endlimb)
cnt = 2;
else if ((end % limb_prec) == 0)
{
cnt = 2;
use_loop = true;
}
else
{
cnt = 3;
use_loop = startlimb + 2 < endlimb;
}
if (cnt == 1)
{
...
}
else
The loop handling for the loop exit condition wants to compare if the
incremented index is equal to endlimb, but that is correct only if
end is not divisible by limb_prec and there will be a straight line
check after the loop as well for the most significant limb. The code
used endlimb + (cnt == 1) for that, but cnt == 1 is never true here,
because cnt is either 2 or 3, so the right check is (cnt == 2).
2024-02-22 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/114038
* gimple-lower-bitint.cc (bitint_large_huge::lower_mul_overflow): Fix
loop exit condition if end is divisible by limb_prec.
YunQiang Su [Thu, 22 Feb 2024 05:05:06 +0000 (13:05 +0800)]
invoke.texi: Fix some skipping UrlSuffix problem for MIPS
The problem is that, there are these lines in mips.opt.urls:
; skipping UrlSuffix for 'mabi=' due to finding no URLs
; skipping UrlSuffix for 'mno-flush-func' due to finding no URLs
; skipping UrlSuffix for 'mexplicit-relocs' due to finding no URLs
These lines is not fixed by this patch due to that we don't
document these options:
; skipping UrlSuffix for 'mlra' due to finding no URLs
; skipping UrlSuffix for 'mdebug' due to finding no URLs
; skipping UrlSuffix for 'meb' due to finding no URLs
; skipping UrlSuffix for 'mel' due to finding no URLs
gcc
* doc/invoke.texi(MIPS Options): Fix skipping UrlSuffix
problem of mabi=, mno-flush-func, mexplicit-relocs;
add missing leading - of mbranch-cost option.
* config/mips/mips.opt.urls: Regenerate.
As PR109987 and its duplicated bugs show, -mno-power8-vector
(and -mno-power9-vector) cause some problems and as Segher
pointed out in [1] they are workaround options, so this patch
is to remove -m{no,}-power{8,9}-options. Like what we did
for option -mdirect-move before, this patch still keep the
corresponding internal flags and they are automatically set
based on -mcpu. The test suite update takes some efforts,
it consists of some aspects:
- effective target powerpc_p{8,9}vector_ok are removed
and replaced with powerpc_vsx_ok.
- Some cases having -mpower{8,9}-vector are updated with
-mvsx, some of them already have -mdejagnu-cpu. For
those that don't have -mdejagnu-cpu, if -mdejagnu-cpu
is needed for the test point, then it's appended;
otherwise, add additional-options -mdejagnu-cpu=power{8,9}
if has_arch_pwr{8,9} isn't satisfied.
- Some test cases are updated with explicit -mvsx.
- Some test cases with those two option mixed are adjusted
to keep the test points, like -mpower8-vector
-mno-power9-vector are updated with -mdejagnu-cpu=power8
-mvsx etc.
- Some test cases with -mno-power{8,9}-vector are updated
by replacing -mno-power{8,9}-vector with -mno-vsx, or
just removing it.
- For some cases, we don't always specify -mdejagnu-cpu to
avoid to restrict the testing coverage, it would check
has_arch_pwr{8,9} and appended that as need.
- For vect test cases run, it doesn't specify -mcpu=power9
for power10 and up.
Bootstrapped and regtested on:
- powerpc64-linux-gnu P7/P8/P9 {-m32,-m64}
- powerpc64le-linux-gnu P8/P9/P10
Although it's stage4 now, as the discussion in PR113115 we
are still eager to neuter these two options, so is it ok
for trunk?
* config/rs6000/constraints.md (we): Update internal doc without
referring to option -mpower9-vector.
* config/rs6000/driver-rs6000.cc (asm_names): Remove mpower9-vector
special handlings.
* config/rs6000/rs6000-cpus.def (OTHER_P9_VECTOR_MASKS,
OTHER_P8_VECTOR_MASKS): Merge to ...
(OTHER_VSX_VECTOR_MASKS): ... here.
* config/rs6000/rs6000.cc (rs6000_option_override_internal): Remove
some error message handlings and explicit option mask adjustments on
explicit option power{8,9}-vector conflicting with other options.
(rs6000_print_isa_options): Update comments.
(rs6000_disable_incompatible_switches): Remove power{8,9}-vector
related array items and handlings.
* config/rs6000/rs6000.h (ASM_CPU_SPEC): Remove mpower9-vector
special handlings.
* config/rs6000/rs6000.opt: Make option power{8,9}-vector as
WarnRemoved.
* doc/extend.texi: Remove documentation referring to option
-mpower8-vector.
* doc/invoke.texi: Remove documentation for option
-mpower{8,9}-vector and adjust some documentation referring to them.
* doc/md.texi: Update documentation for constraint we.
* doc/sourcebuild.texi: Remove documentation for powerpc_p8vector_ok.
libgcc/ChangeLog:
* config/rs6000/t-float128-hw: Replace options -mpower{8,9}-vector
with -mcpu=power9.
* configure.ac: Update use of option -mpower9-vector with
-mcpu=power9.
* configure: Regenerate.
Fangrui Song [Wed, 31 Jan 2024 04:41:12 +0000 (20:41 -0800)]
RISC-V: Add tests for constraints "i" and "s"
The constraints "i" and "s" can be used with a symbol that binds
externally, e.g.
```
namespace ns { extern int var, a[4]; }
void foo() {
asm(".pushsection .xxx,\"aw\"; .dc.a %0; .popsection" :: "s"(&ns::var));
asm(".reloc ., BFD_RELOC_NONE, %0" :: "s"(&ns::a[3]));
}
```