i386: Use offsettable address constraint for double-word memory operands, part 2
Double-word memory operands are accessed as their high and low part, so the
memory location has to be offsettable. Use "o" constraint instead of "m"
for double-word memory operands.
gcc/ChangeLog:
* config/i386/i386.md (*insvti_lowpart_1): Use "o" constraint
instead of "m" for double-word mode memory operands.
Marek Polacek [Tue, 3 Sep 2024 21:01:48 +0000 (17:01 -0400)]
c++: ICE with TTP [PR96097]
We crash when dependent_type_p gets a TEMPLATE_TYPE_PARM outside
a template. That happens here because in
template <template <typename T, typename T::type TT> typename X>
void func() {}
template <typename U, int I>
struct Y {};
void g() { func<Y>(); }
when performing overload resolution for func<Y>() we have to check
if U matches T and I matches TT. So we wind up in
coerce_template_template_parm/PARM_DECL. TREE_TYPE (arg) is int
so we try to substitute TT's type, which is T::type. But we have
nothing to substitute T with. And we call make_typename_type where
ctx is still T, which checks dependent_scope_p and we trip the assert.
It should work to always perform the substitution in a template context.
If the result still contains template parameters, we cannot say if they
match.
PR c++/96097
gcc/cp/ChangeLog:
* pt.cc (coerce_template_template_parm): Increment
processing_template_decl before calling tsubst.
In s390_expand_insv(), if generating code for ICM et al. src is a MEM
and gen_lowpart might force src into a register such that we end up with
patterns which do not match anymore. Use adjust_address() instead in
order to preserve a MEM.
Furthermore, it is not straightforward to enforce a subreg. For
example, in case of a paradoxical subreg, gen_lowpart() may return a
register. In order to compensate for this, s390_gen_lowpart_subreg() emits
a reference to a pseudo which does not coincide with its definition,
which is wrong. Additionally, if dest is a paradoxical subreg, then do
not try to emit a strict_low_part since it could mean that dest was not
initialized even though this might be fixed up later by init-regs.
Splitters for insns *get_tp_64, *zero_extendhisi2_31,
*zero_extendqisi2_31, *zero_extendqihi2_31 are applied after reload.
Thus, operands[0] is a hard register and gen_lowpart (m, operands[0])
just returns the hard register for mode m which is fine to use as an
argument for strict_low_part, i.e., we do not need to enforce subregs
here since after reload subregs are supposed to be eliminated anyway.
This fixes gcc.dg/torture/pr111821.c.
gcc/ChangeLog:
* config/s390/s390-protos.h (s390_gen_lowpart_subreg): Remove.
* config/s390/s390.cc (s390_gen_lowpart_subreg): Remove.
(s390_expand_insv): Use adjust_address() and emit a
strict_low_part only in case of a natural subreg.
* config/s390/s390.md: Use gen_lowpart() instead of
s390_gen_lowpart_subreg().
Richard Biener [Thu, 12 Sep 2024 09:31:59 +0000 (11:31 +0200)]
Abort loop SLP analysis quicker
As we can't cope with removed SLP instances during analysis there's
no point in doing that or even continuing analysis of SLP instances
after a failure. The following makes us abort early.
* tree-vect-slp.cc (vect_slp_analyze_operations): When
doing loop analysis fail after the first failed SLP
instance. Only remove instances when doing BB vectorization.
* tree-vect-loop.cc (vect_analyze_loop_2): Check whether
vect_slp_analyze_operations failed instead of checking
the number of SLP instances remaining.
Jakub Jelinek [Thu, 12 Sep 2024 09:34:06 +0000 (11:34 +0200)]
libcpp: Add support for gnu::offset #embed/__has_embed parameter
The following patch adds on top of the just posted #embed patch
a first extension, gnu::offset, which allows seeking in the data
file (for seekable files, otherwise read and throw away).
I think this is useful e.g. when some binary data start with
some well known header which shouldn't be included in the data etc.
2024-09-12 Jakub Jelinek <jakub@redhat.com>
libcpp/
* internal.h (struct cpp_embed_params): Add offset member.
* directives.cc (EMBED_PARAMS): Add gnu::offset entry.
(enum embed_param_kind): Add NUM_EMBED_STD_PARAMS.
(_cpp_parse_embed_params): Use NUM_EMBED_STD_PARAMS rather than
NUM_EMBED_PARAMS when parsing standard parameters. Parse gnu::offset
parameter.
* files.cc (struct _cpp_file): Add offset member.
(_cpp_stack_embed): Handle params->offset.
gcc/
* doc/cpp.texi (Binary Resource Inclusion): Document gnu::offset
#embed parameter.
gcc/testsuite/
* c-c++-common/cpp/embed-15.c: New test.
* c-c++-common/cpp/embed-16.c: New test.
* gcc.dg/cpp/embed-5.c: New test.
Jakub Jelinek [Thu, 12 Sep 2024 09:15:38 +0000 (11:15 +0200)]
libcpp, c-family: Add (dumb) C23 N3017 #embed support [PR105863]
The following patch implements the C23 N3017 "#embed - a scannable,
tooling-friendly binary resource inclusion mechanism" paper.
The implementation is intentionally dumb, in that it doesn't significantly
speed up compilation of larger initializers and doesn't make it possible
to use huge #embeds (like several gigabytes large, that is compile time
and memory still infeasible).
There are 2 reasons for this. One is that I think the way it is implemented
now in the patch is how we should handle the smaller #embed sizes,
dunno with which boundary, whether 32 bytes or 64 or something like that,
certainly handling the single byte cases which is something that can appear
anywhere in the source where constant integer literal can appear is
desirable and I think for a few bytes it isn't worth it to come up with
something smarter and users would like to e.g. see it in -E readably as
well (perhaps the slow vs. fast boundary should be determined by command
line option). And the other one is to be able to more easily find
regressions in behavior caused by the optimizations, so we have something
to get back in git to compare against.
I'm definitely willing to work on the optimizations (likely introduce a new
CPP_* token type to refer to a range of libcpp owned memory (start + size)
and similarly some tree which can do the same, and can be at any time e.g.
split into 2 subparts + say INTEGER_CST in between if needed say for
const unsigned char d[] = {
#embed "2GB.dat" prefix (0, 0, ) suffix (, [0x40000000] = 42)
}; still without having to copy around huge amounts of data; STRING_CST
owns the memory it points to and can be only 2GB in size), but would
like to do that incrementally.
And would like to first include some extensions also not included in
this patch, like gnu::offset (off) parameter to allow to skip certain
constant amount of bytes at the start of the files, plus
gnu::base64 ("base64_encoded_data") parameter to add something which can
store more efficiently large amounts of the #embed data in preprocessed
source.
I've been cross-checking all the tests also against the LLVM implementation
https://github.com/llvm/llvm-project/pull/68620
which has been for a few hours even committed to LLVM trunk but reverted
afterwards. LLVM now has the support committed and I admit I haven't
rechecked whether the behavior on the spots mentioned below has been fixed
in it already or not yet.
The patch uses --embed-dir= option that clang plans to add above and doesn't
use other variants on the search directories yet, plus there are no
default directories at least for the time being where to search for embed
files. So, #embed "..." works if it is found in the same directory (or
relative to the current file's directory) and #embed "/..." or #embed </...>
work always, but relative #embed <...> doesn't unless at least one
--embed-dir= is specified. There is no reason to differentiate between
system and non-system directories, so we don't need -isystem like
counterpart, perhaps -iquote like counterpart could be useful in the future,
dunno what else. It has --embed-directory=dir and --embed-directory dir
as aliases.
There are some differences beyond clang ICEs, so I'd like to point them out
to make sure there is agreement on the choices in the patch. They are also
mentioned in the comments of the llvm pull request.
The most important is that the GCC patch (as well as the original thephd.dev
LLVM branch on godbolt) expands #embed (or acts as if it is expanded) into
a mere sequence of numbers like 123,2,35,26 rather than what clang
effectively treats as (unsigned char)123,(unsigned char)2,(unsigned
char)35,(unsigned char)26 but only does that when using integrated
preprocessor, not when using -save-temps where it acts as GCC.
JeanHeyd as the original author agrees that is how it is currently worded in
C23.
Another difference (not tested in the testsuite, not sure how to check for
effective target /dev/urandom nor am I sure it is desirable to check that
during testsuite) is how to treat character devices, named pipes etc.
(block devices are errored on). The original paper uses /dev/urandom
in various examples and seems to assume that unlike regular files the
devices aren't really cached, so
#embed </dev/urandom> limit(1) prefix(int a = ) suffix(;)
#embed </dev/urandom> limit(1) prefix(int b = ) suffix(;)
usually results in a != b. That is what the godbolt thephd.dev branch
implements too and what this patch does as well, but clang actually seems
to just go from st.st_size == 0, ergo it must be zero-sized resource and
so just copies over if_empty if present. It is really questionable
what to do about the character devices/named pipes with __has_embed, for
regular files the patch doesn't read anything from them, relies on
st.st_size + limit for whether it is empty or non-empty. But I don't know
of a way to check if read on say a character device would read anything
or not (the </dev/null> limit (1) vs. </dev/zero> limit (1) cases), and
if we read something, that would be better cached for later because
#embed later if it reads again could read no further data even when it
first read something. So, the patch currently for __has_embed just
always returns 2 on the non-regular files, like the thephd.dev
branch does as well and like the clang pull request as well.
A question is also what to do for gnu::offset on the non-regular files
even for #embed, those aren't seekable and do we want to just read and throw
away the offset bytes each time we see it used?
clang also chokes on the
#if __has_embed (__FILE__ __limit__ (1) __prefix__ () suffix (1 / 0) \
__if_empty__ ((({{[0[0{0{0(0(0)1)1}1}]]}})))) != __STDC_EMBED_FOUND__
#error "__has_embed fail"
#endif
in embed-1.c, but thephd.dev branch accepts it and I don't see why
it shouldn't, (({{[0[0{0{0(0(0)1)1}1}]]}})) is a balanced token
sequence and the file isn't empty, so it should just be parsed and
discarded.
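As a rough illustration of why the __if_empty__ argument in the test above
counts as a balanced token sequence, the sketch below checks bracket nesting
on a plain character string. It is a simplification under stated
assumptions: the real preprocessor works on preprocessing tokens rather than
characters, and is_balanced and kMaxDepth are hypothetical names made up for
this sketch, not libcpp functions.

```cpp
// Sketch only: character-level bracket matching, not libcpp's token logic.
// is_balanced and kMaxDepth are hypothetical names for illustration.
constexpr int kMaxDepth = 64;

constexpr bool is_balanced(const char* s) {
    char stack[kMaxDepth] = {};
    int depth = 0;
    for (; *s != '\0'; ++s) {
        const char c = *s;
        if (c == '(' || c == '[' || c == '{') {
            if (depth == kMaxDepth)
                return false;              // nesting too deep for this sketch
            stack[depth++] = c;            // remember the opener
        } else if (c == ')' || c == ']' || c == '}') {
            const char open = (c == ')') ? '(' : (c == ']') ? '[' : '{';
            if (depth == 0 || stack[depth - 1] != open)
                return false;              // closer without matching opener
            --depth;
        }
    }
    return depth == 0;                     // every opener was closed
}

// The argument of __if_empty__ from the test, without the parameter's own
// outer parentheses.
constexpr bool kIfEmptyArgBalanced =
    is_balanced("(({{[0[0{0{0(0(0)1)1}1}]]}}))");
```

Since the sequence nests correctly, a conforming implementation only has to
skip over it; its contents (including the 1 / 0 in suffix) never need to be
evaluated for __has_embed.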
clang also IMHO mishandles
const unsigned char w[] = {
#embed __FILE__ prefix([0] = 42, [15] =) limit(32)
};
but again only without -save-temps, seems like it
treats it as
[0] = 42, [15] = (99,111,110,115,116,32,117,110,115,105,103,110,101,100,
32,99,104,97,114,32,119,91,93,32,61,32,123,10,35,101,109,98)
rather than
[0] = 42, [15] = 99,111,110,115,116,32,117,110,115,105,103,110,101,100,
32,99,104,97,114,32,119,91,93,32,61,32,123,10,35,101,109,98
and warns on it for -Wunused-value and just compiles it as
[0] = 42, [15] = 98
And also
void foo (int, int, int, int);
void bar (void) { foo (
#embed __FILE__ limit (4) prefix (172 + ) suffix (+ 2)
); }
is treated as
172 + (118, 111, 105, 100) + 2
rather than
172 + 118, 111, 105, 100 + 2
which clang -save-temps or GCC treats it like, so results
in just one argument passed rather than 4.
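The difference between the two readings can be reproduced without #embed at
all, by writing both expansions by hand. The following is a minimal sketch;
arg_count and first_arg are hypothetical helpers introduced only for this
illustration, and 118, 111, 105, 100 are simply the bytes of "void" quoted
above.

```cpp
#include <cstddef>

// Hypothetical helper: number of arguments actually passed at a call site.
template <typename... Args>
constexpr std::size_t arg_count(Args...) { return sizeof...(Args); }

// Hypothetical helper: value of the first argument.
template <typename T, typename... Rest>
constexpr T first_arg(T first, Rest...) { return first; }

// Expansion as a plain comma-separated list (GCC, clang -save-temps):
// the commas separate four distinct arguments.
constexpr std::size_t kListCount = arg_count(172 + 118, 111, 105, 100 + 2);

// Expansion treated as one parenthesized expression: the commas become
// comma operators, so only the last byte (100) contributes to the value.
// Compilers may warn about the discarded operands here.
constexpr std::size_t kParenCount = arg_count(172 + (118, 111, 105, 100) + 2);
constexpr int kParenValue = first_arg(172 + (118, 111, 105, 100) + 2);
```

With the list reading the call receives 290, 111, 105 and 102; with the
parenthesized reading it receives the single value 172 + 100 + 2.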
if (!strstr ((const char *) magna_carta, "imprisonétur")) abort ();
in the testcase fails as well, but in that case calling it in gdb succeeds:
p ((char *(*)(char *, char *))__strstr_sse2) (magna_carta, "imprisonétur")
$2 = 0x555555558d3c <magna_carta+11564> "imprisonétur aut disseisiátur"...
so I guess they are just trying to constant evaluate strstr and do it
incorrectly.
They started with making the optimizations together in the initial patch
set, so they don't have the luxury to compare if it is just because of
the optimization they are trying to do or because that is how the
feature works for them. At least unless they use -save-temps for now.
There is also different behavior between clang and gcc on -M or other
dependency generating options. Seems clang includes the __has_embed
searched files in dependencies, while my patch doesn't. But so does
clang for __has_include and GCC doesn't. Emitting a hard dependency
on some header just because there was __has_include/__has_embed for it
seems wrong to me, because (at least when properly written) the source
likely doesn't mind if the file is missing, it will do something else,
so a hard error from make because of it doesn't seem right. Does
make have some weaker dependencies, such that if some file can be remade
it is but if it doesn't exist, it isn't fatal?
I wonder whether #embed <non-existent-file> really needs to be fatal
or whether we could simply after diagnosing it pretend the file exists
and is empty. For #include I think fatal errors make tons of sense,
but perhaps for #embed which is more localized we'd get better error
reporting if we didn't bail out immediately. Note, both GCC and clang
currently treat those as fatal errors.
clang also added -dE option which with -E instead of preprocessing
the #embed directives keeps them as is, but the preprocessed source
then isn't self-contained. That option looks more harmful than useful to
me.
Also, it isn't clear to me from C23 whether it is possible to have
__has_include/__has_c_attribute/__has_embed expressions inside of
the limit #embed/__has_embed argument.
6.10.3.2/2 says that defined should not appear there (and the patch
diagnoses it and testsuite tests), but for __has_include/__has_embed
etc. 6.10.1/11 says:
"The identifiers __has_include, __has_embed, and __has_c_attribute
shall not appear in any context not mentioned in this subclause."
If that subclause in that case means 6.10.1, then it presumably shouldn't
appear in #embed in 6.10.3, but __has_embed is in 6.10.1...
But 6.10.3.2/3 says that it should be parsed according to the 6.10.1
rules. Haven't included tests like
#if __has_embed (__FILE__ limit (__has_embed (__FILE__ limit (1))))
or
#embed __FILE__ limit (__has_include (__FILE__))
into the testsuite because of the doubts but I think the patch should
handle those right now.
The reason I've used Magna Carta text in some of the testcases is that
I hope it shouldn't be copyrighted after the centuries and I'd strongly
prefer not to have binary blobs in git after the xz backdoor lesson
and wanted something larger which doesn't change all the time.
Oh, BTW, I see in C23 draft 6.10.3.2 in Example 4
if (f_source == NULL);
return 1;
(note the spurious semicolon after closing paren), has that been fixed
already?
Like the thephd.dev and clang implementations, the patch always macro
expands the whole #embed and __has_embed directives except for the
embed keyword. That is most likely not what C23 says, my limited
understanding right now is that in #embed one needs to parse the whole
directive line with macro expansion disabled and check if it satisfies the
grammar, if not, the whole directive is macro expanded, if yes, only
the limit parameter argument is macro expanded and the prefix/suffix/if_empty
arguments are maybe macro expanded when actually used (and not at all if
unused). And I think __has_embed macro expansion has conflicting rules.
2024-09-12 Jakub Jelinek <jakub@redhat.com>
PR c/105863
libcpp/
* include/cpplib.h: Implement C23 N3017 #embed - a scannable,
tooling-friendly binary resource inclusion mechanism paper.
(struct cpp_options): Add embed member.
(enum cpp_builtin_type): Add BT_HAS_EMBED.
(cpp_set_include_chains): Add another cpp_dir * argument to
the declaration.
* internal.h (enum include_type): Add IT_EMBED.
(struct cpp_reader): Add embed_include member.
(struct cpp_embed_params_tokens): New type.
(struct cpp_embed_params): New type.
(_cpp_get_token_no_padding): Declare.
(enum _cpp_find_file_kind): Add _cpp_FFK_EMBED and _cpp_FFK_HAS_EMBED.
(_cpp_stack_embed): Declare.
(_cpp_parse_expr): Change return type to cpp_num_part instead of
bool, change second argument from bool to const char * and add third
argument.
(_cpp_parse_embed_params): Declare.
* directives.cc (DIRECTIVE_TABLE): Add embed entry.
(end_directive): Don't call skip_rest_of_line for T_EMBED directive.
(_cpp_handle_directive): Return 2 rather than 1 for T_EMBED in
directives-only mode.
(parse_include): Don't call check_eol for T_EMBED directive.
(skip_balanced_token_seq): New function.
(EMBED_PARAMS): Define.
(enum embed_param_kind): New type.
(embed_params): New variable.
(_cpp_parse_embed_params): New function.
(do_embed): New function.
(do_if): Adjust _cpp_parse_expr caller.
(do_elif): Likewise.
* expr.cc (parse_defined): Diagnose defined in #embed or __has_embed
parameters.
(_cpp_parse_expr): Change return type to cpp_num_part instead of
bool, change second argument from bool to const char * and add third
argument. Adjust function comment. For #embed/__has_embed parameters
add an artificial CPP_OPEN_PAREN. Use the second argument DIR
directly instead of string literals conditional on IS_IF.
For #embed/__has_embed parameter, stop on reaching CPP_CLOSE_PAREN
matching the artificial one. Diagnose negative or too large embed
parameter operands.
(num_binary_op): Use #embed instead of #if for diagnostics if inside
#embed/__has_embed parameter.
(num_div_op): Likewise.
* files.cc (struct _cpp_file): Add limit member and embed bitfield.
(search_cache): Add IS_EMBED argument, formatting fix. Skip over
files with different file->embed from the argument.
(find_file_in_dir): Don't call pch_open_file if file->embed.
(_cpp_find_file): Handle _cpp_FFK_EMBED and _cpp_FFK_HAS_EMBED.
(read_file_guts): Formatting fix.
(has_unique_contents): Ignore file->embed files.
(search_path_head): Handle IT_EMBED type.
(_cpp_stack_embed): New function.
(_cpp_get_file_stat): Formatting fix.
(cpp_set_include_chains): Add embed argument, save it to
pfile->embed_include and compute lens for the chain.
* init.cc (struct lang_flags): Add embed member.
(lang_defaults): Add embed initializers.
(cpp_set_lang): Initialize CPP_OPTION (pfile, embed).
(builtin_array): Add __has_embed entry.
(cpp_init_builtins): Predefine __STDC_EMBED_NOT_FOUND__,
__STDC_EMBED_FOUND__ and __STDC_EMBED_EMPTY__.
* lex.cc (cpp_directive_only_process): Handle #embed.
* macro.cc (cpp_get_token_no_padding): Rename to ...
(_cpp_get_token_no_padding): ... this. No longer static.
(builtin_has_include_1): New function.
(builtin_has_include): Use it. Use _cpp_get_token_no_padding
instead of cpp_get_token_no_padding.
(builtin_has_embed): New function.
(_cpp_builtin_macro_text): Handle BT_HAS_EMBED.
gcc/
* doc/cppdiropts.texi (--embed-dir=): Document.
* doc/cpp.texi (Binary Resource Inclusion): New chapter.
(__has_embed): Document.
* doc/invoke.texi (Directory Options): Mention --embed-dir=.
* gcc.cc (cpp_unique_options): Add %{-embed*}.
* genmatch.cc (main): Adjust cpp_set_include_chains caller.
* incpath.h (enum incpath_kind): Add INC_EMBED.
* incpath.cc (merge_include_chains): Handle INC_EMBED.
(register_include_chains): Adjust cpp_set_include_chains caller.
gcc/c-family/
* c.opt (-embed-dir=): New option.
(-embed-directory): New alias.
(-embed-directory=): New alias.
* c-opts.cc (c_common_handle_option): Handle OPT__embed_dir_.
gcc/testsuite/
* c-c++-common/cpp/embed-1.c: New test.
* c-c++-common/cpp/embed-2.c: New test.
* c-c++-common/cpp/embed-3.c: New test.
* c-c++-common/cpp/embed-4.c: New test.
* c-c++-common/cpp/embed-5.c: New test.
* c-c++-common/cpp/embed-6.c: New test.
* c-c++-common/cpp/embed-7.c: New test.
* c-c++-common/cpp/embed-8.c: New test.
* c-c++-common/cpp/embed-9.c: New test.
* c-c++-common/cpp/embed-10.c: New test.
* c-c++-common/cpp/embed-11.c: New test.
* c-c++-common/cpp/embed-12.c: New test.
* c-c++-common/cpp/embed-13.c: New test.
* c-c++-common/cpp/embed-14.c: New test.
* c-c++-common/cpp/embed-25.c: New test.
* c-c++-common/cpp/embed-26.c: New test.
* c-c++-common/cpp/embed-dir/embed-1.inc: New test.
* c-c++-common/cpp/embed-dir/embed-3.c: New test.
* c-c++-common/cpp/embed-dir/embed-4.c: New test.
* c-c++-common/cpp/embed-dir/magna-carta.txt: New test.
* gcc.dg/cpp/embed-1.c: New test.
* gcc.dg/cpp/embed-2.c: New test.
* gcc.dg/cpp/embed-3.c: New test.
* gcc.dg/cpp/embed-4.c: New test.
* g++.dg/cpp/embed-1.C: New test.
* g++.dg/cpp/embed-2.C: New test.
* g++.dg/cpp/embed-3.C: New test.
Simon Martin [Tue, 10 Sep 2024 20:33:18 +0000 (22:33 +0200)]
c++: Don't ICE to build private access error message [PR116323]
We currently ICE upon the following code while building the "[...] is
private within this context" error message
=== cut here ===
class A { enum Enum{}; };
template<typename E, template<typename> class Alloc>
class B : private Alloc<E>, private A {};
template<typename E, template<typename> class Alloc>
int B<E, Alloc>::foo (Enum m) { return 42; }
=== cut here ===
The problem is that since r11-6880, after detecting that Enum cannot be
accessed in B, enforce_access will access the TYPE_BINFO of all the
bases of B, which ICEs for any that is a BOUND_TEMPLATE_TEMPLATE_PARM.
This patch simply skips such bases.
PR c++/116323
gcc/cp/ChangeLog:
* search.cc (get_parent_with_private_access): Only call access_in_type
for RECORD_OR_UNION_TYPE_P base BINFOs.
Richard Biener [Wed, 11 Sep 2024 12:50:02 +0000 (14:50 +0200)]
Better recover from SLP reassociation fails during discovery
When we decide to not process an association chain of size two and
that would also mismatch with a different chain size on another lane
we shouldn't fail discovery hard at this point. Instead let the
regular discovery figure out matching lanes so the parent can
decide to perform operand swapping or we can split groups at better
points rather than forcefully splitting away the first single lane.
For example on gcc.dg/vect/vect-strided-u8-i8.c we now see two
groups of size 4 feeding the store instead of groups of size 1,
three, two, one and one.
* tree-vect-slp.cc (vect_build_slp_tree_2): On reassociation
chain length mismatch do not fail discovery of the node
but try without re-associating to compute a better matches[].
Provide a reassociation failure hint in the dump.
(vect_slp_analyze_node_operations): Avoid stray failure
dumping.
(vectorizable_slp_permutation_1): Dump the address of the
SLP node representing the permutation.
Bohan Lei [Thu, 12 Sep 2024 02:28:03 +0000 (10:28 +0800)]
RISC-V: Eliminate latter vsetvl when fused
Hi all,
A simple assembly check has been added in this version. Previous version:
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662783.html
Thanks,
Bohan
------
The current vsetvl pass eliminates a vsetvl instruction when the previous
info is "available," but does not when "compatible." This can lead to not
only redundancy, but also incorrect behaviors when the previous info happens
to be compatible with a later vector instruction, which ends up using the
vsetvl info that should have been eliminated, as is shown in the testcase.
This patch eliminates the vsetvl when the previous info is "compatible."
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (pre_vsetvl::fuse_local_vsetvl_info):
Delete vsetvl insn when `prev_info` is compatible.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c: New test.
In avx512f-mask-type.h, we need SIZE to be defined to get
MASK_TYPE defined correctly. Fix those testcases where
SIZE is not defined before the include of avx512f-mask-type.h.
RISC-V: Fix vl_used_by_non_rvv_insn logic of vsetvl pass
This patch fixes a bug in the current vsetvl pass. The current pass uses
`m_vl` to determine whether the dest operand has been used by non-RVV
instructions. However, `m_vl` may have been modified as a result of an
`update_avl` call, and thus would be no longer the dest operand of the
original instruction. This can lead to incorrect vsetvl eliminations, as is
shown in the testcase. In this patch, we create a `dest_vl` variable for
this scenerio.
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc: Use `dest_vl` for dest VL operand
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vsetvl/vsetvl_bug-3.c: New test.
My last fix for this issue (PR c++/114947, r15-810) didn't go far
enough; I had assumed that the issue where we lost track of partial
specialisations we would need to walk again later was limited to
partitions (where we always re-walk all specialisations), but the linked
PR is the same cause but for header units, and it is possible to
construct test cases exposing the same bug just for normal modules.
As such this patch just unconditionally ensures that whenever we modify
DECL_TEMPLATE_SPECIALIZATIONS we also track any partial specialisations
that might have been added.
Also clean up a couple of comments and assertions to make expected state
more obvious when processing these specs.
Richard Earnshaw [Wed, 21 Aug 2024 15:15:34 +0000 (16:15 +0100)]
arm: avoid indirect sibcalls when IP is live [PR116597]
On Arm only r0-r3 (the argument registers) and IP are available for
use as an address for an indirect sibcall. But if all the argument
registers are used and IP is clobbered during the epilogue, or is used
to pass closure information, then there is no spare register to hold
the address and we must reject the sibcall.
arm_function_ok_for_sibcall did try to handle this, but it did this by
examining the function declaration. That doesn't work if the function
has no prototype, or if the prototype has variadic arguments: we must,
instead, look at the list of actuals for the call rather than the list
of formals.
The old code also worked by laying out all the arguments and then
trying to add one more integer argument at the end of the list, but
this missed a corner case where a hole had been left in the argument
register list due to argument alignment.
We fix all of this by now scanning the list of actual values to be
passed and then checking if a core register has been assigned to that
argument. If it has, then we record which registers were assigned.
Once done we then look to see if all the argument registers have been
assigned and only block the sibcall if that is the case. This permits
us to sibcall:
int (*d)(int, ...);
int g(void);
int i () { return d(g(), 2LL);}
because r1 remains free (the 2LL argument is passed in {r2,r3}).
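The register bookkeeping described above can be modelled roughly as follows.
This is a simplified sketch, not the code in arm_function_ok_for_sibcall: it
assumes only 32-bit and 64-bit integer arguments, the AAPCS rule that a
64-bit argument starts in an even-numbered core register, and no
back-filling of skipped registers. ArgKind, assigned_core_regs and
has_free_arg_reg are hypothetical names.

```cpp
#include <cstddef>

// Hypothetical model of actual arguments: a 32-bit word or a 64-bit value.
enum class ArgKind { Word32, DoubleWord64 };

constexpr unsigned kNumArgRegs = 4;  // r0-r3

// Returns a bitmask of the core argument registers assigned to the actuals
// (bit n set means rN is used).  Arguments that do not fit go to the stack.
constexpr unsigned assigned_core_regs(const ArgKind* args, std::size_t n) {
    unsigned mask = 0;
    unsigned next = 0;  // next core register number to try
    for (std::size_t i = 0; i < n; ++i) {
        if (args[i] == ArgKind::DoubleWord64) {
            next = (next + 1) & ~1u;          // round up to an even register
            if (next + 2 > kNumArgRegs) {     // no pair left: use the stack
                next = kNumArgRegs;
                continue;
            }
            mask |= 3u << next;               // occupies rN and rN+1
            next += 2;
        } else {
            if (next >= kNumArgRegs)
                continue;                     // already on the stack
            mask |= 1u << next;
            next += 1;
        }
    }
    return mask;
}

// The sibcall address needs a spare register only if r0-r3 are all taken.
constexpr bool has_free_arg_reg(const ArgKind* args, std::size_t n) {
    return assigned_core_regs(args, n) != 0xFu;
}

// d(g(), 2LL): int in r0, hole at r1, long long in {r2,r3}.
constexpr ArgKind kExampleCall[] = {ArgKind::Word32, ArgKind::DoubleWord64};
// Four ints fill r0-r3 completely.
constexpr ArgKind kFullCall[] = {ArgKind::Word32, ArgKind::Word32,
                                 ArgKind::Word32, ArgKind::Word32};
```

For kExampleCall the mask is 0b1101, so r1 is available to hold the call
address; for kFullCall the mask is 0b1111 and, when IP is also unavailable,
the sibcall has to be rejected.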
gcc/
PR target/116597
* config/arm/arm.cc (arm_function_ok_for_sibcall): Use the list of
actuals for the call, not the list of formals.
gcc/testsuite/
PR target/116597
* gcc.target/arm/pac-sibcall-2.c: New test.
* gcc.target/arm/pac-sibcall-3.c: New test.
Richard Biener [Wed, 11 Sep 2024 11:54:33 +0000 (13:54 +0200)]
tree-optimization/116674 - vectorizable_simd_clone_call and re-analysis
When SLP analysis scraps an instance because it fails to analyze we
can end up calling vectorizable_* in analysis mode on a node that
was analyzed during the analysis of that instance again.
vectorizable_simd_clone_call wasn't expecting that and instead
guarded analysis/transform code on populated data structures.
The following changes it so it survives re-analysis.
PR tree-optimization/116674
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Support
re-analysis.
Alex Coplan [Fri, 2 Aug 2024 08:56:07 +0000 (09:56 +0100)]
libstdc++: Restore unrolling in std::find using pragma [PR116140]
Together with the preparatory compiler patches, this patch restores
unrolling in std::__find_if, but this time relying on the compiler to do
it by using:
#pragma GCC unroll 4
which should recover most of the regression relative to the
hand-unrolled version while still being vectorizable with WIP alignment
peeling enhancements.
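For illustration, a loop of the same shape with the pragma applied might look
like the sketch below. It is not the libstdc++ source: find_if_unrolled and
first_even_is_found are hypothetical names, and the function is deliberately
kept non-constexpr to stay simple.

```cpp
#include <cassert>

// Sketch of a linear search whose loop GCC is asked to unroll four times.
// Compilers that do not know the pragma are expected to ignore it.
template <typename Iterator, typename Predicate>
Iterator find_if_unrolled(Iterator first, Iterator last, Predicate pred) {
#pragma GCC unroll 4
    while (first != last && !pred(*first))
        ++first;
    return first;  // equals last when no element satisfies pred
}

// Small usage example: locate the first even value in an array.
inline bool first_even_is_found() {
    const int data[] = {1, 3, 5, 8, 9, 10};
    const int* hit =
        find_if_unrolled(data, data + 6, [](int x) { return x % 2 == 0; });
    assert(hit != data + 6);
    return hit == data + 3;
}
```

The loop body stays a single statement, so the source remains readable and
the unrolling decision is left to the compiler rather than being spelled out
by hand.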
On Neoverse V1 with LTO, this reduces the regression in xalancbmk (from
SPEC CPU 2017) from 5.8% to 1.7% (restoring ~71% of the lost
performance).
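The ~71% figure follows directly from the two regression numbers; a tiny
sketch of the arithmetic, with recovered_fraction being a hypothetical helper
name:

```cpp
// Fraction of a regression that has been recovered, given the regression
// before and after the fix (both in percent).
constexpr double recovered_fraction(double before, double after) {
    return (before - after) / before;
}

// (5.8 - 1.7) / 5.8 = 4.1 / 5.8, i.e. roughly 0.71.
constexpr double kXalancbmkRecovered = recovered_fraction(5.8, 1.7);
```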
libstdc++-v3/ChangeLog:
PR libstdc++/116140
* include/bits/stl_algobase.h (std::__find_if): Add #pragma to
request GCC to unroll the loop.
Alex Coplan [Sat, 3 Aug 2024 17:02:36 +0000 (17:02 +0000)]
lto: Stream has_unroll flag during LTO [PR116140]
When #pragma GCC unroll is processed in
tree-cfg.cc:replace_loop_annotate_in_block, we set both the loop->unroll
field (which is currently streamed out and back in during LTO) and
the cfun->has_unroll flag.
cfun->has_unroll, however, is not currently streamed during LTO. This
patch fixes that.
Prior to this patch, loops marked with #pragma GCC unroll that would be
unrolled by RTL loop2_unroll in a non-LTO compilation didn't get
unrolled under LTO.
gcc/ChangeLog:
PR libstdc++/116140
* lto-streamer-in.cc (input_struct_function_base): Stream in
fn->has_unroll.
* lto-streamer-out.cc (output_struct_function_base): Stream out
fn->has_unroll.
gcc/testsuite/ChangeLog:
PR libstdc++/116140
* g++.dg/ext/pragma-unroll-lambda-lto.C: New test.
being left behind by the testsuite. This is problematic not just from a
"missing cleanup" POV, but also because it can cause the test to pass
spuriously when the test is re-run with an unpatched compiler (without
the bug fix). In the broken case, loop2_unroll isn't run at all, so we
end up scanning the old dumpfile (from the previous test run) and making
the dumpfile scan pass.
Running with `-v -v` in RUNTESTFLAGS we can see the following cleanup
attempt is made:
looking again at the ltrans dump file above we can see this will fail for two
reasons:
- The actual dump file has no {C,exe} extension between the basename and
ltrans0.
- The actual dump file has an additional `.ltrans` component after `.ltrans0`.
This patch therefore relaxes the pattern constructed for cleaning up such
dumpfiles to also match dumpfiles with the above form.
Running the testsuite before/after this patch shows the number of files in
gcc/testsuite (in the build dir) with "ltrans" in the name goes from 1416 to 62
on aarch64.
Alex Coplan [Fri, 2 Aug 2024 08:52:50 +0000 (09:52 +0100)]
c++: Ensure ANNOTATE_EXPRs remain outermost expressions in conditions [PR116140]
For the testcase added with this patch, we would end up losing the:
#pragma GCC unroll 4
and emitting "warning: ignoring loop annotation". That warning comes
from tree-cfg.cc:replace_loop_annotate, and means that we failed to
process the ANNOTATE_EXPR in tree-cfg.cc:replace_loop_annotate_in_block.
That function walks backwards over the GIMPLE in an exiting BB for a
loop, skipping over the final gcond, and looks for any ANNOTATE_EXPRS
immediately preceding the gcond.
The function documents the following pre-condition:
/* [...] We assume that the annotations come immediately before the
condition in BB, if any. */
now looking at the exiting BB of the loop, we have:
and crucially there is an intervening assignment between the gcond and
the preceding .ANNOTATE ifn call. To see where this comes from, we can
look to the IR given by -fdump-tree-original:
if (<<cleanup_point ANNOTATE_EXPR <first != last && !use_find(short
int*)::<lambda(short int)>::operator() (&pred, *first), unroll 4>>>)
goto <D.4518>;
else
goto <D.4516>;
here the problem is that we've wrapped a CLEANUP_POINT_EXPR around the
ANNOTATE_EXPR, meaning the ANNOTATE_EXPR is no longer the outermost
expression in the condition.
The CLEANUP_POINT_EXPR gets added by the following call chain:
this patch chooses to fix the issue by first introducing a new helper
class (annotate_saver) to save and restore outer chains of
ANNOTATE_EXPRs and then using it in maybe_convert_cond.
With this patch, we don't get any such warning and the loop gets unrolled as
expected at -O2.
gcc/cp/ChangeLog:
PR libstdc++/116140
* semantics.cc (annotate_saver): New. Use it ...
(maybe_convert_cond): ... here, to ensure any ANNOTATE_EXPRs
remain the outermost expression(s) of the condition.
gcc/testsuite/ChangeLog:
PR libstdc++/116140
* g++.dg/ext/pragma-unroll-lambda.C: New test.
The following test suites pass with this patch:
1. The rv64gcv full regression tests.
2. The x86 bootstrap tests.
3. The x86 full regression tests.
gcc/ChangeLog:
* match.pd: Add case 2 for the signed .SAT_ADD consumed by
vect pattern.
* tree-vect-patterns.cc (gimple_signed_integer_sat_add): Add new
matching func decl for signed .SAT_ADD.
(vect_recog_sat_add_pattern): Add signed .SAT_ADD pattern match.
Jonathan Wakely [Tue, 10 Sep 2024 13:36:26 +0000 (14:36 +0100)]
libstdc++: Only use std::ios_base_library_init() for ELF [PR116159]
The undefined std::ios_base_library_init() symbol that is referenced by
<iostream> is only supposed to be used for targets where symbol
versioning is supported.
The mingw-w64 target defaults to --enable-symvers=gnu due to using GNU
ld but doesn't actually support symbol versioning. This means it tries
to emit references to the std::ios_base_library_init() symbol, which
isn't really defined in the library. This causes problems when using lld
to link user binaries.
Disable the undefined symbol reference for non-ELF targets.
libstdc++-v3/ChangeLog:
PR libstdc++/116159
* include/std/iostream (ios_base_library_init): Only define for
ELF targets.
* src/c++98/ios_init.cc (ios_base_library_init): Likewise.
Jonathan Wakely [Tue, 10 Sep 2024 13:25:41 +0000 (14:25 +0100)]
libstdc++: std::string move assignment should not use POCCA trait [PR116641]
The changes to implement LWG 2579 (r10-327-gdb33efde17932f) made
std::string::assign use the propagate_on_container_copy_assignment
(POCCA) trait, for consistency with operator=(const basic_string&).
However, this also unintentionally affected operator=(basic_string&&)
which calls assign(str) to make a deep copy when performing a move is
not possible. The fix is for the move assignment operator to call
_M_assign(str) instead of assign(str), as this just does the deep copy
and doesn't check the POCCA trait first.
The bug only affects the unlikely/useless combination of POCCA==true and
POCMA==false, but we should fix it for correctness anyway. It should
also make move assignment slightly cheaper to compile and execute,
because we skip the extra code in assign(const basic_string&).
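A sketch of the trait combination described above, assuming a hypothetical
stateful allocator named tagged_alloc (not part of the patch or its test).
It sets POCCA to true and POCMA to false, then move-assigns between strings
whose allocators compare unequal, so the library has to fall back to a deep
copy.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical allocator: propagates on copy assignment, not on move.
template <typename T>
struct tagged_alloc {
  using value_type = T;
  using propagate_on_container_copy_assignment = std::true_type;   // POCCA
  using propagate_on_container_move_assignment = std::false_type;  // POCMA
  using is_always_equal = std::false_type;

  int id;
  tagged_alloc(int i = 0) : id(i) {}
  template <typename U>
  tagged_alloc(const tagged_alloc<U>& o) : id(o.id) {}

  T* allocate(std::size_t n) { return std::allocator<T>().allocate(n); }
  void deallocate(T* p, std::size_t n) { std::allocator<T>().deallocate(p, n); }
};

template <typename T, typename U>
bool operator==(const tagged_alloc<T>& a, const tagged_alloc<U>& b) {
  return a.id == b.id;
}
template <typename T, typename U>
bool operator!=(const tagged_alloc<T>& a, const tagged_alloc<U>& b) {
  return a.id != b.id;
}

using tagged_string =
    std::basic_string<char, std::char_traits<char>, tagged_alloc<char>>;

// Move-assign across unequal allocators and report which allocator the
// destination ends up with.
tagged_string move_assign_demo(int& dst_alloc_id) {
  tagged_string src("a string long enough to be stored on the heap",
                    tagged_alloc<char>(1));
  tagged_string dst("short", tagged_alloc<char>(2));
  dst = std::move(src);  // POCMA is false and 1 != 2, so this deep-copies
  dst_alloc_id = dst.get_allocator().id;
  return dst;
}
```

Per the container requirements the destination should keep its own allocator
(id 2 here), since only the move-assignment trait is relevant; a library with
the bug described above would consult POCCA and replace it instead.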
libstdc++-v3/ChangeLog:
PR libstdc++/116641
* include/bits/basic_string.h (operator=(basic_string&&)): Call
_M_assign instead of assign.
* testsuite/21_strings/basic_string/allocator/116641.cc: New
test.
Jakub Jelinek [Tue, 10 Sep 2024 16:32:58 +0000 (18:32 +0200)]
c++: Fix get_member_function_from_ptrfunc with -fsanitize=bounds [PR116449]
The following testcase is miscompiled, because
get_member_function_from_ptrfunc
emits something like
(((FUNCTION.__pfn & 1) != 0)
? ptr + FUNCTION.__delta + FUNCTION.__pfn - 1
: FUNCTION.__pfn) (ptr + FUNCTION.__delta, ...)
or so, so FUNCTION tree is used there 5 times. There is
if (TREE_SIDE_EFFECTS (function)) function = save_expr (function);
but in this case function doesn't have side-effects, just nested ARRAY_REFs.
Now, if all the FUNCTION trees would be shared, it would work fine,
FUNCTION is evaluated in the first operand of COND_EXPR; but unfortunately
that isn't the case, both the BIT_AND_EXPR shortening and conversion to
bool done for build_conditional_expr actually unshare_expr that first
expression, but none of the other 4 are unshared. With -fsanitize=bounds,
.UBSAN_BOUNDS calls are added to the ARRAY_REFs and use save_expr to avoid
evaluating the argument multiple times, but because that FUNCTION tree is
first used in the second argument of COND_EXPR (i.e. conditionally), the
SAVE_EXPR initialization is done just there and then the third argument
of COND_EXPR just uses the uninitialized temporary and so does the first
argument computation as well.
The following patch fixes that by doing save_expr even if !TREE_SIDE_EFFECTS,
but to avoid doing that too often only if !nonvirtual and if the expression
isn't a simple decl.
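A sketch of the shape of source that reaches this code path, under the
assumption that the pointer to member function is read through nested array
subscripts and the target is virtual. The names Base, Derived, table and
call_through_table are illustrative and not taken from the testcase.

```cpp
#include <cassert>

struct Base {
  virtual int f() { return 1; }
  virtual ~Base() {}
};

struct Derived : Base {
  int f() override { return 2; }
};

typedef int (Base::*pmf_t)();

// The FUNCTION operand below is table[i][j]: nested ARRAY_REFs without
// side effects, which is the case the commit message discusses.
pmf_t table[2][2] = {{&Base::f, &Base::f}, {&Base::f, &Base::f}};

int call_through_table(Base* p, int i, int j) {
  return (p->*table[i][j])();
}
```

Building something like this with -fsanitize=bounds is the configuration in
which the subscripts receive bounds checks; without the sanitizer the call is
an ordinary virtual dispatch.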
2024-09-10 Jakub Jelinek <jakub@redhat.com>
PR c++/116449
* typeck.cc (get_member_function_from_ptrfunc): Use save_expr
on instance_ptr and function even if it doesn't have side-effects,
as long as it isn't a decl.
Jonathan Wakely [Tue, 10 Sep 2024 15:59:29 +0000 (16:59 +0100)]
libstdc++: Add missing exception specifications in tests
Since r15-3532-g7cebc6384a0ad6 18_support/new_nothrow.cc fails in C++98 mode because G++
diagnoses missing exception specifications for the user-defined
(de)allocation functions. Add throw(std::bad_alloc) and throw() for
C++98 mode.
Similarly, 26_numerics/headers/numeric/synopsis.cc fails in C++20 mode
because the declarations of gcd and lcm are not noexcept.
libstdc++-v3/ChangeLog:
* testsuite/18_support/new_nothrow.cc (THROW_BAD_ALLOC): Define
macro to add exception specifications for C++98 mode.
(NOEXCEPT): Expand to throw() for C++98 mode.
* testsuite/26_numerics/headers/numeric/synopsis.cc (gcd, lcm):
Add noexcept.
Marek Polacek [Thu, 29 Aug 2024 19:13:03 +0000 (15:13 -0400)]
c++: mutable temps in rodata [PR116369]
Here we wrongly mark the reference temporary for g TREE_READONLY,
so it's put in .rodata and so we can't modify its subobject even
when the subobject is marked mutable. This is so since r9-869.
r14-1785 fixed a similar problem, but not in set_up_extended_ref_temp.
PR c++/116369
gcc/cp/ChangeLog:
* call.cc (set_up_extended_ref_temp): Don't mark a temporary
TREE_READONLY if its type is TYPE_HAS_MUTABLE_P.
The patch adds an option -foffload-abi-host-opts, which
is set by host in TARGET_OFFLOAD_OPTIONS, and mkoffload then passes its value
to host_compiler.
Javier Miranda [Mon, 26 Aug 2024 18:56:37 +0000 (18:56 +0000)]
ada: First controlling parameter: report error without Extensions allowed
Enable reporting an error when this new aspect/pragma is set to
True, and the sources are compiled without language extensions
allowed.
gcc/ada/
* sem_ch13.adb (Analyze_One_Aspect): Call
Error_Msg_GNAT_Extension() to report an error when the aspect
First_Controlling_Parameter is set to True and the sources are
compiled without Core_Extensions_Allowed.
* sem_prag.adb (Pragma_First_Controlling_Parameter): Call
subprogram Error_Msg_GNAT_Extension() to report an error when the
aspect First_Controlling_Parameter is set to True and the sources
are compiled without Core_Extensions_Allowed. Report an error when
the aspect pragma does not confirm an inherited True value.
Viljar Indus [Fri, 30 Aug 2024 11:22:16 +0000 (14:22 +0300)]
ada: Normalize span generation on different platforms
The total number of characters on a source code line
is different on Windows and Linux based systems
(CRLF vs LF endings). Use the last non line change
character to adjust printing the spans that go over
the end of line.
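The idea can be sketched in C++ as follows (the actual change is in Ada). The
function name visible_length is hypothetical; it returns the length of a line
with any trailing CR or LF characters excluded, so the result is the same for
CRLF and LF input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Length of LINE up to and including its last character that is not a
// line terminator.
std::size_t visible_length(const std::string& line) {
  std::size_t n = line.size();
  while (n > 0 && (line[n - 1] == '\n' || line[n - 1] == '\r'))
    --n;
  return n;
}
```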
gcc/ada/
* diagnostics-pretty_emitter.adb (Get_Last_Line_Char): New. Get
the last non line change character. Write_Span_Labels use the
adjusted line end pointer to calculate the length of the span.
Piotr Trojanek [Wed, 28 Aug 2024 15:56:06 +0000 (17:56 +0200)]
ada: Evaluate calls to GNAT.Source_Info routines in semantic checking
When semantic checking mode is active, i.e. when switch -gnatc is
present or when the frontend is operating in the GNATprove mode,
we now rewrite calls to GNAT.Source_Info routines in evaluation
and not expansion (which is disabled in these modes).
This is needed to recognize constants initialized with calls to
GNAT.Source_Info as static constants, regardless of expansion being
enabled.
gcc/ada/
* exp_intr.ads, exp_intr.adb (Expand_Source_Info): Move
declaration to package spec.
* sem_eval.adb (Eval_Intrinsic_Call): Evaluate calls to
GNAT.Source_Info where possible.
Andrew Pinski [Mon, 9 Sep 2024 22:34:11 +0000 (15:34 -0700)]
phiopt: Move the common code between pass_phiopt and pass_cselim into a separate function
When r14-303-gb9fedabe381cce was done, it was missed that some of the common parts could
be done in a template and a lambda could be used. This patch implements that. This new
function can be used later on to implement a simple ifcvt pass.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (execute_over_cond_phis): New template function,
moved the common parts from pass_phiopt::execute/pass_cselim::execute.
(pass_phiopt::execute): Move the function specific parts of the loop
into a lambda.
(pass_cselim::execute): Likewise.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Mon, 9 Sep 2024 15:08:37 +0000 (08:08 -0700)]
phiopt: Use gimple_phi_result rather than PHI_RESULT [PR116643]
This converts the uses of PHI_RESULT in phiopt to be gimple_phi_result
instead. Since there was already a mismatch of uses here, it
would be good to use the preferred one (gimple_phi_result) instead.
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr59539-1.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/pr59539-1.c --target_board='unix{-m64\ -march=cascadelake}'"
gcc/ChangeLog:
* config/i386/sse.md (*avx2_pcmp<mode>3_1): Don't force_reg
operands[3] when it's not const0_rtx.
Use a new struct diagnostic_option_id rather than just "int" when
referring to command-line options controlling warnings in the
diagnostic subsystem.
No functional change intended, but better documents the meaning of
the code.
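A rough sketch of the wrapper idea, using the hypothetical name
option_id_sketch rather than the real struct: an int is wrapped in a small
type so that parameters say what the number means, while an implicit
constructor keeps existing call sites that pass a plain option index
compiling. The member layout here is an assumption, not a copy of
diagnostic-core.h.

```cpp
#include <cassert>

// Hypothetical stand-in for a typed option identifier.
struct option_id_sketch {
  option_id_sketch() : m_idx(0) {}
  option_id_sketch(int idx) : m_idx(idx) {}  // implicit on purpose

  bool operator==(option_id_sketch other) const {
    return m_idx == other.m_idx;
  }

  int m_idx;
};

// Hypothetical consumer: the signature now documents that the argument is
// an option identifier, not an arbitrary integer.
bool controlled_by_option(option_id_sketch id) {
  return id.m_idx != 0;
}
```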
gcc/c-family/ChangeLog:
* c-common.cc (c_option_controlling_cpp_diagnostic): Return
diagnostic_option_id rather than int.
(c_cpp_diagnostic): Update for renaming of
diagnostic_override_option_index to diagnostic_set_option_id.
gcc/c/ChangeLog:
* c-errors.cc (pedwarn_c23): Use "diagnostic_option_id option_id"
rather than "int opt". Update for renaming of diagnostic_info
field.
(pedwarn_c11): Likewise.
(pedwarn_c99): Likewise.
(pedwarn_c90): Likewise.
* c-tree.h (pedwarn_c90): Likewise for decl.
(pedwarn_c99): Likewise.
(pedwarn_c11): Likewise.
(pedwarn_c23): Likewise.
gcc/cp/ChangeLog:
* constexpr.cc (constexpr_error): Update for renaming of
diagnostic_info field.
* cp-tree.h (pedwarn_cxx98): Use "diagnostic_option_id" rather
than "int".
* error.cc (cp_adjust_diagnostic_info): Update for renaming of
diagnostic_info field.
(pedwarn_cxx98): Use "diagnostic_option_id option_id" rather than
"int opt". Update for renaming of diagnostic_info field.
(diagnostic_set_info): Likewise.
gcc/d/ChangeLog:
* d-diagnostic.cc (d_diagnostic_report_diagnostic): Update for
renaming of diagnostic_info field.
gcc/ChangeLog:
* diagnostic-core.h (struct diagnostic_option_id): New.
(warning): Use it rather than "int" for param.
(warning_n): Likewise.
(warning_at): Likewise.
(warning_meta): Likewise.
(pedwarn): Likewise.
(permerror_opt): Likewise.
(emit_diagnostic): Likewise.
(emit_diagnostic_valist): Likewise.
(emit_diagnostic_valist_meta): Likewise.
* diagnostic-format-json.cc
(json_output_format::on_report_diagnostic): Update for renaming of
diagnostic_info field.
* diagnostic-format-sarif.cc (sarif_builder::make_result_object):
Likewise.
(make_reporting_descriptor_object_for_warning): Likewise.
* diagnostic-format-text.cc (print_option_information): Likewise.
* diagnostic-global-context.cc (emit_diagnostic): Use
"diagnostic_option_id option_id" rather than "int opt".
(emit_diagnostic_valist): Likewise.
(emit_diagnostic_valist_meta): Likewise.
(warning): Likewise.
(warning_at): Likewise.
(warning_meta): Likewise.
(warning_n): Likewise.
(pedwarn): Likewise.
(permerror_opt): Likewise.
* diagnostic.cc (diagnostic_set_info_translated): Update for
renaming of diagnostic_info field.
(diagnostic_option_classifier::classify_diagnostic): Use
"diagnostic_option_id option_id" rather than "int opt".
(update_effective_level_from_pragmas): Update for renaming of
diagnostic_info field.
(diagnostic_context::diagnostic_enabled): Likewise.
(diagnostic_context::warning_enabled_at): Use
"diagnostic_option_id option_id" rather than "int opt".
(diagnostic_context::diagnostic_impl): Likewise.
(diagnostic_context::diagnostic_n_impl): Likewise.
* diagnostic.h (diagnostic_info::diagnostic_info): Update for...
(diagnostic_info::option_index): Rename...
(diagnostic_info::option_id): ...to this.
(class diagnostic_option_manager): Use
"diagnostic_option_id option_id" rather than "int opt" for vfuncs.
(diagnostic_option_classifier): Likewise for member funcs.
(diagnostic_classification_change_t::option): Add comment.
(diagnostic_context::warning_enabled_at): Use
"diagnostic_option_id option_id" rather than "int option_index".
(diagnostic_context::option_unspecified_p): Likewise.
(diagnostic_context::classify_diagnostic): Likewise.
(diagnostic_context::option_enabled_p): Likewise.
(diagnostic_context::make_option_name): Likewise.
(diagnostic_context::make_option_url): Likewise.
(diagnostic_context::diagnostic_impl): Likewise.
(diagnostic_context::diagnostic_n_impl): Likewise.
(diagnostic_override_option_index): Rename...
(diagnostic_set_option_id): ...to this, and update for
diagnostic_info field renaming.
(diagnostic_classify_diagnostic): Use "diagnostic_option_id"
rather than "int".
(warning_enabled_at): Likewise.
(option_unspecified_p): Likewise.
gcc/fortran/ChangeLog:
* cpp.cc (cb_cpp_diagnostic_cpp_option): Convert return type from
"int" to "diagnostic_option_id".
(cb_cpp_diagnostic): Update for renaming of
diagnostic_override_option_index to diagnostic_set_option_id.
* error.cc (gfc_warning): Update for renaming of diagnostic_info
field.
(gfc_warning_now_at): Likewise.
(gfc_warning_now): Likewise.
(gfc_warning_internal): Likewise.
gcc/ChangeLog:
* ipa-pure-const.cc: Replace include of "opts.h" with
"opts-diagnostic.h".
(suggest_attribute): Convert param from int to
diagnostic_option_id.
* lto-wrapper.cc (class lto_diagnostic_option_manager): Use
diagnostic_option_id rather than "int".
* opts-common.cc
(compiler_diagnostic_option_manager::option_enabled_p): Likewise.
* opts-diagnostic.h (class gcc_diagnostic_option_manager):
Likewise.
(class compiler_diagnostic_option_manager): Likewise.
* opts.cc (compiler_diagnostic_option_manager::make_option_name):
Likewise.
(gcc_diagnostic_option_manager::make_option_url): Likewise.
* substring-locations.cc
(format_string_diagnostic_t::emit_warning_n_va): Likewise.
(format_string_diagnostic_t::emit_warning_va): Likewise.
(format_string_diagnostic_t::emit_warning): Likewise.
(format_string_diagnostic_t::emit_warning_n): Likewise.
* substring-locations.h
(format_string_diagnostic_t::emit_warning_va): Likewise.
(format_string_diagnostic_t::emit_warning_n_va): Likewise.
(format_string_diagnostic_t::emit_warning): Likewise.
(format_string_diagnostic_t::emit_warning_n): Likewise.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Mon, 9 Sep 2024 23:38:12 +0000 (19:38 -0400)]
diagnostics: rename dc.printer to m_printer [PR116613]
Rename diagnostic_context's "printer" field to "m_printer",
for consistency with other fields, and to highlight places
where we currently use this, to help assess feasibility
of supporting multiple output sinks (PR other/116613).
David Malcolm [Mon, 9 Sep 2024 23:38:11 +0000 (19:38 -0400)]
SARIF output: fix schema URL [§3.13.3, PR116603]
We were using
https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json
as the URL for the SARIF 2.1 schema, but this is now a 404.
Doing so added a validation error on
c-c++-common/diagnostic-format-sarif-file-pr111700.c
for which we emit this textual output:
this-file-does-not-exist.c: warning: #warning message [-Wcpp]
with no line number, and these invalid SARIF regions within the
physical location of the warning:
"region": {"startColumn": 2,
"endColumn": 9},
"contextRegion": {}
This is due to this directive:
# 0 "this-file-does-not-exist.c"
with line number 0.
The patch fixes this by not creating regions that have startLine <= 0.
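The guard can be sketched like this, with hypothetical names (region_sketch,
maybe_make_region) standing in for the SARIF builder code: a region is only
produced when the start line is positive.

```cpp
#include <cassert>

// Hypothetical plain-data stand-in for a SARIF region object.
struct region_sketch {
  int start_line;
  int start_column;
  int end_column;
};

// Fills OUT and returns true only for a usable location; a start line
// of 0 or less yields no region at all.
bool maybe_make_region(int start_line, int start_column, int end_column,
                       region_sketch& out) {
  if (start_line <= 0)
    return false;
  out.start_line = start_line;
  out.start_column = start_column;
  out.end_column = end_column;
  return true;
}
```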
gcc/ChangeLog:
PR other/116603
* diagnostic-format-sarif.cc (SARIF_SCHEMA): Update URL.
(sarif_builder::maybe_make_region_object): Don't create regions
with startLine <= 0.
(sarif_builder::maybe_make_region_object_for_context): Likewise.
i386: Use offsettable address constraint for double-word memory operands
Double-word memory operands are accessed as their high and low part, so the
memory location has to be offsettable. Use "o" constraint instead of "m"
for double-word memory operands.
gcc/ChangeLog:
* config/i386/i386.md (*insvdi_lowpart_1): Use "o" constraint
instead of "m" for double-word mode memory operands.
(*add<dwi>3_doubleword_zext): Ditto.
(*addv<dwi>4_doubleword_1): Use "jO" constraint instead of "jM"
for double-word mode memory operands.
Andrew Pinski [Thu, 29 Aug 2024 19:10:44 +0000 (12:10 -0700)]
middle-end: also optimized `popcount(a) <= 1` [PR90693]
This expands on optimizing `popcount(a) == 1` to also handle
`popcount(a) <= 1`. `<= 1` can be expanded as `(a & (a - 1)) == 0`,
like what is done for `== 1` if we know that a was nonzero.
We have to do the optimization in two places, depending on whether we
have an optab entry for popcount or not.
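As a small illustration of the identity involved: a value has at most one bit
set exactly when clearing its lowest set bit leaves zero, that is, when
`(a & (a - 1)) == 0`. The function names below are made up for this sketch,
and std::bitset is used only as a portable reference popcount.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Reference form: count the set bits and compare.
bool at_most_one_bit_popcount(std::uint32_t a) {
  return std::bitset<32>(a).count() <= 1;
}

// Bit-trick form: a - 1 flips the lowest set bit and everything below
// it, so the AND is zero iff no other bit was set (zero included).
bool at_most_one_bit_mask(std::uint32_t a) {
  return (a & (a - 1)) == 0;
}

// The `== 1` case additionally has to rule out zero.
bool exactly_one_bit_mask(std::uint32_t a) {
  return a != 0 && (a & (a - 1)) == 0;
}
```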
Built and tested for aarch64-linux-gnu.
PR middle-end/90693
gcc/ChangeLog:
* internal-fn.cc (expand_POPCOUNT): Handle the second argument
being `-1` for `<= 1`.
* tree-ssa-math-opts.cc (match_single_bit_test): Handle LE/GT
cases.
(math_opts_dom_walker::after_dom_children): Call match_single_bit_test
for LE_EXPR/GT_EXPR also.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/popcnt-le-1.c: New test.
* gcc.target/aarch64/popcnt-le-2.c: New test.
* gcc.target/aarch64/popcnt-le-3.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Richard Biener [Wed, 28 Aug 2024 12:06:48 +0000 (14:06 +0200)]
tree-optimization/116514 - handle pointer difference in bit-CCP
When evaluating the difference of two aligned pointers in CCP we
fail to handle the EXACT_DIV_EXPR by the element size that occurs.
The testcase then also exercises modulo to test alignment but
modulo by a power-of-two isn't handled either.
PR tree-optimization/116514
* tree-ssa-ccp.cc (bit_value_binop): Handle EXACT_DIV_EXPR
like TRUNC_DIV_EXPR. Handle exact division of a signed value
by a power-of-two like a shift. Handle unsigned division by
a power-of-two like a shift.
Handle unsigned TRUNC_MOD_EXPR by power-of-two, handle signed
TRUNC_MOD_EXPR by power-of-two if the result is zero.
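A simplified sketch of why power-of-two division and modulo are easy for a
known-bits analysis. The known_bits struct and helper names are hypothetical,
and the convention that a set mask bit means "unknown" is an assumption of
this sketch rather than a description of tree-ssa-ccp.cc.

```cpp
#include <cassert>
#include <cstdint>

// value holds the known bits; a set bit in mask marks that bit unknown.
struct known_bits {
  std::uint64_t value;
  std::uint64_t mask;
};

// Mask with the low k bits set; requires k < 64.
inline std::uint64_t low_mask(unsigned k) {
  return (std::uint64_t(1) << k) - 1;
}

// x % 2^k (unsigned) keeps the low k bits; all higher bits become
// known zeros.
known_bits umod_pow2(known_bits x, unsigned k) {
  known_bits r = {x.value & low_mask(k), x.mask & low_mask(k)};
  return r;
}

// x / 2^k (unsigned) is a logical right shift of value and mask.
known_bits udiv_pow2(known_bits x, unsigned k) {
  known_bits r = {x.value >> k, x.mask >> k};
  return r;
}

bool is_constant(known_bits x) { return x.mask == 0; }
```

For a value whose low four bits are known to be zero (for example a 16-byte
aligned address), umod_pow2 with k = 4 yields a fully known zero, which is the
kind of alignment test the commit message mentions.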
The following avoids classifying a double reduction that's not
actually a reduction in the outer loop (because its value isn't
used outside of the outer loop). This avoids us ICEing on the
unexpected stmt/SLP node arrangement.
Andrew Pinski [Fri, 6 Sep 2024 19:29:26 +0000 (12:29 -0700)]
gimple-fold: Move optimizing memcpy to memset to fold_stmt from fab
I noticed this folding inside fab could be done elsewhere and could
even improve inlining decisions and a few other things so let's
move it to fold_stmt.
It also fixes PR 116601 because places which call fold_stmt already
have to deal with the stmt becoming a non-throw statement.
The fix for PR 116601 on the branches should be the original patch
rather than a backport of this one.
Bootstrapped and tested on x86_64-linux-gnu.
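A sketch of the source pattern this kind of folding targets, as far as can be
inferred from the description: a memcpy whose source was just filled by a
memset can be rewritten as a memset of the destination. The struct and
function names are illustrative only.

```cpp
#include <cassert>
#include <cstring>

struct block {
  char bytes[64];
};

// Before: the copy reads memory that is known to be all zeros.
void clear_then_copy(block& src, block& dst) {
  std::memset(&src, 0, sizeof src);
  std::memcpy(&dst, &src, sizeof dst);  // candidate for memcpy -> memset
}

// After: the equivalent form the optimizer can produce.
void clear_both(block& src, block& dst) {
  std::memset(&src, 0, sizeof src);
  std::memset(&dst, 0, sizeof dst);
}
```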
PR tree-optimization/116601
gcc/ChangeLog:
* gimple-fold.cc (optimize_memcpy_to_memset): Move
from tree-ssa-ccp.cc and rename. Also return true
if the optimization happened.
(gimple_fold_builtin_memory_op): Call
optimize_memcpy_to_memset.
(fold_stmt_1): Call optimize_memcpy_to_memset for
load/store copies.
* tree-ssa-ccp.cc (optimize_memcpy): Delete.
(pass_fold_builtins::execute): Remove code that
calls optimize_memcpy.
gcc/testsuite/ChangeLog:
* gcc.dg/pr78408-1.c: Adjust dump scan to match where
the optimization now happens.
* g++.dg/torture/except-2.C: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Richard Biener [Mon, 9 Sep 2024 07:41:36 +0000 (09:41 +0200)]
Amend gcc.dg/vect/fast-math-vect-call-2.c
There was a reported regression on x86-64 with -march=cascadelake
and -m32 where epilogue vectorization causes a different number of
SLPed loops. Fixed by disabling epilogue vectorization for the
testcase.
Jakub Jelinek [Mon, 9 Sep 2024 07:37:26 +0000 (09:37 +0200)]
testsuite: Fix up pr116588.c test [PR116588]
The test as committed without the tree-vrp.cc change only FAILs with
FAIL: gcc.dg/pr116588.c scan-tree-dump-not vrp2 "0 != 0"
The DEBUG code in there was just to make it easier to debug, but doesn't
actually fail when the test is miscompiled.
We don't need such debugging code in simple tests like that, but it is
useful if they abort when miscompiled.
With this patch without the tree-vrp.cc change I see
FAIL: gcc.dg/pr116588.c execution test
FAIL: gcc.dg/pr116588.c scan-tree-dump-not vrp2 "0 != 0"
and with it it passes.
2024-09-09 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/116588
* gcc.dg/pr116588.c: Remove -DDEBUG from dg-options.
(main): Remove debugging code and simplify.
Thomas Schwinge [Mon, 9 Sep 2024 06:39:10 +0000 (08:39 +0200)]
Match: Fix ordered and nonequal: Fix 'gcc.dg/opt-ordered-and-nonequal-1.c' re 'LOGICAL_OP_NON_SHORT_CIRCUIT' [PR116635]
Fix up to make 'gcc.dg/opt-ordered-and-nonequal-1.c' of
commit 91421e21e8f0f05f440174b8de7a43a311700e08
"Match: Fix ordered and nonequal" work for default
'LOGICAL_OP_NON_SHORT_CIRCUIT == false' configurations.
PR testsuite/116635
gcc/testsuite/
* gcc.dg/opt-ordered-and-nonequal-1.c: Fix re
'LOGICAL_OP_NON_SHORT_CIRCUIT'.
Andrew Pinski [Sat, 31 Aug 2024 20:54:21 +0000 (13:54 -0700)]
phiopt: Small refactoring/cleanup of non-ssa name case of factor_out_conditional_operation
This small cleanup removes a redundant check for gimple_assign_cast_p and reformats
based on that. It also inverts the if statement that checks for an integral type and
whether the constant fits into the new type, so that it returns null early, and
reformats based on that.
It also moves the check for has_single_use earlier, since it is less complex and
cheaper than some of the other checks (like the check on the integer side).
This was noticed when adding a few new things to factor_out_conditional_operation
but those are not ready to submit yet.
Note there is no functional difference with this change.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (factor_out_conditional_operation): Move the has_single_use
checks much earlier. Remove redundant check for gimple_assign_cast_p.
Change around the check if the integral consts fits into the new type.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
H.J. Lu [Fri, 6 Sep 2024 12:24:07 +0000 (05:24 -0700)]
x86-64: Don't use temp for argument in a TImode register
Don't use temp for a PARALLEL BLKmode argument of an EXPR_LIST expression
in a TImode register. Otherwise, the TImode variable will be put in
the GPR save area which guarantees only 8-byte alignment.
gcc/
PR target/116621
* config/i386/i386.cc (ix86_gimplify_va_arg): Don't use temp for
a PARALLEL BLKmode container of an EXPR_LIST expression in a
TImode register.
gcc/testsuite/
PR target/116621
* gcc.target/i386/pr116621.c: New test.
Cache the source files as they are read, rather than discarding them at
the end of output_lines (), and move the reading of the source file to
the new function slurp.
This patch does not really change anything other than moving the file
reading out of output_file, but it sets gcov up for more interaction with
the source file. The motivating example is reporting coverage on
functions from different source files, notably C++ headers and
((always_inline)).
Here is an example of what gcov does today:
hello.h:
inline __attribute__((always_inline))
int hello (const char *s)
{
if (s)
printf ("hello, %s!\n", s);
else
printf ("hello, world!\n");
return 0;
}
hello.c:
int notmain(const char *entity)
{
return hello (entity);
}
int main()
{
const char *empty = 0;
if (!empty)
hello (empty);
else
puts ("Goodbye!");
}
$ gcov -abc hello
function notmain called 0 returned 0% blocks executed 0%
#####: 4:int notmain(const char *entity)
%%%%%: 4-block 2
branch 0 never executed (fallthrough)
branch 1 never executed
-: 5:{
#####: 6: return hello (entity);
%%%%%: 6-block 7
-: 7:}
Clearly there is a branch in notmain, but the branch comes from the
inlining of hello. This is not very obvious from looking at the output.
Here is hello.h.gcov:
-: 3:inline __attribute__((always_inline))
-: 4:int hello (const char *s)
-: 5:{
#####: 6: if (s)
%%%%%: 6-block 3
branch 0 never executed (fallthrough)
branch 1 never executed
%%%%%: 6-block 2
branch 2 never executed (fallthrough)
branch 3 never executed
#####: 7: printf ("hello, %s!\n", s);
%%%%%: 7-block 4
call 0 never executed
%%%%%: 7-block 3
call 1 never executed
-: 8: else
#####: 9: printf ("hello, world!\n");
%%%%%: 9-block 5
call 0 never executed
%%%%%: 9-block 4
call 1 never executed
#####: 10: return 0;
%%%%%: 10-block 6
%%%%%: 10-block 5
-: 11:}
The blocks from the different call sites have all been interleaved.
The reporting could be tuned to list the inlined function, too, like
this:
1: 4:int notmain(const char *entity)
-: == inlined from hello.h ==
1: 6: if (s)
branch 0 taken 0 (fallthrough)
branch 1 taken 1
#####: 7: printf ("hello, %s!\n", s);
%%%%%: 7-block 3
call 0 never executed
-: 8: else
1: 9: printf ("hello, world!\n");
1: 9-block 4
call 0 returned 1
1: 10: return 0;
1: 10-block 5
-: == inlined from hello.h (end) ==
-: 5:{
1: 6: return hello (entity);
1: 6-block 7
-: 7:}
Implementing something to this effect relies on having the sources for
both files (hello.c, hello.h) available, which is what this patch sets
up.
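A minimal sketch of the caching idea, not the gcov.cc implementation: each
source is read once into a vector of lines and kept in a map keyed by file
name, so that the lines of hello.h remain available while hello.c is being
reported. The names slurp_lines and source_cache are hypothetical, and the
sketch reads from a std::istream so it can be exercised without touching the
file system.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<std::string> source_lines;

// Read every line of IN into memory.
source_lines slurp_lines(std::istream& in) {
  source_lines lines;
  std::string line;
  while (std::getline(in, line))
    lines.push_back(line);
  return lines;
}

// Keeps the lines of each source around instead of discarding them
// after one file has been printed.
class source_cache {
 public:
  // Returns the cached lines for NAME, reading IN only on first use.
  const source_lines& get(const std::string& name, std::istream& in) {
    std::map<std::string, source_lines>::iterator it = m_files.find(name);
    if (it == m_files.end())
      it = m_files.insert(std::make_pair(name, slurp_lines(in))).first;
    return it->second;
  }

  std::size_t size() const { return m_files.size(); }

 private:
  std::map<std::string, source_lines> m_files;
};
```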
Note that the previous reading code would leak the source file content,
and explicitly storing them is not a huge departure nor performance
implication. I verified this with valgrind:
With slurp:
$ valgrind gcov ./hello
== == Memcheck, a memory error detector
== == Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
== == Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
== == Command: ./gcc/gcov demo
== ==
File 'hello.c'
Lines executed:100.00% of 4
Creating 'hello.c.gcov'
File 'hello.h'
Lines executed:75.00% of 4
Creating 'hello.h.gcov'
== ==
== == HEAP SUMMARY:
== == in use at exit: 84,907 bytes in 54 blocks
== == total heap usage: 254 allocs, 200 frees, 137,156 bytes allocated
== ==
== == LEAK SUMMARY:
== == definitely lost: 1,237 bytes in 22 blocks
== == indirectly lost: 562 bytes in 18 blocks
== == possibly lost: 0 bytes in 0 blocks
== == still reachable: 83,108 bytes in 14 blocks
== == of which reachable via heuristic:
== == newarray : 1,544 bytes in 1 blocks
== == suppressed: 0 bytes in 0 blocks
== == Rerun with --leak-check=full to see details of leaked memory
== ==
== == For lists of detected and suppressed errors, rerun with: -s
== == ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Without slurp:
$ valgrind gcov ./demo
== == Memcheck, a memory error detector
== == Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
== == Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
== == Command: ./gcc/gcov demo
== ==
File 'hello.c'
Lines executed:100.00% of 4
Creating 'hello.c.gcov'
File 'hello.h'
Lines executed:75.00% of 4
Creating 'hello.h.gcov'
Lines executed:87.50% of 8
== ==
== == HEAP SUMMARY:
== == in use at exit: 85,316 bytes in 82 blocks
== == total heap usage: 250 allocs, 168 frees, 137,084 bytes allocated
== ==
== == LEAK SUMMARY:
== == definitely lost: 1,646 bytes in 50 blocks
== == indirectly lost: 562 bytes in 18 blocks
== == possibly lost: 0 bytes in 0 blocks
== == still reachable: 83,108 bytes in 14 blocks
== == of which reachable via heuristic:
== == newarray : 1,544 bytes in 1 blocks
== == suppressed: 0 bytes in 0 blocks
== == Rerun with --leak-check=full to see details of leaked memory
== ==
== == For lists of detected and suppressed errors, rerun with: -s
== == ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
gcc/ChangeLog:
* gcov.cc (release_structures): Release source_lines.
(slurp): New function.
(output_lines): Read sources with slurp.
Jason Merrill [Fri, 6 Sep 2024 19:28:53 +0000 (15:28 -0400)]
c++: exception spec and stdlib specialization
We were silently accepting the pr65923.C specialization of std::swap with
the wrong exception specification; it should be declared noexcept. Let's
limit ignoring mismatch with system headers to extern "C" functions so we
get a diagnostic for the C++ library.
In the case of an omitted exception-specification, let's also lower the
error to a pedwarn, and copy the missing spec over, to avoid a hard break
for code that accidentally relied on the old behavior.
...except extern "C" functions keep the new spec, to avoid breaking dubious
code like noexcept-type19.C.
gcc/cp/ChangeLog:
* decl.cc (check_redeclaration_exception_specification): Remove
OPT_Wsystem_headers from pedwarn when the old declaration is
in a system header. Also check std namespace.
Andrew Pinski [Sat, 7 Sep 2024 18:43:03 +0000 (11:43 -0700)]
split-path: Fix dump wording about duplicating too many statements
It was pointed out in https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662183.html,
that the wording with this print has too many words.
Fixed thusly.
Pushed as obvious after a build and test for x86_64-linux-gnu.
gcc/ChangeLog:
* gimple-ssa-split-paths.cc (is_feasible_trace): Fix wording
on the print.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Patrick Palka [Sat, 7 Sep 2024 18:06:37 +0000 (14:06 -0400)]
c++: deferring partial substitution into lambda [PR116567]
Here we correctly defer partial substitution into the lambda used as
a default template argument, but then incorrectly perform the full
substitution, because add_extra_args adds outer template arguments from
the full substitution that are not related to the original template
context of the lambda. For example, the template depth of the first
lambda is 1 but add_extra_args returns a set of args with 3 levels, with
the inner level corresponding to the parameters of v1 (good) and the
outer levels corresponding to those of A and B (bad).
For the cases that we're interested in, add_extra_args can assume that
the deferred args are a full set of template arguments, and so it
suffices to just substitute into the deferred args and not do any
additional merging.
This patch refines add_extra_args accordingly, and additionally
makes it look for the tf_partial flag instead of for dependent args to
decide if the deferred substitution is a partial one. This reveals we
were neglecting to set tf_partial when substituting into a default
template argument in a template context.
PR c++/116567
gcc/cp/ChangeLog:
* pt.cc (coerce_template_parms): Set tf_partial when substituting
into a default template argument in a template context.
(build_extra_args): Set TREE_STATIC on the deferred args if this
is a partial substitution.
(add_extra_args): Check TREE_STATIC instead of dependence of args.
Adjust merging behavior in that case.
(tsubst_lambda_expr): Check for tf_partial instead of dependence
of args when determining whether to defer substitution.
(tsubst_expr) <case LAMBDA_EXPR>: Remove tf_partial early exit.
Thomas Koenig [Sat, 7 Sep 2024 14:59:46 +0000 (16:59 +0200)]
Implement first part of unsigned integers for Fortran.
gcc/fortran/ChangeLog:
* arith.cc (gfc_reduce_unsigned): New function.
(gfc_arith_error): Add ARITH_UNSIGNED_TRUNCATED and
ARITH_UNSIGNED_NEGATIVE.
(gfc_arith_init_1): Initialize unsigned types.
(gfc_check_unsigned_range): New function.
(gfc_range_check): Handle unsigned types.
(gfc_arith_uminus): Likewise.
(gfc_arith_plus): Likewise.
(gfc_arith_minus): Likewise.
(gfc_arith_times): Likewise.
(gfc_arith_divide): Likewise.
(gfc_compare_expr): Likewise.
(eval_intrinsic): Likewise.
(gfc_int2int): Also convert unsigned.
(gfc_uint2uint): New function.
(gfc_int2uint): New function.
(gfc_uint2int): New function.
(gfc_uint2real): New function.
(gfc_uint2complex): New function.
(gfc_real2uint): New function.
(gfc_complex2uint): New function.
(gfc_log2uint): New function.
(gfc_uint2log): New function.
* arith.h (gfc_int2uint, gfc_uint2uint, gfc_uint2int, gfc_uint2real):
Add prototypes.
(gfc_uint2complex, gfc_real2uint, gfc_complex2uint, gfc_log2uint):
Likewise.
(gfc_uint2log): Likewise.
* check.cc (gfc_boz2uint): New function.
(type_check2): New function.
(int_or_real_or_unsigned_check): New function.
(less_than_bitsizekind): Adjust for unsigneds.
(less_than_bitsize2): Likewise.
(gfc_check_allocated): Likewise.
(gfc_check_mod): Likewise.
(gfc_check_bge_bgt_ble_blt): Likewise.
(gfc_check_bitfcn): Likewise.
(gfc_check_digits): Likewise.
(gfc_check_dshift): Likewise.
(gfc_check_huge): Likewise.
(gfc_check_iu): New function.
(gfc_check_iand_ieor_ior): Adjust for unsigneds.
(gfc_check_ibits): Likewise.
(gfc_check_uint): New function.
(gfc_check_ishft): Adjust for unsigneds.
(gfc_check_ishftc): Likewise.
(gfc_check_min_max): Likewise.
(gfc_check_merge_bits): Likewise.
(gfc_check_selected_int_kind): Likewise.
(gfc_check_shift): Likewise.
(gfc_check_mvbits): Likewise.
(gfc_invalid_unsigned_ops): Likewise.
* decl.cc (gfc_match_decl_type_spec): Likewise.
* dump-parse-tree.cc (show_expr): Likewise.
* expr.cc (gfc_get_constant_expr): Likewise.
(gfc_copy_expr): Likewise.
(gfc_extract_int): Likewise.
(numeric_type): Likewise.
* gfortran.h (enum arith): Extend with ARITH_UNSIGNED_TRUNCATED
and ARITH_UNSIGNED_NEGATIVE.
(enum gfc_isym_id): Extend with GFC_ISYM_SU_KIND and GFC_ISYM_UINT.
(gfc_check_unsigned_range): New prototype.
(gfc_arith_error): Likewise.
(gfc_reduce_unsigned): Likewise.
(gfc_boz2uint): Likewise.
(gfc_invalid_unsigned_ops): Likewise.
(gfc_convert_mpz_to_unsigned): Likewise.
* gfortran.texi: Add some rudimentary documentation.
* intrinsic.cc (gfc_type_letter): Adjust for unsigneds.
(add_functions): Add uint and adjust functions to be called.
(add_conversions): Add unsigned conversions.
(gfc_convert_type_warn): Adjust for unsigned.
* intrinsic.h (gfc_check_iu, gfc_check_uint, gfc_check_mod, gfc_simplify_uint,
gfc_simplify_selected_unsigned_kind, gfc_resolve_uint): New prototypes.
* invoke.texi: Add -funsigned.
* iresolve.cc (gfc_resolve_dshift): Handle unsigneds.
(gfc_resolve_iand): Handle unsigneds.
(gfc_resolve_ibclr): Handle unsigneds.
(gfc_resolve_ibits): Handle unsigneds.
(gfc_resolve_ibset): Handle unsigneds.
(gfc_resolve_ieor): Handle unsigneds.
(gfc_resolve_ior): Handle unsigneds.
(gfc_resolve_uint): Handle unsigneds.
(gfc_resolve_merge_bits): Handle unsigneds.
(gfc_resolve_not): Handle unsigneds.
* lang.opt: Add -funsigned.
* libgfortran.h: Add BT_UNSIGNED.
* match.cc (gfc_match_type_spec): Match UNSIGNED.
* misc.cc (gfc_basic_typename): Add UNSIGNED.
(gfc_typename): Likewise.
* primary.cc (convert_unsigned): New function.
(match_unsigned_constant): New function.
(gfc_match_literal_constant): Handle unsigned.
* resolve.cc (resolve_operator): Handle unsigned.
(resolve_ordinary_assign): Likewise.
* simplify.cc (convert_mpz_to_unsigned): Renamed to...
(gfc_convert_mpz_to_unsigned): and adjusted.
(gfc_simplify_bit_size): Adjusted for unsigned.
(compare_bitwise): Likewise.
(gfc_simplify_bge): Likewise.
(gfc_simplify_bgt): Likewise.
(gfc_simplify_ble): Likewise.
(gfc_simplify_blt): Likewise.
(simplify_cmplx): Likewise.
(gfc_simplify_digits): Likewise.
(simplify_dshift): Likewise.
(gfc_simplify_huge): Likewise.
(gfc_simplify_iand): Likewise.
(gfc_simplify_ibclr): Likewise.
(gfc_simplify_ibits): Likewise.
(gfc_simplify_ibset): Likewise.
(gfc_simplify_ieor): Likewise.
(gfc_simplify_uint): Likewise.
(gfc_simplify_ior): Likewise.
(simplify_shift): Likewise.
(gfc_simplify_ishftc): Likewise.
(gfc_simplify_merge_bits): Likewise.
(min_max_choose): Likewise.
(gfc_simplify_mod): Likewise.
(gfc_simplify_modulo): Likewise.
(gfc_simplify_popcnt): Likewise.
(gfc_simplify_range): Likewise.
(gfc_simplify_selected_unsigned_kind): Likewise.
(gfc_convert_constant): Likewise.
* target-memory.cc (size_unsigned): New function.
(gfc_element_size): Adjust for unsigned.
* trans-const.h (gfc_conv_mpz_unsigned_to_tree): Add prototype.
* trans-const.cc (gfc_conv_mpz_unsigned_to_tree): Handle unsigneds.
(gfc_conv_constant_to_tree): Likewise.
* trans-decl.cc (gfc_conv_cfi_to_gfc): Put in "not yet implemented".
* trans-expr.cc (gfc_conv_gfc_desc_to_cfi_desc): Likewise.
* trans-stmt.cc (gfc_trans_integer_select): Handle unsigned.
(gfc_trans_select): Likewise.
* trans-intrinsic.cc (gfc_conv_intrinsic_mod): Handle unsigned.
(gfc_conv_intrinsic_shift): Likewise.
(gfc_conv_intrinsic_function): Add GFC_ISYM_UINT.
* trans-io.cc (enum iocall): Add IOCALL_X_UNSIGNED and IOCALL_X_UNSIGNED_WRITE.
(gfc_build_io_library_fndecls): Add transfer_unsigned and transfer_unsigned_write.
(transfer_expr): Handle unsigneds.
* trans-types.cc (gfc_unsigned_kinds): New array.
(gfc_unsigned_types): Likewise.
(gfc_init_kinds): Handle them.
(validate_unsigned): New function.
(gfc_validate_kind): Use it.
(gfc_build_unsigned_type): New function.
(gfc_init_types): Use it.
(gfc_get_unsigned_type): New function.
(gfc_typenode_for_spec): Handle unsigned.
* trans-types.h (gfc_get_unsigned_type): New prototype.
libgfortran/ChangeLog:
* gfortran.map: Add _gfortran_transfer_unsigned and
_gfortran_transfer_unsigned_write.
* io/io.h (set_unsigned): New prototype.
(us_max): New prototype.
(read_decimal_unsigned): New prototype.
(write_iu): New prototype.
* io/list_read.c (convert_unsigned): New function.
(read_integer): Also handle unsigneds.
(list_formatted_read_scalar): Handle unsigneds.
(nml_read_obj): Likewise.
* io/read.c (set_unsigned): New function.
(us_max): New function.
(read_utf8): Whitespace fixes.
(read_default_char1): Whitespace fixes.
(read_a_char4): Whitespace fixes.
(next_char): Whitespace fixes.
(read_decimal_unsigned): New function.
(read_f): Whitespace fixes.
(read_x): Whitespace fixes.
* io/transfer.c (transfer_unsigned): New function.
(transfer_unsigned_write): New function.
(require_one_of_two_types): New function.
(formatted_transfer_scalar_read): Use it.
(formatted_transfer_scalar_write): Also use it.
* io/write.c (write_decimal_unsigned): New function.
(write_iu): New function.
(write_unsigned): New function.
(list_formatted_write_scalar): Adjust for unsigneds.
* libgfortran.h (GFC_UINTEGER_1_HUGE): Define.
(GFC_UINTEGER_2_HUGE): Define.
(GFC_UINTEGER_4_HUGE): Define.
(GFC_UINTEGER_8_HUGE): Define.
(GFC_UINTEGER_16_HUGE): Define.
(HAVE_GFC_UINTEGER_1): Undefine (done by mk-kinds-h.sh).
(HAVE_GFC_UINTEGER_4): Likewise.
* mk-kinds-h.sh: Add GFC_UINTEGER_*_HUGE.
gcc/testsuite/ChangeLog:
* gfortran.dg/unsigned_1.f90: New test.
* gfortran.dg/unsigned_10.f90: New test.
* gfortran.dg/unsigned_11.f90: New test.
* gfortran.dg/unsigned_12.f90: New test.
* gfortran.dg/unsigned_13.f90: New test.
* gfortran.dg/unsigned_14.f90: New test.
* gfortran.dg/unsigned_15.f90: New test.
* gfortran.dg/unsigned_16.f90: New test.
* gfortran.dg/unsigned_17.f90: New test.
* gfortran.dg/unsigned_18.f90: New test.
* gfortran.dg/unsigned_19.f90: New test.
* gfortran.dg/unsigned_2.f90: New test.
* gfortran.dg/unsigned_20.f90: New test.
* gfortran.dg/unsigned_21.f90: New test.
* gfortran.dg/unsigned_22.f90: New test.
* gfortran.dg/unsigned_23.f90: New test.
* gfortran.dg/unsigned_24.f: New test.
* gfortran.dg/unsigned_3.f90: New test.
* gfortran.dg/unsigned_4.f90: New test.
* gfortran.dg/unsigned_5.f90: New test.
* gfortran.dg/unsigned_6.f90: New test.
* gfortran.dg/unsigned_7.f90: New test.
* gfortran.dg/unsigned_8.f90: New test.
* gfortran.dg/unsigned_9.f90: New test.
Jakub Jelinek [Sat, 7 Sep 2024 07:36:53 +0000 (09:36 +0200)]
libiberty: Fix up > 64K section handling in simple_object_elf_copy_lto_debug_section [PR116614]
cat abc.C
#define A(n) struct T##n {} t##n;
#define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
#define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
#define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
#define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
E(1) E(2) E(3)
int main () { return 0; }
./xg++ -B ./ -o abc{.o,.C} -flto -flto-partition=1to1 -O2 -g -fdebug-types-section -c
./xgcc -B ./ -o abc{,.o} -flto -flto-partition=1to1 -O2
(not included in testsuite as it takes a while to compile) FAILs with
lto-wrapper: fatal error: Too many copied sections: Operation not supported
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
The following patch fixes that. Most of the 64K+ section support for
reading and writing was already there years ago (and especially reading was
used quite often already), and a further bug in it was fixed in the PR104617 fix.
Yet, the fix isn't solely about removing the
if (new_i - 1 >= SHN_LORESERVE)
{
*err = ENOTSUP;
return "Too many copied sections";
}
5 lines, the missing part was that the function only handled reading of
the .symtab_shndx section but not copying/updating of it.
If the result has less than 64K-epsilon sections, that actually wasn't
needed, but e.g. with -fdebug-types-section one can exceed that pretty
easily (reported to us on WebKitGtk build on ppc64le).
Updating the section is slightly more complicated, because it basically
needs to be done in lock step with updating the .symtab section: if a symbol
doesn't need to use SHN_XINDEX in there, the section should (or should be
updated to) contain an SHN_UNDEF entry, otherwise it needs to hold whatever
would otherwise be stored but couldn't fit. But repeating, because of that, all
the symtab decisions about what to discard and how to rewrite it would be ugly.
So, the patch instead emits the .symtab_shndx section (or sections) last
and prepares the content during the .symtab processing and in a second
pass when going just through .symtab_shndx sections just uses the saved
content.
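The lock-step relationship between a symbol's st_shndx field and its
.symtab_shndx entry can be sketched as below. This is only an illustration
of the ELF encoding rule, not the code in simple-object-elf.c; the helper
names encode_section_index and decode_section_index are hypothetical, and
the helpers take real section indices only (reserved values such as
SHN_ABS or SHN_COMMON are not modelled).

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Constants from the ELF specification.
const std::uint16_t kShnUndef     = 0x0000;  // SHN_UNDEF
const std::uint16_t kShnLoReserve = 0xFF00;  // SHN_LORESERVE
const std::uint16_t kShnXindex    = 0xFFFF;  // SHN_XINDEX

// Hypothetical helper: for a symbol defined in section `section_index`,
// return {value for st_shndx, value for the matching .symtab_shndx entry}.
std::pair<std::uint16_t, std::uint32_t>
encode_section_index(std::uint32_t section_index) {
    if (section_index < kShnLoReserve) {
        // Fits in the 16-bit field: the extended entry stays SHN_UNDEF.
        return {static_cast<std::uint16_t>(section_index), kShnUndef};
    }
    // Does not fit: st_shndx gets the escape value and the extended
    // entry carries the real index.
    return {kShnXindex, section_index};
}

// Hypothetical inverse: recover the real section index from both fields.
std::uint32_t decode_section_index(std::uint16_t st_shndx,
                                   std::uint32_t shndx_entry) {
    return st_shndx == kShnXindex ? shndx_entry : st_shndx;
}
```

Because both values are decided together for each symbol, preparing the
.symtab_shndx content while .symtab is processed avoids redoing the
discard and rewrite decisions in a second place.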
2024-09-07 Jakub Jelinek <jakub@redhat.com>
PR lto/116614
* simple-object-elf.c (SHN_COMMON): Align comment with neighbouring
comments.
(SHN_HIRESERVE): Use uppercase hex digits instead of lowercase for
consistency.
(simple_object_elf_find_sections): Formatting fixes.
(simple_object_elf_fetch_attributes): Likewise.
(simple_object_elf_attributes_merge): Likewise.
(simple_object_elf_start_write): Likewise.
(simple_object_elf_write_ehdr): Likewise.
(simple_object_elf_write_shdr): Likewise.
(simple_object_elf_write_to_file): Likewise.
(simple_object_elf_copy_lto_debug_section): Likewise. Don't fail for
new_i - 1 >= SHN_LORESERVE, instead arrange in that case to copy
over .symtab_shndx sections, though emit those last and compute their
section content when processing associated .symtab sections. Handle
simple_object_internal_read failure even in the .symtab_shndx reading
case.
Jonathan Wakely [Wed, 4 Sep 2024 20:23:20 +0000 (21:23 +0100)]
libstdc++: Fix std::chrono::parse for TAI and GPS clocks
Howard Hinnant brought to my attention that chrono::parse was giving
incorrect values for chrono::gps_clock, because it was applying the
offset between the GPS clock and UTC. That's incorrect, because when we
parse HH:MM:SS as a GPS time, the result should be that time, not
HH:MM:SS+offset.
The problem was that I was using clock_cast to convert from sys_time to
utc_time and then using clock_cast again to convert to gps_time. The
solution is to convert the parsed time into a duration representing the
time since the GPS clock's epoch, then construct a gps_time directly
from that duration.
As well as adding tests for correct round tripping of times for all
clocks, this also adds some more tests for correct results with
std::format.
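The difference between the two conversions can be modelled with plain
durations. This is a simplified sketch, not the from_stream code in
chrono_io.h: the function names are hypothetical, the parsed fields are
assumed to be already reduced to a count of seconds after 1970-01-01
00:00:00 (ignoring leap seconds), and the GPS-UTC offset is fixed at 18
seconds, which only holds for dates from 2017 onwards.

```cpp
#include <cassert>
#include <chrono>

using seconds = std::chrono::seconds;

// 1980-01-06 00:00:00 (the GPS epoch) as seconds after 1970-01-01 00:00:00,
// ignoring leap seconds: 3657 days * 86400 s.
const seconds kGpsEpochAfter1970{315964800};

// Assumption for illustration: GPS has been 18 s ahead of UTC since 2017.
const seconds kGpsMinusUtc{18};

// Intended behaviour: the parsed fields already are GPS wall-clock time,
// so the result is just the duration since the GPS epoch.
seconds gps_duration_direct(seconds parsed_after_1970) {
    return parsed_after_1970 - kGpsEpochAfter1970;
}

// Model of the old behaviour: treating the parsed value as system time and
// converting it through UTC to GPS adds the GPS-UTC offset on top.
seconds gps_duration_via_clock_cast(seconds parsed_after_1970) {
    return parsed_after_1970 - kGpsEpochAfter1970 + kGpsMinusUtc;
}
```

For a parsed time of 2024-01-01 00:00:00 (1704067200 seconds after 1970),
the direct conversion gives 1388102400 seconds since the GPS epoch, while
the modelled old path lands 18 seconds later.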
libstdc++-v3/ChangeLog:
* include/bits/chrono_io.h (from_stream): Fix conversions in
overloads for gps_time and tai_time.
* testsuite/std/time/clock/file/io.cc: Test round tripping using
chrono::parse. Add additional std::format tests.
* testsuite/std/time/clock/gps/io.cc: Likewise.
* testsuite/std/time/clock/local/io.cc: Likewise.
* testsuite/std/time/clock/tai/io.cc: Likewise.
* testsuite/std/time/clock/utc/io.cc: Likewise.
Carl Love [Fri, 6 Sep 2024 16:06:34 +0000 (12:06 -0400)]
rs6000,extend and document built-ins vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros
The built-ins currently support vector unsigned char arguments. Extend the
built-ins to also support vector signed char and vector bool char
arguments.
Add documentation for the Power 10 built-ins vec_test_lsbb_all_ones
and vec_test_lsbb_all_zeros. The vec_test_lsbb_all_ones built-in
returns 1 if the least significant bit in each byte is a 1, returns
0 otherwise. Similarly, vec_test_lsbb_all_zeros returns a 1 if
the least significant bit in each byte is a zero and 0 otherwise.
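The documented return values can be mirrored by a portable scalar model.
This is a sketch of the semantics described above only, not of how the
Power 10 built-ins are implemented; the names lsbb_all_ones_model and
lsbb_all_zeros_model and the Bytes16 alias are hypothetical stand-ins for
a 16-byte vector.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Stand-in for a 16-byte vector (illustration only).
using Bytes16 = std::array<std::uint8_t, 16>;

// Returns 1 if the least significant bit of every byte is 1, else 0.
int lsbb_all_ones_model(const Bytes16& v) {
    for (std::uint8_t b : v)
        if ((b & 1u) == 0) return 0;
    return 1;
}

// Returns 1 if the least significant bit of every byte is 0, else 0.
int lsbb_all_zeros_model(const Bytes16& v) {
    for (std::uint8_t b : v)
        if ((b & 1u) != 0) return 0;
    return 1;
}
```

Only bit 0 of each byte matters in this model, so the signedness of the
element type does not change the result, which is consistent with
extending the built-ins to signed and bool char vectors.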
Add additional test cases for the built-ins in files:
gcc/testsuite/gcc.target/powerpc/lsbb.c
gcc/testsuite/gcc.target/powerpc/lsbb-runnable.c
gcc/ChangeLog:
* config/rs6000/rs6000-overload.def (vec_test_lsbb_all_ones,
vec_test_lsbb_all_zeros): Add built-in instances for vector signed
char and vector bool char.
* doc/extend.texi (vec_test_lsbb_all_ones,
vec_test_lsbb_all_zeros): Add documentation for the
existing built-ins.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/lsbb-runnable.c: Add test cases for the vector
signed char and vector bool char instances of
vec_test_lsbb_all_zeros and vec_test_lsbb_all_ones built-ins.
* gcc.target/powerpc/lsbb.c: Add compile test cases for the vector
signed char and vector bool char instances of
vec_test_lsbb_all_zeros and vec_test_lsbb_all_ones built-ins.
Tamar Christina [Fri, 6 Sep 2024 13:05:43 +0000 (14:05 +0100)]
middle-end: check that the lhs of a COND_EXPR is an SSA_NAME in cond_store recognition [PR116628]
Because the vect_recog_bool_pattern can at the moment still transition
out of GIMPLE and back into GENERIC the vect_recog_cond_store_pattern can
end up using an expression as a mask rather than an SSA_NAME.
This adds an explicit check that we have a mask and not an expression.
gcc/ChangeLog:
PR tree-optimization/116628
* tree-vect-patterns.cc (vect_recog_cond_store_pattern): Add SSA_NAME
check on expression.
gcc/testsuite/ChangeLog:
PR tree-optimization/116628
* gcc.dg/vect/pr116628.c: New test.
Andrew Pinski [Wed, 4 Sep 2024 16:06:53 +0000 (09:06 -0700)]
aarch64: Use is_attribute_namespace_p and get_attribute_name inside aarch64_lookup_shared_state_flags [PR116598]
The code in aarch64_lookup_shared_state_flags assumed that all C++11 attributes
on the function type had a namespace associated with them. But with the addition
of reproducible/unsequenced, this is no longer true.
This fixes the issue by using is_attribute_namespace_p instead of manually checking
whether the namespace is named "arm", and by using get_attribute_name instead of
manually grabbing the attribute name.
Built and tested for aarch64-linux-gnu.
gcc/ChangeLog:
PR target/116598
* config/aarch64/aarch64.cc (aarch64_lookup_shared_state_flags): Use
is_attribute_namespace_p and get_attribute_name instead of manually grabbing
the namespace and name of the attribute.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Martin Jambor [Fri, 6 Sep 2024 12:12:54 +0000 (14:12 +0200)]
ipa: Move pass_ipa_cdtor_merge before pass_ipa_cp and pass_ipa_sra
When looking at PR 115815 we realized that it would make sense to make
calls to functions originally declared static constructors and
destructors created by pass_ipa_cdtor_merge visible to IPA-SRA. This
patch does that.
gcc/ChangeLog:
2024-07-25 Martin Jambor <mjambor@suse.cz>
* passes.def: Move pass_ipa_cdtor_merge before pass_ipa_cp and
pass_ipa_sra.
Martin Jambor [Fri, 6 Sep 2024 12:12:53 +0000 (14:12 +0200)]
ipa: Treat static constructors and destructors as non-local (PR 115815)
In PR 115815, IPA-SRA thought it had control over all invocations of a
(recursive) static destructor but it did not see the implied
invocation which led to the original being left behind and the
clean-up code encountering uses of SSAs that definitely should have
been dead.
Fixed by teaching cgraph_node::can_be_local_p about static
constructors and destructors. Similar test is missing in
cgraph_node::local_p so I added the check there as well.
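The "implied invocation" can be seen in a small program that uses the GNU
constructor attribute. This is only a sketch of why such a function has a
caller the compiler does not see in the source; it is not the PR 115815
testcase, and the names g_ctor_calls, init_before_main and ctor_calls are
hypothetical.

```cpp
#include <cassert>

// Counts how often the static constructor below has run.
static int g_ctor_calls = 0;

// GNU extension: a static constructor. Nothing in this file calls it,
// yet the runtime startup code invokes it before main.
__attribute__((constructor)) static void init_before_main() {
    ++g_ctor_calls;
}

// Accessor so callers can observe the implied invocation.
int ctor_calls() { return g_ctor_calls; }
```

Although init_before_main is static and has no visible call site, the
counter is already 1 when main starts, so the function cannot be treated
as one whose every invocation is under the compiler's control.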
gcc/ChangeLog:
2024-07-25 Martin Jambor <mjambor@suse.cz>
PR ipa/115815
* cgraph.cc (cgraph_node_cannot_be_local_p_1): Also check
DECL_STATIC_CONSTRUCTOR and DECL_STATIC_DESTRUCTOR.
* ipa-visibility.cc (non_local_p): Likewise.
(cgraph_node::local_p): Delete extraneous line of tabs.
gcc/testsuite/ChangeLog:
2024-07-25 Martin Jambor <mjambor@suse.cz>
PR ipa/115815
* gcc.dg/lto/pr115815_0.c: New test.
Jakub Jelinek [Fri, 6 Sep 2024 11:50:47 +0000 (13:50 +0200)]
c++: Partially implement CWG 2867 - Order of initialization for structured bindings [PR115769]
The following patch partially implements CWG 2867
- Order of initialization for structured bindings.
The DR requires that initialization of e is sequenced before r_i and
that r_i initialization is sequenced before r_j for j > i, we already do it
that way, the former ordering is a necessity so that the get calls are
actually emitted on already initialized variable, the rest just because
we implemented it that way, by going through the structured binding
vars in ascending order and doing their initialization.
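The ordering described above can be observed with a small tuple-like type
that records each step. This is a minimal sketch, not one of the new
dr2867 tests; the type Pair and the helpers trace and run_binding are
hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>
#include <vector>

// Global trace of initialization events (illustration only).
inline std::vector<std::string>& trace() {
    static std::vector<std::string> t;
    return t;
}

// Hypothetical tuple-like type that logs construction and get calls.
struct Pair {
    int a, b;
    Pair(int x, int y) : a(x), b(y) { trace().push_back("e"); }
    template <std::size_t I> int get() const {
        trace().push_back(I == 0 ? "get<0>" : "get<1>");
        return I == 0 ? a : b;
    }
};

// Opt Pair into the tuple protocol used by structured bindings.
namespace std {
template <> struct tuple_size<Pair> : integral_constant<size_t, 2> {};
template <size_t I> struct tuple_element<I, Pair> { using type = int; };
}

// Declares a structured binding and returns the recorded order of events.
std::vector<std::string> run_binding() {
    trace().clear();
    auto [x, y] = Pair(1, 2);  // e first, then r_0 (x), then r_1 (y)
    (void)x;
    (void)y;
    return trace();
}
```

With the ordering the DR requires, the recorded sequence is "e", then
"get<0>", then "get<1>".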
The hard part not implemented yet is the lifetime extension of the
temporaries from the e initialization to after the get calls (if any).
Unlike the range-for lifetime extension patch which I've posted recently
where IMO we can just ignore lifetime extension of reference bound
temporaries because all the temporaries are extended to the same spot,
here lifetime extension of reference bound temporaries should last until
the end of lifetime of e, while other temporaries only after all the get
calls.
The patch just attempts to deal with automatic structured bindings for now,
I'll post a patch for static locals incrementally and I don't have a patch
for namespace scope structured bindings yet, this patch should just keep
existing behavior for both static locals and namespace scope structured
bindings.
What GCC currently emits is a CLEANUP_POINT_EXPR around the e
initialization, followed optionally by nested CLEANUP_STMTs for cleanups
like the e dtor if any and dtors of lifetime extended temporaries from
reference binding; inside of the CLEANUP_STMT CLEANUP_BODY then the
initialization of the individual variables for the tuple case, again with
optional CLEANUP_STMT if e.g. lifetime extended temporaries from reference
binding are needed in those.
The following patch drops that first CLEANUP_POINT_EXPR and instead
wraps the whole sequence of the e initialization and the individual variable
initialization with get calls after it into a single CLEANUP_POINT_EXPR.
If there are any CLEANUP_STMTs needed, they are all emitted first, with
the CLEANUP_POINT_EXPR for e initialization and the individual variable
initialization inside of those, and a guard variable set after different
phases in those expressions guarding the corresponding cleanups, so that
they aren't invoked until the respective variables are constructed.
This is implemented by cp_finish_decl doing cp_finish_decomp on its own
when !processing_template_decl (otherwise we often don't cp_finish_decl
or process it at a different time from when we want to call
cp_finish_decomp) or unless the decl is erroneous (cp_finish_decl has
too many early returns for erroneous cases, and for those we can actually
call it even multiple times, for the non-erroneous,
non-processing_template_decl cases we need to call it just once).
The two testcases try to construct various temporaries and variables and
verify the order in which the temporaries and variables are constructed and
destructed.
2024-09-06 Jakub Jelinek <jakub@redhat.com>
PR c++/115769
* cp-tree.h: Partially implement CWG 2867 - Order of initialization
for structured bindings.
(cp_finish_decomp): Add TEST_P argument defaulted to false.
* decl.cc (initialize_local_var): Add DECOMP argument, if true,
don't build cleanup and temporarily override stmts_are_full_exprs_p
to 0 rather than 1. Formatting fix.
(cp_finish_decl): Invoke cp_finish_decomp for structured bindings
here, first with test_p. For automatic structured binding bases
if the test cp_finish_decomp returned true wrap the initialization
together with what non-test cp_finish_decomp emits with a
CLEANUP_POINT_EXPR, and if there are any CLEANUP_STMTs needed, emit
them around the whole CLEANUP_POINT_EXPR with guard variables for the
cleanups. Call cp_finish_decomp using RAII if not called with
decomp != NULL otherwise.
(cp_finish_decomp): Add TEST_P argument, change return type from
void to bool, if TEST_P is true, return true instead of emitting
actual code for the tuple case, otherwise return false.
* parser.cc (cp_convert_range_for): Don't call cp_finish_decomp
after cp_finish_decl.
(cp_parser_decomposition_declaration): Set DECL_DECOMP_BASE
before cp_finish_decl call. Don't call cp_finish_decomp after
cp_finish_decl.
(cp_finish_omp_range_for): Don't call cp_finish_decomp after
cp_finish_decl.
* pt.cc (tsubst_stmt): Likewise.
* g++.dg/DRs/dr2867-1.C: New test.
* g++.dg/DRs/dr2867-2.C: New test.
Fortran: Add OpenMP 'interop' directive parsing support
Parse OpenMP's 'interop' directive but stop with a 'sorry, unimplemented'
after resolving.
Additionally, it moves some clause dumping away from the end directive as
that led to 'nowait' not being printed when it should as some cases were
missed.
Richard Biener [Fri, 29 Sep 2023 10:54:17 +0000 (12:54 +0200)]
Handle non-grouped stores as single-lane SLP
The following enables single-lane loop SLP discovery for non-grouped stores
and adjusts vectorizable_store to properly handle those.
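As a rough illustration of the terminology, the two loops below contrast a
store that stands alone in each iteration with a pair of interleaved
stores. This is a sketch under the assumption that the vectorizer treats
the first as a non-grouped store and the second as a grouped one; the
function names are hypothetical and the code is not taken from the
testsuite.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One store per iteration to consecutive elements: the store is not part
// of an interleaved group (assumed to be the "non-grouped" case).
std::vector<int> add_one(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = in[i] + 1;
    return out;
}

// For contrast: two stores per iteration that together cover adjacent
// elements, forming an interleaved group of size 2.
std::vector<int> split_pairs(const std::vector<int>& in) {
    std::vector<int> out(2 * in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[2 * i] = in[i];
        out[2 * i + 1] = -in[i];
    }
    return out;
}
```

Whether either loop is actually vectorized depends on the target and the
options used.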
For gfortran.dg/vect/vect-8.f90 we vectorize one additional loop,
not running into the "not falling back to strided accesses" bail-out.
I have not investigated in detail.
There is a set of i386 target assembler test FAILs,
gcc.target/i386/pr88531-2[bc].c in particular fail because the
target cannot identify SLP emulated gathers, see another mail from me.
Others need adjustment, I've adjusted one with this patch only.
In particular there are gcc.target/i386/cond_op_fma_*-1.c FAILs
that are because we no longer fold a VEC_COND_EXPR during the
region value-numbering we do after vectorization since we
code-generate a { 0.0, ... } constant in the VEC_COND_EXPR now
instead of having a separate statement which gets forwarded
and then triggers folding. This leads to slightly different
code generation. The solution is probably to use gimple_build
when building stmts or, in this case, directly emit .COND_FMA
instead of .FMA and a VEC_COND_EXPR.
gcc.dg/vect/slp-19a.c mixes contiguous 8-lane SLP with a single
lane contiguous store from one lane of the 8-lane load and we
expect to use load-lanes for this reason but the heuristic for
forcing single-lane rediscovery as implemented doesn't trigger
here as it treats both SLP instances separately. FAILs on RISC-V.
gcc.dg/vect/slp-19c.c shows we fail to implement an interleaving
scheme for group_size 12 (by extension using the group_size 3
scheme to reduce to 4 lanes and then continue with a pow2 scheme
would work); we are also not considering load-lanes because of
the above reason, but aarch64 cannot do ld12. FAILs on AARCH64
(load requires three vectors) and x86_64.
gcc.dg/vect/slp-19c.c FAILs with variable-length vectors because
of "SLP induction not supported for variable-length vectors".
gcc.target/aarch64/pr110449.c will FAIL because the (contested)
optimization in r14-2367-g224fd59b2dc8a5 was only applied to
loop-vect but not SLP vect. I'll leave it to target maintainers
to either XFAIL (the optimization is bad) or remove the test.
* tree-vect-slp.cc (vect_analyze_slp): Perform single-lane
loop SLP discovery for non-grouped stores. Move check on the root
for re-doing SLP analysis with a single lane for load/store-lanes
earlier and make sure we are dealing with a grouped access.
* tree-vect-stmts.cc (vectorizable_store): Always set
vec_num for SLP.
* gcc.dg/vect/O3-pr39675-2.c: Adjust expected number of SLP.
* gcc.dg/vect/fast-math-vect-call-1.c: Likewise.
* gcc.dg/vect/no-scevccp-slp-31.c: Likewise.
* gcc.dg/vect/slp-12b.c: Likewise.
* gcc.dg/vect/slp-12c.c: Likewise.
* gcc.dg/vect/slp-19a.c: Likewise.
* gcc.dg/vect/slp-19b.c: Likewise.
* gcc.dg/vect/slp-4-big-array.c: Likewise.
* gcc.dg/vect/slp-4.c: Likewise.
* gcc.dg/vect/slp-5.c: Likewise.
* gcc.dg/vect/slp-7.c: Likewise.
* gcc.dg/vect/slp-perm-7.c: Likewise.
* gcc.dg/vect/slp-37.c: Likewise.
* gcc.dg/vect/fast-math-vect-call-2.c: Likewise.
* gcc.dg/vect/slp-26.c: RISC-V can now SLP two instances.
* gcc.dg/vect/vect-outer-slp-3.c: Disable vectorization of
initialization loop.
* gcc.dg/vect/slp-reduc-5.c: Likewise.
* gcc.dg/vect/no-scevccp-outer-12.c: Un-XFAIL. SLP can handle
inner loop inductions with multiple vector stmt copies.
* gfortran.dg/vect/vect-8.f90: Adjust expected number of
vectorized loops.
* gcc.target/i386/vectorize1.c: Adjust what we scan for.
Richard Biener [Thu, 5 Sep 2024 08:46:58 +0000 (10:46 +0200)]
tree-optimization/116609 - SLP live lane vectorization with partial vectors
The following implements the simple case of single-lane SLP when
using partial vectors which can use the VEC_EXTRACT_LAST code
generation without changes. I'll keep the PR open for further
enhancements.
This avoids FAILs of gcc.target/aarch64/sve/live_1.c when using
single-lane SLP for non-grouped stores.
PR tree-optimization/116609
* tree-vect-loop.cc (vectorizable_live_operation_1): Support
partial vectors for single-lane SLP.