Jakub Jelinek [Mon, 15 Apr 2024 08:25:22 +0000 (10:25 +0200)]
attribs: Don't crash on NULL TREE_TYPE in diag_attr_exclusions [PR114634]
The enumerator still doesn't have TREE_TYPE set but diag_attr_exclusions
assumes that all decls must have types.
I think it is better in something as unimportant as diag_attr_exclusions
to be more robust, if there is no type, it can just diagnose exclusions
on the DECL_ATTRIBUTES, like for types it only diagnoses it on
TYPE_ATTRIBUTES.
2024-04-15 Jakub Jelinek <jakub@redhat.com>
PR c++/114634
* attribs.c (diag_attr_exclusions): Set attrs[1] to NULL_TREE for
decls with NULL TREE_TYPE.
Jakub Jelinek [Fri, 12 Apr 2024 18:53:10 +0000 (20:53 +0200)]
c++: Fix bogus warnings about ignored annotations [PR114691]
The middle-end warns about the ANNOTATE_EXPR added for while/for loops
if they declare a var inside of the loop condition.
This is because the assumption is that ANNOTATE_EXPR argument is used
immediately in a COND_EXPR (later GIMPLE_COND), but simplify_loop_decl_cond
wraps the ANNOTATE_EXPR inside of a TRUTH_NOT_EXPR, so it no longer
holds.
The following patch fixes that by adding the TRUTH_NOT_EXPR inside of the
ANNOTATE_EXPR argument if any.
2024-04-12 Jakub Jelinek <jakub@redhat.com>
PR c++/114691
* semantics.c (simplify_loop_decl_cond): Use cp_build_unary_op with
TRUTH_NOT_EXPR on ANNOTATE_EXPR argument (if any) rather than
ANNOTATE_EXPR itself.
Jakub Jelinek [Thu, 11 Apr 2024 09:12:11 +0000 (11:12 +0200)]
asan, v3: Fix up handling of > 32 byte aligned variables with -fsanitize=address -fstack-protector* [PR110027]
On Tue, Mar 26, 2024 at 02:08:02PM +0800, liuhongt wrote:
> > > So, try to add some other variable with larger size and smaller alignment
> > > to the frame (and make sure it isn't optimized away).
> > >
> > > alignb above is the alignment of the first partition's var, if
> > > align_frame_offset really needs to depend on the var alignment, it probably
> > > should be the maximum alignment of all the vars with alignment
> > > alignb * BITS_PER_UNIT <=3D MAX_SUPPORTED_STACK_ALIGNMENT
> > >
>
> In asan_emit_stack_protection, when it allocated fake stack, it assume
> bottom of stack is also aligned to alignb. And the place violated this
> is the first var partition. which is 32 bytes offsets, it should be
> BIGGEST_ALIGNMENT / BITS_PER_UNIT.
> So I think we need to use MAX (BIGGEST_ALIGNMENT /
> BITS_PER_UNIT, ASAN_RED_ZONE_SIZE) for the first var partition.
Your first patch aligned offsets[0] to maximum of alignb and
ASAN_RED_ZONE_SIZE. But as I wrote in the reply to that mail, alignb there
is the alignment of just a single variable which is the first one to appear
in the sorted list and is placed in the highest spot in the stack frame.
That is not necessarily the largest alignment, the sorting ensures that it
is a variable with the largest size in the frame (and only if several of
them have equal size, largest alignment from the same sized ones). Your
second patch used maximum of BIGGEST_ALIGNMENT / BITS_PER_UNIT and
ASAN_RED_ZONE_SIZE. That doesn't change anything at all when using
-mno-avx512f - offsets[0] is still just 32-byte aligned in that case
relative to top of frame, just changes the -mavx512f case to be 64-byte
aligned offsets[0] (aka offsets[0] is then either 0 or -64 instead of either
0 or -32). That will not help if any variable in the frame needs 128-byte,
256-byte, 512-byte ... 4096-byte alignment. If you want to fix the bug in
the spot you've touched, you'd need to walk all the
stack_vars[stack_vars_sorted[si2]] for si2 [si + 1, n - 1] and for those
where the loop would do anything (i.e.
stack_vars[i2].representative == i2
&& TREE_CODE (decl2) == SSA_NAME
? SA.partition_to_pseudo[var_to_partition (SA.map, decl2)] == NULL_RTX
: DECL_RTL (decl2) == pc_rtx
and the pred applies (but that means also walking the earlier ones!
because with -fstack-protector* the vars can be processed in several calls) and
alignb2 * BITS_PER_UNIT <= MAX_SUPPORTED_STACK_ALIGNMENT
and compute maximum of those alignments.
That maximum is already computed,
data->asan_alignb = MAX (data->asan_alignb, alignb);
computes that, but you get the final result only after you do all the
expand_stack_vars calls. You'd need to compute it before.
Though, that change would be still in the wrong place.
The thing is, it would be a waste of the precious stack space when it isn't
needed at all (e.g. when asan will not at compile time do the use after
return checking, or if it won't do it at runtime, or even if it will do at
runtime it will waste the space on the stack).
The following patch fixes it solely for the __asan_stack_malloc_N
allocations, doesn't enlarge unnecessarily further the actual stack frame.
Because asan is only supported on FRAME_GROWS_DOWNWARD architectures
(mips, rs6000 and xtensa are conditional FRAME_GROWS_DOWNWARD arches, which
for -fsanitize=address or -fstack-protector* use FRAME_GROWS_DOWNWARD 1,
otherwise 0, others supporting asan always just use 1), the assumption for
the dynamic stack realignment is that the top of the stack frame (aka offset
0) is aligned to alignb passed to the function (which is the maximum of alignb
of all the vars in the frame). As checked by the assertion in the patch,
offsets[0] is 0 most of the time and so that assumption is correct, the only
case when it is not 0 is if -fstack-protector* is on together with
-fsanitize=address and cfgexpand.cc (create_stack_guard) created a stack
guard. That is the only variable which is allocated in the stack frame
right away, for all others with -fsanitize=address defer_stack_allocation
(or -fstack-protector*) returns true and so they aren't allocated
immediately but handled during the frame layout phases. So, the original
frame_offset of 0 is changed because of the stack guard to
-pointer_size_in_bytes and later at the
if (data->asan_vec.is_empty ())
{
align_frame_offset (ASAN_RED_ZONE_SIZE);
prev_offset = frame_offset.to_constant ();
}
to -ASAN_RED_ZONE_SIZE. The asan_emit_stack_protection code wasn't
taking this into account though, so essentially assumed in the
__asan_stack_malloc_N allocated memory it needs to align it such that
pointer corresponding to offsets[0] is alignb aligned. But that isn't
correct if alignb > ASAN_RED_ZONE_SIZE, in that case it needs to ensure that
pointer corresponding to frame offset 0 is alignb aligned.
The following patch fixes that. Unlike the previous case where
we knew that asan_frame_size + base_align_bias falls into the same bucket
as asan_frame_size, this isn't in some cases true anymore, so the patch
recomputes which bucket to use and if going to bucket 11 (because there is
no __asan_stack_malloc_11 function in the library) disables the after return
sanitization.
2024-04-11 Jakub Jelinek <jakub@redhat.com>
PR middle-end/110027
* asan.c (asan_emit_stack_protection): Assert offsets[0] is
zero if there is no stack protect guard, otherwise
-ASAN_RED_ZONE_SIZE. If alignb > ASAN_RED_ZONE_SIZE and there is
stack pointer guard, take the ASAN_RED_ZONE_SIZE bytes allocated at
the top of the stack into account when computing base_align_bias.
Recompute use_after_return_class from asan_frame_size + base_align_bias
and set to -1 if that would overflow to 11.
Jakub Jelinek [Fri, 5 Apr 2024 12:56:14 +0000 (14:56 +0200)]
vect: Don't clear base_misaligned in update_epilogue_loop_vinfo [PR114566]
The following testcase is miscompiled, because in the vectorized
epilogue the vectorizer assumes it can use aligned loads/stores
(if the base decl gets alignment increased), but it actually doesn't
increase that.
This is because r10-4203-g97c1460367 added the hunk following
patch removes. The explanation feels reasonable, but actually it
is not true as the testcase proves.
The thing is, we vectorize the main loop with 64-byte vectors
and the corresponding data refs have base_alignment 16 (the
a array has DECL_ALIGN 128) and offset_alignment 32. Now, because
of the offset_alignment 32 rather than 64, we need to use unaligned
loads/stores in the main loop (and ditto in the first load/store
in vectorized epilogue). But the second load/store in the vectorized
epilogue uses only 32-byte vectors and because it is a multiple
of offset_alignment, it checks if we could increase alignment of the
a VAR_DECL, the function returns true, sets base_misaligned = true
and says the access is then aligned.
But when update_epilogue_loop_vinfo clears base_misaligned with the
assumption that the var had to have the alignment increased already,
the update of DECL_ALIGN doesn't happen anymore.
Now, I'd think this base_alignment = false was needed before r10-4030-gd2db7f7901 change was committed where it incorrectly
overwrote DECL_ALIGN even if it was already larger, rather than
just always increasing it. But with that change in, it doesn't
make sense to me anymore.
Note, the testcase is latent on the trunk, but reproduces on the 13
branch.
Jakub Jelinek [Fri, 5 Apr 2024 07:31:28 +0000 (09:31 +0200)]
c++: Fix ICE with weird copy assignment operator [PR114572]
While ctors/dtors don't return anything (undeclared void or this pointer
on arm) and copy assignment operators normally return a reference to *this,
it isn't invalid to return uselessly some class object which might need
destructing, but the OpenMP clause handling code wasn't expecting that.
The following patch fixes that.
2024-04-05 Jakub Jelinek <jakub@redhat.com>
PR c++/114572
* cp-gimplify.c (cxx_omp_clause_apply_fn): Call build_cplus_new
on build_call_a result if it has class type.
Jakub Jelinek [Thu, 4 Apr 2024 08:47:52 +0000 (10:47 +0200)]
fold-const: Handle NON_LVALUE_EXPR in native_encode_initializer [PR114537]
The following testcase is incorrectly rejected. The problem is that
for bit-fields native_encode_initializer expects the corresponding
CONSTRUCTOR elt value must be INTEGER_CST, but that isn't the case
here, it is wrapped into NON_LVALUE_EXPR by maybe_wrap_with_location.
We could STRIP_ANY_LOCATION_WRAPPER as well, but as all we are looking for
is INTEGER_CST inside, just looking through NON_LVALUE_EXPR seems easier.
2024-04-04 Jakub Jelinek <jakub@redhat.com>
PR c++/114537
* fold-const.c (native_encode_initializer): Look through
NON_LVALUE_EXPR if val is INTEGER_CST.
Jakub Jelinek [Wed, 3 Apr 2024 08:02:35 +0000 (10:02 +0200)]
libquadmath: Don't assume the storage for __float128 arguments is aligned [PR114533]
With the register_printf_type/register_printf_modifier/register_printf_specifier
APIs the C library is just told the size of the argument and is provided with
a callback to fetch the argument from va_list using va_arg into C library provided
memory. The C library isn't told what alignment requirement it has, but we were
using direct load of a __float128 value from that memory which assumes
__alignof (__float128) alignment.
The following patch fixes that by using memcpy instead.
I haven't been able to reproduce an actual crash, tried
#include <quadmath.h>
#include <stdlib.h>
#include <stdio.h>
int main ()
{
__float128 r;
int prec = 20;
int width = 46;
char buf[128];
r = 2.0q;
r = sqrtq (r);
int n = quadmath_snprintf (buf, sizeof buf, "%+-#*.20Qe", width, r);
if ((size_t) n < sizeof buf)
printf ("%s\n", buf);
/* Prints: +1.41421356237309504880e+00 */
quadmath_snprintf (buf, sizeof buf, "%Qa", r);
if ((size_t) n < sizeof buf)
printf ("%s\n", buf);
/* Prints: 0x1.6a09e667f3bcc908b2fb1366ea96p+0 */
n = quadmath_snprintf (NULL, 0, "%+-#46.*Qe", prec, r);
if (n > -1)
{
char *str = malloc (n + 1);
if (str)
{
quadmath_snprintf (str, n + 1, "%+-#46.*Qe", prec, r);
printf ("%s\n", str);
/* Prints: +1.41421356237309504880e+00 */
}
free (str);
}
printf ("%+-#*.20Qe\n", width, r);
printf ("%Qa\n", r);
printf ("%+-#46.*Qe\n", prec, r);
printf ("%d %Qe %d %Qe %d %Qe\n", 1, r, 2, r, 3, r);
return 0;
}
In any case, I think memcpy for loading from it is right.
2024-04-03 Simon Chopin <simon.chopin@canonical.com>
Jakub Jelinek <jakub@redhat.com>
PR libquadmath/114533
* printf/printf_fp.c (__quadmath_printf_fp): Use memcpy to copy
__float128 out of args.
* printf/printf_fphex.c (__quadmath_printf_fphex): Likewise.
Jakub Jelinek [Thu, 14 Mar 2024 16:48:30 +0000 (17:48 +0100)]
icf: Reset SSA_NAME_{PTR,RANGE}_INFO in successfully merged functions [PR113907]
AFAIK we have no code in LTO streaming to stream out or in
SSA_NAME_{RANGE,PTR}_INFO, so LTO effectively throws it all away
and let vrp1 and alias analysis after IPA recompute that. There is
just one spot, for IPA VRP and IPA bit CCP we save/restore ranges
and set SSA_NAME_{PTR,RANGE}_INFO e.g. on parameters depending on what
we saved and propagated, but that is after streaming in bodies for the
post IPA optimizations.
Now, without LTO SSA_NAME_{RANGE,PTR}_INFO is already computed from
earlier in many cases (er.g. evrp and early alias analysis but other spots
too), but IPA ICF is ignoring the ranges and points-to details when
comparing the bodies. I think ignoring that is just fine, that is
effectively what we do for LTO where we throw that information away
before the analysis, and not ignoring it could lead to fewer ICF merging
possibilities.
So, the following patch instead verifies that for LTO SSA_NAME_{PTR,RANGE}_INFO
just isn't there on SSA_NAMEs in functions into which other functions have
been ICFed, and for non-LTO throws that information away (which matches the
LTO behavior).
Another possibility would be to remember the SSA_NAME <-> SSA_NAME mapping
vector (just one of the 2) on successful sem_function::equals on the
sem_function which is not the chosen leader (e.g. how SSA_NAMEs in the
leader map to SSA_NAMEs in the other function) and use that vector
to union the ranges in sem_function::merge. I can implement that for
comparison, but wanted to post this first if there is an agreement on
doing that or if Honza thinks we should take SSA_NAME_{RANGE,PTR}_INFO
into account. I think we can compare SSA_NAME_RANGE_INFO, but have
no idea how to try to compare points to info. And I think it will result
in less effective ICF for non-LTO vs. LTO unnecessarily.
2024-03-12 Jakub Jelinek <jakub@redhat.com>
PR middle-end/113907
* ipa-icf.c (sem_item_optimizer::merge_classes): Reset
SSA_NAME_RANGE_INFO and SSA_NAME_PTR_INFO on successfully ICF merged
functions.
Jakub Jelinek [Thu, 14 Mar 2024 13:09:20 +0000 (14:09 +0100)]
aarch64: Fix TImode __sync_*_compare_and_exchange expansion with LSE [PR114310]
The following testcase ICEs with LSE atomics.
The problem is that the @atomic_compare_and_swap<mode> expander uses
aarch64_reg_or_zero predicate for the desired operand, which is fine,
given that for most of the modes and even for TImode in some cases
it can handle zero immediate just fine, but the TImode
@aarch64_compare_and_swap<mode>_lse just uses register_operand for
that operand instead, again intentionally so, because the casp,
caspa, caspl and caspal instructions need to use a pair of consecutive
registers for the operand and xzr is just one register and we can't
just store zero into the link register to emulate pair of zeros.
So, the following patch fixes that by forcing the newval operand into
a register for the TImode LSE case.
2024-03-14 Jakub Jelinek <jakub@redhat.com>
PR target/114310
* config/aarch64/aarch64.c (aarch64_expand_compare_and_swap): For
TImode force newval into a register.
Jakub Jelinek [Thu, 7 Mar 2024 09:02:49 +0000 (10:02 +0100)]
bb-reorder: Fix -freorder-blocks-and-partition ICEs on aarch64 with asm goto [PR110079]
The following testcase ICEs, because fix_crossing_unconditional_branches
thinks that asm goto is an unconditional jump and removes it, replacing it
with unconditional jump to one of the labels.
This doesn't happen on x86 because the function in question isn't invoked
there at all:
/* If the architecture does not have unconditional branches that
can span all of memory, convert crossing unconditional branches
into indirect jumps. Since adding an indirect jump also adds
a new register usage, update the register usage information as
well. */
if (!HAS_LONG_UNCOND_BRANCH)
fix_crossing_unconditional_branches ();
I think for the asm goto case, for the non-fallthru edge if any we should
handle it like any other fallthru (and fix_crossing_unconditional_branches
doesn't really deal with those, it only looks at explicit branches at the
end of bbs and we are in cfglayout mode at that point) and for the labels
we just pass the labels as immediates to the assembly and it is up to the
user to figure out how to store them/branch to them or whatever they want to
do.
So, the following patch fixes this by not treating asm goto as a simple
unconditional jump.
I really think that on the !HAS_LONG_UNCOND_BRANCH targets we have a bug
somewhere else, where outofcfglayout or whatever should actually create
those indirect jumps on the crossing edges instead of adding normal
unconditional jumps, I see e.g. in
__attribute__((cold)) int bar (char *);
__attribute__((hot)) int baz (char *);
void qux (int x) { if (__builtin_expect (!x, 1)) goto l1; bar (""); goto l1; l1: baz (""); }
void corge (int x) { if (__builtin_expect (!x, 0)) goto l1; baz (""); l2: return; l1: bar (""); goto l2; }
with -O2 -freorder-blocks-and-partition on aarch64 before/after this patch
just b .L? jumps which I believe are +-32MB, so if .text is larger than
32MB, it could fail to link, but this patch doesn't address that.
Jakub Jelinek [Mon, 4 Mar 2024 09:04:19 +0000 (10:04 +0100)]
i386: Fix ICEs with SUBREGs from vector etc. constants to XFmode [PR114184]
The Intel extended format has the various weird number categories,
pseudo denormals, pseudo infinities, pseudo NaNs and unnormals.
Those are not representable in the GCC real_value and so neither
GIMPLE nor RTX VIEW_CONVERT_EXPR/SUBREG folding folds those into
constants.
As can be seen on the following testcase, because it isn't folded
(since GCC 12, before that we were folding it) we can end up with
a SUBREG of a CONST_VECTOR or similar constant, which isn't valid
general_operand, so we ICE during vregs pass trying to recognize
the move instruction.
Initially I thought it is a middle-end bug, the movxf instruction
has general_operand predicate, but the middle-end certainly never
tests that predicate, seems moves are special optabs.
And looking at other mov optabs, e.g. for vector modes the i386
patterns use nonimmediate_operand predicate on the input, yet
ix86_expand_vector_move deals with CONSTANT_P and SUBREG of CONSTANT_P
arguments which if the predicate was checked couldn't ever make it through.
The following patch handles this case similarly to the
ix86_expand_vector_move's SUBREG of CONSTANT_P case, does it just for XFmode
because I believe that is the only mode that needs it from the scalar ones,
others should just be folded.
2024-03-04 Jakub Jelinek <jakub@redhat.com>
PR target/114184
* config/i386/i386-expand.c (ix86_expand_move): If XFmode op1
is SUBREG of CONSTANT_P, force the SUBREG_REG into memory or
register.
Jakub Jelinek [Thu, 22 Feb 2024 18:32:02 +0000 (19:32 +0100)]
c: Handle scoped attributes in __has*attribute and scoped attribute parsing changes in -std=c11 etc. modes [PR114007]
We aren't able to parse __has_attribute (vendor::attr) (and __has_c_attribute
and __has_cpp_attribute) in strict C < C23 modes. While in -std=gnu* modes
or in -std=c23 there is CPP_SCOPE token, in -std=c* (except for -std=c23)
there are is just a pair of CPP_COLON tokens.
The c-lex.cc hunk adds support for that, but always returns 0 in that case
unlike the GCC 14+ version.
2024-02-22 Jakub Jelinek <jakub@redhat.com>
PR c/114007
gcc/c-family/
* c-lex.c (c_common_has_attribute): Parse 2 CPP_COLONs with
the first one with COLON_SCOPE flag the same as CPP_SCOPE but
ensure 0 is returned then.
gcc/testsuite/
* gcc.dg/c23-attr-syntax-8.c: New test.
libcpp/
* include/cpplib.h (COLON_SCOPE): Define to PURE_ZERO.
* lex.c (_cpp_lex_direct): When lexing CPP_COLON with another
colon after it, if !CPP_OPTION (pfile, scope) set COLON_SCOPE
flag on the first CPP_COLON token.
The C and C++ FEs when parsing attributes already canonicalize them
(i.e. if they start with __ and end with __ substrings, we remove those).
lookup_attribute already verifies in gcc_assert that the first character
of name is not an underscore, and even lookup_scoped_attribute_spec doesn't
attempt to canonicalize the namespace it is passed. But for some historic
reason it was canonicalizing the name argument, which misbehaves when
an attribute starts with ____ and ends with ____.
I believe it is just wrong to try to canonicalize
lookup_scope_attribute_spec name attribute, it should have been
canonicalized already, in other spots where it is called it is already
canonicalized before.
2024-02-12 Jakub Jelinek <jakub@redhat.com>
PR c++/113674
* attribs.c (extract_attribute_substring): Remove.
(lookup_scoped_attribute_spec): Don't call it.
Jakub Jelinek [Tue, 30 Jan 2024 08:58:05 +0000 (09:58 +0100)]
tree-ssa-strlen: Fix up handle_store [PR113603]
Since r10-2101-gb631bdb3c16e85f35d3 handle_store uses
count_nonzero_bytes{,_addr} which (more recently limited to statements
with the same vuse) can walk earlier statements feeding the rhs
of the store and call get_stridx on it.
Unlike most of the other functions where get_stridx is called first on
rhs and only later on lhs, handle_store calls get_stridx on the lhs before
the count_nonzero_bytes* call and does some si->nonzero_bytes comparison
on it.
Now, strinfo structures are refcounted and it is important not to screw
it up.
What happens on the following testcase is that we call get_strinfo on the
destination idx's base (g), which returns a strinfo at that moment
with refcount of 2, one copy referenced in bb 2 final strinfos, one in bb 3
(the vector of strinfos was unshared from the dominator there because some
other strinfo was added) and finally we process a store in bb 6.
Now, count_nonzero_bytes is called and that sees &g[1] in a PHI and
calls get_stridx on it, which in turn calls get_stridx_plus_constant
because &g + 1 address doesn't have stridx yet. This creates a new
strinfo for it:
si = new_strinfo (ptr, idx, build_int_cst (size_type_node, nonzero_chars),
basesi->full_string_p);
set_strinfo (idx, si);
and the latter call, because it is the first one in bb 6 that needs it,
unshares the stridx_to_strinfo vector (so refcount of the g strinfo becomes
3).
Now, get_stridx_plus_constant needs to chain the new strinfo of &g[1] in
between the related strinfos, so after the g record. Because the strinfo
is now shared between the current bb and 2 other bbs, it needs to
unshare_strinfo it (creating a new strinfo which can be modified as a copy
of the old one, decrementing refcount of the old shared one and setting
refcount of the new one to 1):
if (strinfo *nextsi = get_strinfo (chainsi->next))
{
nextsi = unshare_strinfo (nextsi);
si->next = nextsi->idx;
nextsi->prev = idx;
}
chainsi = unshare_strinfo (chainsi);
if (chainsi->first == 0)
chainsi->first = chainsi->idx;
chainsi->next = idx;
Now, the bug is that the caller of this a couple of frames above,
handle_store, holds on a pointer to this g strinfo (but doesn't know
about the unsharing, so the pointer is to the old strinfo with refcount
of 2), and later needs to update it, so it
si = unshare_strinfo (si);
and modifies some fields in it.
This creates a new strinfo (with refcount of 1 which is stored into
the vector of the current bb) based on the old strinfo for g and
decrements refcount of the old one to 1. So, now we are in inconsistent
state, because the old strinfo for g is referenced in bb 2 and bb 3
vectors, but has just refcount of 1, and then have one strinfo (the one
created by unshare_strinfo (chainsi) in get_stridx_plus_constant) which
has refcount of 1 but isn't referenced from anywhere anymore.
Later on when we free one of the bb 2 or bb 3 vectors (forgot which)
that decrements refcount from 1 to 0 and poisons the strinfo/returns it to
the pool, but then maybe_invalidate when looking at the other bb's pointer
to it ICEs.
The following patch fixes it by calling get_strinfo again, it is guaranteed
to return non-NULL, but could be an unshared copy instead of the originally
fetched shared one.
I believe we only need to do this refetching for the case where get_strinfo
is called on the lhs before get_stridx is called on other operands, because
we should be always modifying (apart from the chaining changes) the strinfo
for the destination of the statements, not other strinfos just consumed in
there.
2024-01-30 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113603
* tree-ssa-strlen.c (strlen_pass::handle_store): After
count_nonzero_bytes call refetch si using get_strinfo in case it
has been unshared in the meantime.
Jakub Jelinek [Thu, 18 Jan 2024 09:21:12 +0000 (10:21 +0100)]
i386: Add -masm=intel profiling support [PR113122]
x86_function_profiler emits assembly directly into file and only emits
AT&T syntax. The following patch adjusts it to emit MASM syntax
if -masm=intel.
As it doesn't use asm_fprintf, I can't use {|} syntax for the dialects.
I've tested using
for i in -mcmodel=large "-mcmodel=large -fpic" "" -fpic "-m32 -fpic" "-m32"; do
./xgcc -B ./ -c -O2 -fprofile $i -masm=att pr113122.c -o pr113122.o1;
./xgcc -B ./ -c -O2 -fprofile $i -masm=intel pr113122.c -o pr113122.o2;
objdump -dr pr113122.o1 > /tmp/1; objdump -dr pr113122.o2 > /tmp/2;
diff -up /tmp/1 /tmp/2; done
that the emitted sequences are identical after assembly.
2024-01-18 Jakub Jelinek <jakub@redhat.com>
PR target/113122
* config/i386/i386.c (x86_function_profiler): Add -masm=intel
support. Add missing space after , in emitted assembly in some
cases. Formatting fixes.
* gcc.target/i386/pr113122-1.c: New test.
* gcc.target/i386/pr113122-2.c: New test.
* gcc.target/i386/pr113122-3.c: New test.
* gcc.target/i386/pr113122-4.c: New test.
Jakub Jelinek [Tue, 16 Jan 2024 10:49:34 +0000 (11:49 +0100)]
cfgexpand: Workaround CSE of ADDR_EXPRs in VAR_DECL partitioning [PR113372]
The following patch adds a quick workaround to bugs in VAR_DECL
partitioning.
The problem is that there is no dependency between ADDR_EXPRs of local
decls and CLOBBERs of those vars, so VN can CSE uses of ADDR_EXPRs
(including ivopts integral variants thereof), which can break
add_scope_conflicts discovery of what variables are actually used
in certain region.
E.g. we can have
ivtmp.40_3 = (unsigned long) &MEM <unsigned long[100]> [(void *)&bitint.6 + 8B];
...
uses of ivtmp.40_3
...
bitint.6 ={v} {CLOBBER(eos)};
...
ivtmp.28_43 = (unsigned long) &MEM <unsigned long[100]> [(void *)&bitint.6 + 8B];
...
uses of ivtmp.28_43
before VN (such as dom3), which the add_scope_conflicts code identifies as 2
independent uses of bitint.6 variable (which is correct), but then VN
determines ivtmp.28_43 is the same as ivtmp.40_3 and just uses ivtmp.40_3
even in the second region; at that point add_scope_conflict thinks the
bitint.6 variable is not used in that region anymore.
The following patch does a simple single def-stmt check for such ADDR_EXPRs
(rather than say trying to do a full propagation of what SSA_NAMEs can
contain ADDR_EXPRs of local variables), which seems to workaround all 4 PRs.
In addition to this patch I've used the attached one to gather statistics
on the total size of all variable partitions in a function and seems besides
the new testcases nothing is really affected compared to no patch (I've
actually just modified the patch to == OMP_SCAN instead of == ADDR_EXPR, so
it looks the same except that it never triggers). The comparison wasn't
perfect because I've only gathered BITS_PER_WORD, main_input_filename (did
some replacement of build directories and /tmp/ccXXXXXX names of LTO to make
it more similar between the two bootstraps/regtests), current_function_name
and the total size of all variable partitions if any, because I didn't
record e.g. the optimization options and so e.g. torture tests which iterate
over options could have different partition sizes even in one compiler when
BITS_PER_WORD, main_input_filename and current_function_name are all equal.
So had to write an awk script to check if the first triple in the second
build appeared in the first one and the quadruple in the second build
appeared in the first one too, otherwise print result and that only
triggered in the new tests.
Also, the cc1plus binary according to objdump -dr is identical between the
two builds except for the ADDR_EXPR vs. OMP_SCAN constant in the two spots.
2024-01-16 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113372
PR middle-end/90348
PR middle-end/110115
PR middle-end/111422
* cfgexpand.c (add_scope_conflicts_2): New function.
(add_scope_conflicts_1): Use it.
* gcc.c-torture/execute/pr90348.c: New test.
* gcc.c-torture/execute/pr110115.c: New test.
* gcc.c-torture/execute/pr111422.c: New test.
Jakub Jelinek [Wed, 10 Jan 2024 12:29:47 +0000 (13:29 +0100)]
libgomp: Fix up FLOCK fallback handling [PR113192]
My earlier change broke Solaris testing, because @FLOCK@ isn't substituted
just into libgomp/Makefile where it worked, but also the
testsuite/libgomp-site-extra.exp file where Make variables aren't present
and can't be substituted.
The following patch instead computes the absolute srcdir path and uses it
for FLOCK.
2024-01-10 Jakub Jelinek <jakub@redhat.com>
PR libgomp/113192
* configure.ac (FLOCK): Use $libgomp_abs_srcdir/testsuite/flock
instead of \$(abs_top_srcdir)/testsuite/flock.
* configure: Regenerated.
The copy attributes is allowed on decls as well as types and even has
checks whether decl (set to *node) is DECL_P or TYPE_P, but for diagnostics
unconditionally uses DECL_SOURCE_LOCATION (decl), which obviously only works
if it applies to a decl.
2024-01-09 Jakub Jelinek <jakub@redhat.com>
PR c/113262
* c-attribs.c (handle_copy_attribute): Don't use
DECL_SOURCE_LOCATION (decl) if decl is not DECL_P, use input_location
instead. Formatting fixes.
Andre Vieira [Thu, 6 Jun 2024 15:02:50 +0000 (16:02 +0100)]
arm: Add .type and .size to __gnu_cmse_nonsecure_call [PR115360]
This patch adds missing assembly directives to the CMSE library wrapper to call
functions with attribute cmse_nonsecure_call. Without the .type directive the
linker will fail to produce the correct veneer if a call to this wrapper
function is to far from the wrapper itself. The .size was added for
completeness, though we don't necessarily have a usecase for it.
libgcc/ChangeLog:
PR target/115360
* config/arm/cmse_nonsecure_call.S: Add .type and .size directives.
arm: Zero/Sign extends for CMSE security on Armv8-M.baseline [PR115253]
Properly handle zero and sign extension for Armv8-M.baseline as
Cortex-M23 can have the security extension active.
Currently, there is an internal compiler error on Cortex-M23 for the
epilog processing of sign extension.
This patch addresses the following CVE-2024-0151 for Armv8-M.baseline.
This patch fixes an oversight in the handling of debug instructions
in rtl-ssa. At the moment (and whether this is a good idea or not
remains to be seen), we maintain a linear RPO sequence of definitions
and non-debug uses. If a register is defined more than once, we use
a degenerate phi to reestablish a previous definition where necessary.
However, debug instructions shouldn't of course affect codegen,
so we can't create a new definition just for them. In those situations
we instead hang the debug use off the real definition (meaning that
debug uses do not follow a linear order wrt definitions). Again,
it remains to be seen whether that's a good idea.
The problem in the PR was that we weren't taking this into account
when increasing (or potentially increasing) the live range of an
existing definition. We'd create the phi even if it would only
be used by debug instructions.
The patch goes for the simple but inelegant approach of passing
a bool to say whether the use is a debug use or not. I imagine
this area will need some tweaking based on experience in future.
gcc/
PR rtl-optimization/100303
* rtl-ssa/accesses.cc (function_info::make_use_available): Take a
boolean that indicates whether the use will only be used in
debug instructions. Treat it in the same way that existing
cross-EBB debug references would be handled if so.
(function_info::make_uses_available): Likewise.
* rtl-ssa/functions.h (function_info::make_uses_available): Update
prototype accordingly.
(function_info::make_uses_available): Likewise.
* fwprop.c (try_fwprop_subst): Update call accordingly.
rtl-ssa: Extend m_num_defs to a full unsigned int [PR108086]
insn_info tried to save space by storing the number of
definitions in a 16-bit bitfield. The justification was:
// ... FIRST_PSEUDO_REGISTER + 1
// is the maximum number of accesses to hard registers and memory, and
// MAX_RECOG_OPERANDS is the maximum number of pseudos that can be
// defined by an instruction, so the number of definitions should fit
// easily in 16 bits.
But while that reasoning holds (I think) for real instructions,
it doesn't hold for artificial instructions. I don't think there's
any sensible higher limit we can use, so this patch goes for a full
unsigned int.
gcc/
PR rtl-optimization/108086
* rtl-ssa/insns.h (insn_info): Make m_num_defs a full unsigned int.
Adjust size-related commentary accordingly.
This was another PR caused by the way that
vect_determine_precisions_from_range handles shifts. We tried to
narrow 32768 >> x to a 16-bit shift based on range information for
the inputs and outputs, with vect_recog_over_widening_pattern
(after PR110828) adjusting the shift amount. But this doesn't
work for the case where x is in [16, 31], since then 32-bit
32768 >> x is a well-defined zero, whereas no well-defined
16-bit 32768 >> y will produce 0.
We could perhaps generate x < 16 ? 32768 >> x : 0 instead,
but since vect_determine_precisions_from_range was never really
supposed to rely on fix-ups, it seems better to fix that instead.
The patch also makes the code more selective about which codes
can be narrowed based on input and output ranges. This showed
that vect_truncatable_operation_p was missing cases for
BIT_NOT_EXPR (equivalent to BIT_XOR_EXPR of -1) and NEGATE_EXPR
(equivalent to BIT_NOT_EXPR followed by a PLUS_EXPR of 1).
pr113281-1.c is the original testcase. pr113281-[23].c failed
before the patch due to overly optimistic narrowing. pr113281-[45].c
previously passed and are meant to protect against accidental
optimisation regressions.
gcc/
PR target/113281
* tree-vect-patterns.c (vect_recog_over_widening_pattern): Remove
workaround for right shifts.
(vect_truncatable_operation_p): Handle NEGATE_EXPR and BIT_NOT_EXPR.
(vect_determine_precisions_from_range): Be more selective about
which codes can be narrowed based on their input and output ranges.
For shifts, require at least one more bit of precision than the
maximum shift amount.
create_intersect_range_checks checks whether two access ranges
a and b are alias-free using something equivalent to:
end_a <= start_b || end_b <= start_a
It has two ways of doing this: a "vanilla" way that calculates
the exact exclusive end pointers, and another way that uses the
last inclusive aligned pointers (and changes the comparisons
accordingly). The comment for the latter is:
/* Calculate the minimum alignment shared by all four pointers,
then arrange for this alignment to be subtracted from the
exclusive maximum values to get inclusive maximum values.
This "- min_align" is cumulative with a "+ access_size"
in the calculation of the maximum values. In the best
(and common) case, the two cancel each other out, leaving
us with an inclusive bound based only on seg_len. In the
worst case we're simply adding a smaller number than before.
The problem is that the associated code implicitly assumed that the
access size was a multiple of the pointer alignment, and so the
alignment could be carried over to the exclusive end pointer.
The testcase started failing after g:9fa5b473b5b8e289b6542
because that commit improved the alignment information for
the accesses.
gcc/
PR tree-optimization/115192
* tree-data-ref.c (create_intersect_range_checks): Take the
alignment of the access sizes into account.
gcc/testsuite/
PR tree-optimization/115192
* gcc.dg/vect/pr115192.c: New test.
where SImode divmod_operator (div,mod,udiv,umod) has DImode operands.
Wrap input operand with truncate:SI to make machine modes consistent.
PR target/115297
gcc/ChangeLog:
* config/alpha/alpha.md (<any_divmod:code>si3): Wrap DImode
operands 3 and 4 with truncate:SI RTX.
(*divmodsi_internal_er): Ditto for operands 1 and 2.
(*divmodsi_internal_er_1): Ditto.
(*divmodsi_internal): Ditto.
* config/alpha/constraints.md ("b"): Correct register
number in the description.
Jonathan Wakely [Wed, 29 May 2024 09:59:48 +0000 (10:59 +0100)]
libstdc++: Replace link to gcc-4.3.2 docs in manual [PR115269]
Link to the docs for GCC trunk instead. For the release branches, the
link should be to the docs for appropriate release branch.
Also replace the incomplete/outdated list of explicit -std options with
a single entry for the -std option.
libstdc++-v3/ChangeLog:
PR libstdc++/115269
* doc/xml/manual/using.xml: Replace link to gcc-4.3.2 docs.
Replace list of -std=... options with a single entry for -std.
* doc/html/manual/using.html: Regenerate.
YunQiang Su [Tue, 28 May 2024 18:28:25 +0000 (02:28 +0800)]
MIPS16: Mark $2/$3 as clobbered if GP is used
PR Target/84790.
The gp init sequence
li $2,%hi(_gp_disp)
addiu $3,$pc,%lo(_gp_disp)
sll $2,16
addu $2,$3
is generated directly in `mips_output_function_prologue`, and does
not appear in the RTL.
So the IRA/IPA passes are not aware that $2/$3 have been clobbered,
so they may be used for cross (local) function call.
Let's mark $2/$3 clobber both:
- Just after the UNSPEC_GP RTL of a function;
- Just after a function call.
Reported-by: Matthias Schiffer <mschiffer@universe-factory.net> Origin-Patch-by: Felix Fietkau <nbd@nbd.name>.
gcc
* config/mips/mips.c(mips16_gp_pseudo_reg): Mark
MIPS16_PIC_TEMP and MIPS_PROLOGUE_TEMP clobbered.
(mips_emit_call_insn): Mark MIPS16_PIC_TEMP and
MIPS_PROLOGUE_TEMP clobbered if MIPS16 and CALL_CLOBBERED_GP.
Jakub Jelinek [Wed, 22 May 2024 07:12:28 +0000 (09:12 +0200)]
ubsan: Use right address space for MEM_REF created for bool/enum sanitization [PR115172]
The following testcase is miscompiled, because -fsanitize=bool,enum
creates a MEM_REF without propagating there address space qualifiers,
so what should be normally loaded using say %gs:/%fs: segment prefix
isn't. Together with asan it then causes that load to be sanitized.
2024-05-22 Jakub Jelinek <jakub@redhat.com>
PR sanitizer/115172
* ubsan.c (instrument_bool_enum_load): If rhs is not in generic
address space, use qualified version of utype with the right
address space. Formatting fix.
Jason Merrill [Thu, 25 Jan 2024 17:02:07 +0000 (12:02 -0500)]
c++: array of PMF [PR113598]
Here AGGREGATE_TYPE_P includes pointers to member functions, which is not
what we want. Instead we should use class||array, as elsewhere in the
function.
Jason Merrill [Wed, 27 Mar 2024 20:14:01 +0000 (16:14 -0400)]
c++: __is_constructible ref binding [PR100667]
The requirement that a type argument be complete is excessive in the case of
direct reference binding to the same type, which does not rely on any
properties of the type. This is LWG 2939.
PR c++/100667
gcc/cp/ChangeLog:
* semantics.c (same_type_ref_bind_p): New.
(finish_trait_expr): Use it.
Jason Merrill [Tue, 2 Apr 2024 14:52:28 +0000 (10:52 -0400)]
c++: binding reference to comma expr [PR114561]
We represent a reference binding where the referent type is more qualified
by a ck_ref_bind around a ck_qual. We performed the ck_qual and then tried
to undo it with STRIP_NOPS, but that doesn't work if the conversion is
buried in COMPOUND_EXPR. So instead let's avoid performing that fake
conversion in the first place.
PR c++/114561
PR c++/114562
gcc/cp/ChangeLog:
* call.c (convert_like_internal): Avoid adding qualification
conversion in direct reference binding.
gcc/testsuite/ChangeLog:
* g++.dg/conversion/ref10.C: New test.
* g++.dg/conversion/ref11.C: New test.
PR libstdc++/114803
* include/experimental/bits/simd_builtin.h
(_SimdBase2::operator __vector_type_t): There is no __builtin()
function in _SimdWrapper, instead use its conversion operator.
* testsuite/experimental/simd/pr114803_vecbuiltin_cvt.cc: New
test.
PR libstdc++/114750
* include/experimental/bits/simd_builtin.h
(_SimdImplBuiltin::_S_load, _S_store): Fall back to copying
scalars if the memory type cannot be vectorized for the target.
* include/experimental/bits/simd_x86.h (_S_masked_unary):
Cast inputs < 16 bytes to 16 byte vectors before calling the
right subtraction builtin. Before returning, truncate to the
return vector type.
* include/experimental/bits/simd_x86.h (_S_masked_unary): Call
the 4- and 8-byte variants of __builtin_ia32_subp[ds] without
rounding direction argument.
PR libstdc++/109822
* include/experimental/bits/simd_builtin.h (_S_store): Rewrite
to avoid casts to other vector types. Implement store as
succession of power-of-2 sized memcpy to avoid PR90424.
Matthias Kretz [Fri, 2 Jun 2023 11:44:22 +0000 (13:44 +0200)]
libstdc++: Replace use of incorrect non-temporal store
The call to the base implementation sometimes didn't find a matching
signature because the _Abi parameter of _SimdImpl* was "wrong" after
conversion. It has to call into <new ABI tag>::_SimdImpl instead of the
current ABI tag's _SimdImpl. This also reduces the number of possible
template instantiations.
PR libstdc++/110054
* include/experimental/bits/simd_builtin.h (_S_masked_store):
Call into deduced ABI's SimdImpl after conversion.
* include/experimental/bits/simd_x86.h (_S_masked_store_nocvt):
Don't use _mm_maskmoveu_si128. Use the generic fall-back
implementation. Also fix masked stores without SSE2, which
were not doing anything before.
Objective-C, NeXT, v2: Correct a regression in code-gen.
There have been several changes in the ABI of Objective-C which
depend on the OS version targetted. In this case Protocols and
LabelProtocols should be made weak/hidden/extern from macOS 10.7
however there was a mistake in the code causing this to occur
from macOS 10.6. Fixed thus.
gcc/objc/ChangeLog:
* objc-next-runtime-abi-02.c (WEAK_PROTOCOLS_AFTER): New.
(next_runtime_abi_02_protocol_decl): Use WEAK_PROTOCOLS_AFTER
to determine this ABI change.
(build_v2_protocol_list_address_table): Likewise.
Jakub Jelinek [Thu, 9 May 2024 09:18:21 +0000 (11:18 +0200)]
testsuite: Fix up vector-subaccess-1.C test for ia32 [PR89224]
The test FAILs on i686-linux due to
.../gcc/testsuite/g++.dg/torture/vector-subaccess-1.C:16:6: warning: SSE vector argument without SSE enabled changes the ABI [-Wpsabi]
excess warnings.
This fixes it by adding -Wno-psabi, like commonly done in other tests.
2024-05-09 Jakub Jelinek <jakub@redhat.com>
PR c++/89224
* g++.dg/torture/vector-subaccess-1.C: Add -Wno-psabi as additional
options.
Andrew Pinski [Sun, 24 Sep 2023 04:53:09 +0000 (21:53 -0700)]
Fix PR 110386: backprop vs ABSU_EXPR
The issue here is that when backprop tries to go
and strip sign ops, it skips over ABSU_EXPR but
ABSU_EXPR not only does an ABS, it also changes the
type to unsigned.
Since strip_sign_op_1 is only supposed to strip off
sign changing operands and not ones that change types,
removing ABSU_EXPR here is correct. We don't handle
nop conversions so this does cause any missed optimizations either.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
The problem here is after r6-7425-ga9fee7cdc3c62d0e51730,
the comparison to see if the transformation could be done was using the
wrong value. Instead of see if the inner was LE (for MIN and GE for MAX)
the outer value, it was comparing the inner to the value used in the comparison
which was wrong.
Committed to GCC 13 branch after bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
PR tree-optimization/111331
* tree-ssa-phiopt.c (minmax_replacement):
Fix the LE/GE comparison for the
`(a CMP CST1) ? max<a,CST2> : a` optimization.
gcc/testsuite/ChangeLog:
PR tree-optimization/111331
* gcc.c-torture/execute/pr111331-1.c: New test.
* gcc.c-torture/execute/pr111331-2.c: New test.
* gcc.c-torture/execute/pr111331-3.c: New test.
Andrew Pinski [Sun, 10 Mar 2024 22:17:09 +0000 (22:17 +0000)]
Fold: Fix up merge_truthop_with_opposite_arm for NaNs [PR95351]
The problem here is that merge_truthop_with_opposite_arm would
use the type of the result of the comparison rather than the operands
of the comparison to figure out if we are honoring NaNs.
This fixes that oversight and now we get the correct results in this
case.
Committed as obvious after a bootstrap/test on x86_64-linux-gnu.
PR middle-end/95351
gcc/ChangeLog:
* fold-const.c (merge_truthop_with_opposite_arm): Use
the type of the operands of the comparison and not the type
of the comparison.
Andrew Pinski [Tue, 20 Feb 2024 21:38:28 +0000 (13:38 -0800)]
c++/c-common: Fix convert_vector_to_array_for_subscript for qualified vector types [PR89224]
After r7-987-gf17a223de829cb, the access for the elements of a vector type would lose the qualifiers.
So if we had `constvector[0]`, the type of the element of the array would not have const on it.
This was due to a missing build_qualified_type for the inner type of the vector when building the array type.
We need to add back the call to build_qualified_type and now the access has the correct qualifiers. So the
overloads and even if it is a lvalue or rvalue is correctly done.
Note we correctly now reject the testcase gcc.dg/pr83415.c which was incorrectly accepted after r7-987-gf17a223de829cb.
Built and tested for aarch64-linux-gnu.
PR c++/89224
gcc/c-family/ChangeLog:
* c-common.c (convert_vector_to_array_for_subscript): Call build_qualified_type
for the inner type.
gcc/cp/ChangeLog:
* constexpr.c (cxx_eval_array_reference): Compare main variants
for the vector/array types instead of the types directly.
gcc/testsuite/ChangeLog:
* g++.dg/torture/vector-subaccess-1.C: New test.
* gcc.dg/pr83415.c: Change warning to error.
Peter Bergner [Thu, 2 May 2024 23:07:05 +0000 (18:07 -0500)]
rs6000: Add OPTION_MASK_POWER8 [PR101865]
The bug in PR101865 is the _ARCH_PWR8 predefine macro is conditional upon
TARGET_DIRECT_MOVE, which can be false for some -mcpu=power8 compiles if the
-mno-altivec or -mno-vsx options are used. The solution here is to create
a new OPTION_MASK_POWER8 mask that is true for -mcpu=power8, regardless of
Altivec or VSX enablement.
Unfortunately, the only way to create an OPTION_MASK_* mask is to create
a new option, which we have done here, but marked it as WarnRemoved since
we do not want users using it. For stage1, we will look into how we can
create ISA mask flags for use in the compiler without the need for explicit
options.
2024-04-12 Will Schmidt <will_schmidt@linux.ibm.com>
Peter Bergner <bergner@linux.ibm.com>
gcc/
PR target/101865
* config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Use
OPTION_MASK_POWER8.
* config/rs6000/rs6000-cpus.def (POWERPC_MASKS): Add OPTION_MASK_POWER8.
(ISA_2_7_MASKS_SERVER): Likewise.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Update
comment. Use OPTION_MASK_POWER8 and TARGET_POWER8.
* config/rs6000/rs6000.h (TARGET_SYNC_HI_QI): Use TARGET_POWER8.
* config/rs6000/rs6000.md (define_attr "isa"): Add p8.
(define_attr "enabled"): Handle it.
(define_insn "prefetch"): Use TARGET_POWER8.
* config/rs6000/rs6000.opt (mpower8-internal): New.
gcc/testsuite/
PR target/101865
* gcc.target/powerpc/predefine-p7-novsx.c: New test.
* gcc.target/powerpc/predefine-p8-noaltivec-novsx.c: New test.
* gcc.target/powerpc/predefine-p8-noaltivec.c: New test.
* gcc.target/powerpc/predefine-p8-novsx.c: New test.
* gcc.target/powerpc/predefine-p8-pragma-vsx.c: New test.
* gcc.target/powerpc/predefine-p9-novsx.c: New test.
Peter Bergner [Tue, 9 Apr 2024 20:24:39 +0000 (15:24 -0500)]
rs6000: Replace OPTION_MASK_DIRECT_MOVE with OPTION_MASK_P8_VECTOR [PR101865]
This is a cleanup patch in preparation to fixing the real bug in PR101865.
TARGET_DIRECT_MOVE is redundant with TARGET_P8_VECTOR, so alias it to that.
Also replace all usages of OPTION_MASK_DIRECT_MOVE with OPTION_MASK_P8_VECTOR
and delete the now dead mask.
2024-04-09 Peter Bergner <bergner@linux.ibm.com>
gcc/
PR target/101865
* config/rs6000/rs6000.h (TARGET_DIRECT_MOVE): Define.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Replace
OPTION_MASK_DIRECT_MOVE with OPTION_MASK_P8_VECTOR. Delete redundant
OPTION_MASK_DIRECT_MOVE usage. Delete TARGET_DIRECT_MOVE dead code.
(rs6000_opt_masks): Neuter the "direct-move" option.
* config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Replace
OPTION_MASK_DIRECT_MOVE with OPTION_MASK_P8_VECTOR. Delete useless
comment.
* config/rs6000/rs6000-cpus.def (ISA_2_7_MASKS_SERVER): Delete
OPTION_MASK_DIRECT_MOVE.
(OTHER_P8_VECTOR_MASKS): Likewise.
(POWERPC_MASKS): Likewise.
* config/rs6000/rs6000.opt (mdirect-move): Remove Mask and Var.
I caused a regression with commit r10-908 by adding a constraint to the
non-explicit allocator-extended default constructor, but seemingly
forgot to add an explicit overload with the corresponding constraint.
Iain Sandoe [Sat, 13 Jan 2024 17:20:47 +0000 (17:20 +0000)]
jit, Darwin: Implement library exports list.
Currently, we have no exports list for libgccjit, which means that
all symbols are exported, including those from libstdc++ which is
linked statically into the lib. This causes failures when the
shared libstdc++ is used but some c++ symbols are satisfied from
libgccjit.
This implements an export file for Darwin (which is currently
manually created by cross-checking libgccjit.map). Ideally we'd
script this, at some point. Update libtool current and age to
reflect the current ABI version (we are not bumping the SO name
at this stage).
This fixes a number of new failures in jit testing.
gcc/jit/ChangeLog:
* Make-lang.in: Implement exports list, and use a shared
libgcc.
* libgccjit.exports: New file.