Carl Love [Thu, 27 Feb 2020 00:12:17 +0000 (18:12 -0600)]
PR target/91276 - Doc typos in __builtin_crypto_vpmsum*
gcc/ChangeLog:
2020-02-26 Carl Love <cel@us.ibm.com>
PR target/91276
* doc/extend.texi (PowerPC AltiVec Built-in Functions): The
builtin-function name __builtin_crypto_vpmsumb is only for the
vector unsigned short arguments. It is also listed as the name of
the built-in for arguments vector unsigned short,
vector unsigned int and vector unsigned long long built-ins. The
name of the builtins for these arguments should be:
__builtin_crypto_vpmsumh, __builtin_crypto_vpmsumw and
__builtin_crypto_vpmsumd respectively.
Jonathan Wakely [Wed, 26 Feb 2020 16:09:52 +0000 (16:09 +0000)]
libstdc++: Fix undefined behaviour in random dist serialization (PR93205)
The deserialization functions for random number distributions fail to
check the stream state before using the extracted values. In some cases
this leads to using indeterminate values to resize a vector, and then
filling that vector with indeterminate values.
No values that affect control flow should be used without checking that a
good value was read from the stream.
Additionally, where reasonable to do so, defer modifying any state in
the distribution until all values have been successfully read, to avoid
modifying some of the distribution's parameters and leaving others
unchanged.
Backport from mainline
2020-01-09 Jonathan Wakely <jwakely@redhat.com>
PR libstdc++/93205
* include/bits/random.h (operator>>): Check stream operation succeeds.
* include/bits/random.tcc: (operator>>): Likewise.
(__extract_params): New function to fill a vector from a stream.
* testsuite/26_numerics/random/pr60037-neg.cc: Adjust dg-error line.
Jonathan Wakely [Wed, 26 Feb 2020 15:32:34 +0000 (15:32 +0000)]
libstdc++: Replace glibc-specific check for clock_gettime (PR 93325)
It's wrong to assume that clock_gettime is unavailable on any *-*-linux*
target that doesn't have glibc 2.17 or later. Use a generic test instead
of using __GLIBC_PREREQ. Only do that test when is_hosted=yes so that we
don't get an error for cross targets without a working linker.
This ensures that C library's clock_gettime will be used on non-glibc
targets, instead of an incorrect syscall to SYS_clock_gettime.
Backport from mainline
2020-01-28 Jonathan Wakely <jwakely@redhat.com>
PR libstdc++/93325
* acinclude.m4 (GLIBCXX_ENABLE_LIBSTDCXX_TIME): Use AC_SEARCH_LIBS for
clock_gettime instead of explicit glibc version check.
* configure: Regenerate.
Jonathan Wakely [Wed, 26 Feb 2020 15:04:53 +0000 (15:04 +0000)]
libstdc++: Fix regressions in unique_ptr::swap (PR 93562)
The requirements for this function are only that the deleter is
swappable, but we incorrectly require that the element type is complete
and that the deleter can be swapped using std::swap (which requires it
to be move cosntructible and move assignable).
The fix is to add __uniq_ptr_impl::swap which swaps the pointer and
deleter individually, instead of using the generic std::swap on the
tuple containing them.
PR libstdc++/93562
* include/bits/unique_ptr.h (__uniq_ptr_impl::swap): Define.
(unique_ptr::swap, unique_ptr<T[], D>::swap): Call it.
* testsuite/20_util/unique_ptr/modifiers/93562.cc: New test.
Jonathan Wakely [Wed, 26 Feb 2020 09:43:59 +0000 (09:43 +0000)]
PR libstdc++/78552 only construct std::locale for C locale once
Backport from mainline
2019-10-09 Jonathan Wakely <jwakely@redhat.com>
PR libstdc++/78552
* src/c++98/locale_init.cc (locale::classic()): Do not construct a new
locale object for every call.
(locale::_S_initialize_once()): Construct C locale here.
Jonathan Wakely [Fri, 24 Jan 2020 11:13:55 +0000 (11:13 +0000)]
libstdc++: Simplify makefile rule for largefile-config.h (PR91947)
The previous rule could leave an incomplete file if the build was
interrupted, which would then not be remade if make was run again.
This makes the rule more robust by writing to a temporary file and only
moving it into place as the final step. It also simplifies the rule so
that only the essential macro definitions are written to the file, not
the explanatory comments and commented out #undef lines.
Also, the macro for enabling LFS on Mac OS X 10.5 is now set
unconditionally, which is a bug fix from upstream autoconf.
Backport from mainline
2020-01-23 Jonathan Wakely <jwakely@redhat.com>
Jonathan Wakely [Thu, 9 Jan 2020 13:38:43 +0000 (13:38 +0000)]
Build filesystem library with large file support
Enable AC_SYS_LARGEFILE to set the macros needed for large file APIs to
be used by default. We do not want to define those macros in the
public headers that users include. The values of the macros are copied
to a separate file that is only included by the filesystem sources
during the build, and then the macros in <bits/c++config.h> are renamed
so that they don't have any effect in user code including our headers.
Also use larger type for result of filesystem::file_size to avoid
truncation of large values on 32-bit systems (PR 91947).
Backport from mainlne
2019-10-04 Jonathan Wakely <jwakely@redhat.com>
PR libstdc++/81091
PR libstdc++/91947
* configure.ac: Use AC_SYS_LARGEFILE to enable 64-bit file APIs.
* config.h.in: Regenerate:
* configure: Regenerate:
* include/Makefile.am (${host_builddir}/largefile-config.h): New
target to generate config header for filesystem library.
(${host_builddir}/c++config.h): Rename macros for large file support.
* include/Makefile.in: Regenerate.
* src/c++17/fs_dir.cc: Include new config header.
* src/c++17/fs_ops.cc: Likewise.
(filesystem::file_size): Use uintmax_t for size.
* src/filesystem/dir.cc: Include new config header.
* src/filesystem/ops.cc: Likewise.
(experimental::filesystem::file_size): Use uintmax_t for size.
The following testcase is miscompiled in 8+.
The problem is that check_no_overlap has a special case for INTEGER_CST
marked stores (i.e. stores of constants), if both all currenly merged stores
and the one under consideration for merging with them are marked that way,
it anticipates that other INTEGER_CST marked stores that overlap with those
and precede those (have smaller info->order) could be merged with those and
doesn't punt for them.
In PR86844 and PR87859 fixes I've then added quite large code that is
performed after check_no_overlap and tries to find out if we need and can
merge further INTEGER_CST marked stores, or need to punt.
Unfortunately, that code is there only in the overlapping case code and
the testcase below shows that we really need it even in the adjacent store
case. After sort_by_bitpos we have:
bitpos width order rhs_code
96 32 3 INTEGER_CST
128 32 1 INTEGER_CST
128 128 2 INTEGER_CST
192 32 0 MEM_REF
Because of the missing PR86844/PR87859-ish code in the adjacent store
case, we merge the adjacent (memory wise) stores 96/32/3 and 128/32/1,
and then we consider the 128-bit store which is in program-order in between
them, but in this case we punt, because the merging would extend the
merged store region from bitpos 96 and 64-bits to bitpos 96 and 160-bits
and that has an overlap with an incompatible store (the MEM_REF one).
The problem is that we can't really punt this way, because the 128-bit
store is in between those two we've merged already, so either we manage
to merge even that one together with the others, or would need to avoid
already merging the 96/32/3 and 128/32/1 stores together.
Now, rather than copying around the PR86844/PR87859 code to the other spot,
we can actually just use the overlapping code, merge_overlapping is really
a superset of merge_into, so that is what the patch does. If doing
adjacent store merge for rhs_code other than INTEGER_CST, I believe the
current code is already fine, check_no_overlap in that case doesn't make
the exception and will punt if there is some earlier (smaller order)
non-mergeable overlapping store. There is just one case that could be
problematic, if the merged_store has BIT_INSERT_EXPRs in them and the
new store is a constant store (INTEGER_CST rhs_code), then check_no_overlap
would do the exception and still would allow the special case. But we
really shouldn't have the special case in that case, so this patch also
changes check_no_overlap to just have a bool whether we should have the
special case or not.
Note, as I said in the PR, for GCC11 we could consider performing some kind
of cheap DSE during the store merging (perhaps guarded with flag_tree_dse).
And another thing to consider is only consider as problematic non-mergeable
stores that not only have order smaller than last_order as currently, but
also have order larger than first_order, as in this testcase if we actually
ignored (not merged with anything at all) the 192/32/0 store, because it is
not in between the other stores we'd merge, it would be fine to merge the
other 3 stores, though of course the testcase can be easily adjusted by
putting the 192/32 store after the 128/32 store and then this patch would be
still needed. Though, I think I'd need more time thinking this over.
2020-02-26 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/93820
* gimple-ssa-store-merging.c (check_no_overlap): Change RHS_CODE
argument to ALL_INTEGER_CST_P boolean.
(imm_store_chain_info::try_coalesce_bswap): Adjust caller.
(imm_store_chain_info::coalesce_immediate_stores): Likewise. Handle
adjacent INTEGER_CST store into merged_store->only_constants like
overlapping one.
Jakub Jelinek [Wed, 26 Feb 2020 08:02:05 +0000 (09:02 +0100)]
c++: Fix rejects-valid bug in cxx_eval_outermost_constant_expr [PR93905]
The following testcase is rejected in 8.3, but was accepted in 8.2 and
is in 9.x. This started with my PR87934
* constexpr.c (cxx_eval_constant_expression) <case CONSTRUCTOR>: Do
re-process TREE_CONSTANT CONSTRUCTORs if they aren't reduced constant
expressions.
backport, where the NSDMI CONSTRUCTOR that contains CONST_DECLs is now
constexpr evaluated so that it doesn't contain them. The difference from
9.x is that 9.x doesn't call get_target_expr if we got a CONSTRUCTOR for a
class type for something that has been originally a CONSTRUCTOR too.
This patch cherry-picks just that hunk of the r9-3835 change.
2020-02-26 Jakub Jelinek <jakub@redhat.com>
PR c++/93905
Backported from mainline
2018-11-04 Jason Merrill <jason@redhat.com>
* constexpr.c (cxx_eval_outermost_constant_expr): Don't wrap a
CONSTRUCTOR if one was passed in.
Marek Polacek [Wed, 26 Feb 2020 05:33:52 +0000 (00:33 -0500)]
PR c++/90998 - ICE with copy elision in init by ctor and -Wconversion.
After r269667 which introduced joust_maybe_elide_copy, in C++17 we can elide
a constructor if it uses a conversion function that returns a prvalue, and
use the conversion function in its stead.
This eliding means that if we have a candidate that previously didn't have
->second_conv, it can have it after the elision. This confused the
-Wconversion warning because it was assuming that if cand1->second_conv is
non-null, so is cand2->second_conv. Here cand1->second_conv was non-null
but cand2->second_conv remained null, so it crashed in compare_ics.
I checked with clang that both compilers call A::operator B() in C++17 and
B::B(A const &) otherwise.
gcc/cp/ChangeLog
2020-02-26 Marek Polacek <polacek@redhat.com>
PR c++/90998 - ICE with copy elision in init by ctor and -Wconversion.
* call.c (joust): Don't attempt to warn if ->second_conv is null.
gcc/testsuite/ChangeLog
2020-02-26 Marek Polacek <polacek@redhat.com>
PR c++/90998 - ICE with copy elision in init by ctor and -Wconversion.
* g++.dg/cpp0x/overload-conv-4.C: New test.
Jason Merrill [Wed, 26 Feb 2020 05:33:52 +0000 (00:33 -0500)]
PR c++/86521 - C++17 copy elision in initialization by constructor.
This is an overlooked case in C++17 mandatory copy elision: We want overload
resolution to reflect that initializing an object from a prvalue does not
involve a copy or move constructor even when [over.match.ctor] says that
only constructors are candidates. Here I implement that by looking through
the copy/move constructor in joust.
gcc/cp/ChangeLog
2020-02-26 Jason Merrill <jason@redhat.com>
PR c++/86521 - C++17 copy elision in initialization by constructor.
* call.c (joust_maybe_elide_copy): New.
(joust): Call it.
Jason Merrill [Wed, 26 Feb 2020 05:33:52 +0000 (00:33 -0500)]
c++: Allow template rvalue-ref conv to bind to lvalue ref.
When I implemented the [over.match.ref] rule that a reference conversion
function needs to match l/rvalue of the target reference type it changed our
handling of this testcase. It seems to me that our current behavior is what
the standard says, but it doesn't seem desirable, and all the other
compilers have our old behavior. So let's limit the change to non-templates
until there's some clarification from the committee.
gcc/cp/ChangeLog
2020-02-26 Jason Merrill <jason@redhat.com>
PR c++/90546
* call.c (build_user_type_conversion_1): Allow a template conversion
returning an rvalue reference to bind directly to an lvalue.
Jason Merrill [Wed, 26 Feb 2020 05:33:52 +0000 (00:33 -0500)]
PR c++/86521 - wrong overload resolution with ref-qualifiers.
Here we were wrongly treating binding a const lvalue ref to an xvalue as
direct binding, which is wrong under [dcl.init.ref] and [over.match.ref].
gcc/cp/ChangeLog
2020-02-26 Jason Merrill <jason@redhat.com>
PR c++/86521 - wrong overload resolution with ref-qualifiers.
* call.c (build_user_type_conversion_1): Don't use a conversion to a
reference of the wrong rvalueness for direct binding.
Jason Merrill [Tue, 25 Feb 2020 18:37:18 +0000 (13:37 -0500)]
PR c++/89831 - error with qualified-id in const member function.
Since the fix for 15272 we were remembering the wrong function to use at
instantiation time, because the type of the SCOPE_REF didn't reflect the
cv-quals of 'this'. Conveniently, we can fix this by simplifying the code.
gcc/cp/ChangeLog
2020-02-25 Jason Merrill <jason@redhat.com>
PR c++/89831 - error with qualified-id in const member function.
* semantics.c (finish_non_static_data_member): Use object cv-quals
in scoped case, too.
Jason Merrill [Tue, 25 Feb 2020 18:37:18 +0000 (13:37 -0500)]
PR c++/88380 - wrong-code with flexible array and NSDMI.
Here 'skipped' was set to -1 to force an explicit initializer for 'uninit'
before the initializer for 'initialized', and so we also tried to emit an
explicit initializer for the flexible array, for which build_zero_init
returns error_mark_node. We should ignore flexarrays even when
skipped < 0.
gcc/cp/ChangeLog
2020-02-25 Jason Merrill <jason@redhat.com>
PR c++/88380 - wrong-code with flexible array and NSDMI.
* typeck2.c (process_init_constructor_record): Skip flexarrays.
The standard says that in a generic lambda we should speculatively capture
'this' if we see a call to an overload set that contains a non-static member
function, but it seems wrong to reject the program if we can't capture,
since it might not actually be needed.
gcc/cp/ChangeLog
2020-02-25 Jason Merrill <jason@redhat.com>
Jason Merrill [Tue, 25 Feb 2020 18:37:18 +0000 (13:37 -0500)]
PR c++/87554 - ICE with extern template and reference member.
The removed code ended up setting DECL_INITIAL to the INIT_EXPR returned by
split_nonconstant_init, which makes no sense. This code was added back in
1996, so any rationale is long lost.
gcc/cp/ChangeLog
2020-02-25 Jason Merrill <jason@redhat.com>
PR c++/87554 - ICE with extern template and reference member.
* decl.c (cp_finish_decl): Don't set DECL_INITIAL of external vars.
Alexandre Oliva [Wed, 26 Feb 2020 02:29:03 +0000 (21:29 -0500)]
c++: tsubst friend tpl ctxt before looking it up for dupes [PR86747]
When a member template is redeclared as a friend, we enter the context
of the member before looking it up, and then we check that the decls
are compatible. However, when the member template references template
types of the enclosing context, say an enclosing template class, the
compare fails because the friend decl is already tsubsted, whereas the
looked up name isn't.
The problem is that the enclosing context is taken from the friend
declaration before tsubsting it, so we look up in the context of the
generic template instead of that of the tsubsted one we're
specializing. The solution is to tsubst the enclosing context when
it's a non-namespace scope.
Jason Merrill [Wed, 26 Feb 2020 02:29:03 +0000 (21:29 -0500)]
PR c++/86429 - constexpr variable in lambda.
When we refer to a captured variable from a constant-expression context
inside a lambda, the closure (like any function parameter) is not constant
because we aren't in a call, so we don't have an argument. So the capture
is non-constant. But if the captured variable is constant, we might be able
to use it directly in constexpr evaluation.
gcc/cp/ChangeLog
2020-02-25 Jason Merrill <jason@redhat.com>
PR c++/86429 - constexpr variable in lambda.
PR c++/82643
PR c++/87327
* constexpr.c (cxx_eval_constant_expression): In a lambda function,
try evaluating the captured variable directly.
Jason Merrill [Wed, 26 Feb 2020 02:29:03 +0000 (21:29 -0500)]
PR c++/87748 - substitution failure error with decltype.
This issue is similar to PR 87480; in both cases we were doing non-dependent
substitution with processing_template_decl set, leading to member access
expressions seeming still instantiation-dependent, and therefore decltype
not being simplified to its actual type. And as in that PR, the fix is to
clear processing_template_decl while substituting a default template
argument.
gcc/cp/ChangeLog
2020-02-25 Jason Merrill <jason@redhat.com>
Jason Merrill [Wed, 26 Feb 2020 02:29:03 +0000 (21:29 -0500)]
PR c++/87480 - decltype of member access in default template arg
The issue here is that declval<T>().d is considered instantiation-dependent
within a template, as the access to 'd' might depend on the particular
specialization. But when we're deducing template arguments for a call, we
know that the call and the arguments are non-dependent, so we can do the
substitution as though we aren't in a template. Which strictly speaking we
aren't, since the default argument is considered a separate definition.
gcc/cp/ChangeLog
2020-02-25 Jason Merrill <jason@redhat.com>
PR c++/87480 - decltype of member access in default template arg
* pt.c (type_unification_real): Accept a dependent result in
template context.
Jason Merrill [Wed, 26 Feb 2020 02:29:03 +0000 (21:29 -0500)]
PR c++/88394 - ICE with VLA init-capture.
We mostly use is_normal_capture_proxy to decide whether or not to use
DECL_CAPTURED_VARIABLE; we could just check whether it's set. VLA capture
is still mostly broken, but this fixes this ICE.
gcc/cp/ChangeLog
2020-02-25 Jason Merrill <jason@redhat.com>
Alexandre Oliva [Tue, 5 Feb 2019 06:11:25 +0000 (06:11 +0000)]
c++: test partial specializations for type dependence [PR87770]
When instantiating a partial specialization of a template member
function for a full specialization of a class template, we test
whether the context of variables local to the partial specialization,
i.e., the partial specialization itself, is dependent, and this ICEs
in type_dependent_expression_p, when checking that the function type
isn't type-dependent because it is not in a type-dependent scope.
We shouldn't have got that far: the previous block in
type_dependent_expression_p catches cases in which the function itself
takes template arguments of its own, but it only did so for primary
templates, not for partial specializations. This patch fixes that.
Marek Polacek [Fri, 20 Dec 2019 23:30:04 +0000 (23:30 +0000)]
PR c++/92745 - bogus error when initializing array of vectors.
In r268428 I changed reshape_init_r in such a way that when it sees
a nested { } in a CONSTRUCTOR with missing braces, it just returns
the initializer:
+ else if (COMPOUND_LITERAL_P (stripped_init)
...
+ ++d->cur;
+ gcc_assert (!BRACE_ENCLOSED_INITIALIZER_P (stripped_init));
+ return init;
But as this test shows, that's incorrect: if TYPE is an array, we need
to proceed to reshape_init_array_1 which will iterate over the array
initializers:
6006 /* Loop until there are no more initializers. */
6007 for (index = 0;
6008 d->cur != d->end && (!sized_array_p || index <= max_index_cst);
6009 ++index)
6010 {
and update d.cur accordingly. In other words, when reshape_init gets
we recurse on the first element:
{col[0][0], col[1][0], col[2][0], col[3][0]}
and we can't just move d.cur to point to
{col[0][1], col[1][1], col[2][1], col[3][1]}
and return; we need to iterate, so that d.cur ends up being properly
updated, and after all initializers have been seen, points to d.end.
Currently we skip the loop, wherefore we hit this:
6502 /* Make sure all the element of the constructor were used. Otherwise,
6503 issue an error about exceeding initializers. */
6504 if (d.cur != d.end)
6505 {
6506 if (complain & tf_error)
6507 error ("too many initializers for %qT", type);
6508 return error_mark_node;
6509 }
gcc/cp/ChangeLog
2020-02-25 Marek Polacek <polacek@redhat.com>
PR c++/92745 - bogus error when initializing array of vectors.
* decl.c (reshape_init_r): For a nested compound literal, do
call reshape_init_{class,array,vector}.
gcc/testsuite/ChangeLog
2020-02-25 Marek Polacek <polacek@redhat.com>
Jakub Jelinek <jakub@redhat.com>
PR c++/92745
* g++.dg/cpp0x/initlist118.C: New test.
* g++.dg/cpp0x/initlist118.C: Add -Wno-psabi -w to dg-options.
Jakub Jelinek [Tue, 25 Feb 2020 12:56:47 +0000 (13:56 +0100)]
combine: Fix find_split_point handling of constant store into ZERO_EXTRACT [PR93908]
git is miscompiled on s390x-linux with -O2 -march=zEC12 -mtune=z13.
I've managed to reduce it into the following testcase. The problem is that
during combine we see the s->k = -1; bitfield store and change the SET_SRC
from a pseudo into a constant:
(set (zero_extract:DI (mem/j:HI (plus:DI (reg/v/f:DI 60 [ s ])
(const_int 10 [0xa])) [0 +0 S2 A16])
(const_int 2 [0x2])
(const_int 7 [0x7]))
(const_int -1 [0xffffffffffffffff]))
This on s390x with the above option isn't recognized as valid instruction,
so find_split_point decides to handle it as IOR or IOR/AND.
src is -1, mask is 3 and pos is 7.
src != mask (this is also incorrect, we want to set all (both) bits in the
bitfield), so we go for IOR/AND, but instead of trying
mem = (mem & ~0x180) | ((-1 << 7) & 0x180)
we actually try
mem = (mem & ~0x180) | (-1 << 7)
and that is further simplified into:
mem = mem | (-1 << 7)
aka
mem = mem | 0xff80
which doesn't set just the 2-bit bitfield, but also many other bitfields
that shouldn't be touched.
We really should do:
mem = mem | 0x180
instead.
The problem is that we assume that no bits but those low len (2 here) will
be set in the SET_SRC, but there is nothing that can prevent that, we just
should ignore the other bits.
The following patch fixes it by masking src with mask, this way already
the src == mask test will DTRT, and as the code for or_mask uses
gen_int_mode, if the most significant bit is set after shifting it left by
pos, it will be properly sign-extended.
2020-02-25 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/93908
* combine.c (find_split_point): For store into ZERO_EXTRACT, and src
with mask.
Eric Botcazou [Tue, 25 Feb 2020 11:34:00 +0000 (12:34 +0100)]
Fix link failure with debug info in LTO mode
This fixes a regression whereby the program fails to link with debug
info in LTO mode because of an undefined reference to a symbol coming
from the object files containing the early debug info.
* dwarf2out.c (dwarf2out_size_function): Run in early-DWARF mode.
Roman Zhuykov [Tue, 25 Feb 2020 11:32:42 +0000 (14:32 +0300)]
doc: backport proper description of --enable-checking behavior
This patch rewords the whole description to fix minor issues:
- documents 'gimple' and 'types' checks,
- clarifies what happens when option is used without '=list',
- fixes inaccurate wrong wording about release snapshots,
- describes that release checks can only be disabled explicitly.
Backport from master
2020-02-24 Roman Zhuykov <zhroma@ispras.ru>
* doc/install.texi (--enable-checking): Properly document current
behavior.
(--enable-stage1-checking): Minor clarification about bootstrap.
vect: Fix offset calculation for -ve strides [PR93767]
This PR is a regression caused by r256644, which added support for alias
checks involving variable strides. One of the changes in that commit
was to split the access size out of the segment length. The PR shows
that I hadn't done that correctly for the handling of negative strides
in vect_compile_time_alias. The old code was:
where vect_get_scalar_dr_size (a) was cancelling out the subtraction
of the access size inherent in "- const_length_a". Taking the access
size out of the segment length meant that the addition was no longer
needed/correct.
2020-02-25 Richard Sandiford <richard.sandiford@arm.com>
gcc/
Backport from mainline
2020-02-19 Richard Sandiford <richard.sandiford@arm.com>
PR tree-optimization/93767
* tree-vect-data-refs.c (vect_compile_time_alias): Remove the
access-size bias from the offset calculations for negative strides.
gcc/testsuite/
Backport from mainline
2020-02-19 Richard Sandiford <richard.sandiford@arm.com>
PR tree-optimization/93767
* gcc.dg/vect/pr93767.c: New test.
predcom has the following code to stop one rogue load from
interfering with other store-load opportunities:
/* If A is read and B write or vice versa and there is unsuitable
dependence, instead of merging both components into a component
that will certainly not pass suitable_component_p, just put the
read into bad component, perhaps at least the write together with
all the other data refs in it's component will be optimizable. */
But when store-store commoning was added later, this had the effect
of ignoring loads that occur between two candidate stores.
There is code further up to handle loads and stores with unknown
dependences:
/* Don't do store elimination if there is any unknown dependence for
any store data reference. */
if ((DR_IS_WRITE (dra) || DR_IS_WRITE (drb))
&& (DDR_ARE_DEPENDENT (ddr) == chrec_dont_know
|| DDR_NUM_DIST_VECTS (ddr) == 0))
eliminate_store_p = false;
But the store-load code above skips loads for *known* dependences
if (a) the load has already been marked "bad" or (b) the data-ref
machinery knows the dependence distance, but determine_offsets
can't handle the combination.
(a) happens to be the problem in the testcase, but a different
sequence could have given (b) instead. We have writes to individual
fields of a structure and reads from the whole structure. Since
determine_offsets requires the types to be the same, it returns false
for each such read/write combination.
This patch records which components have had loads removed and
prevents store-store commoning for them. It's a bit too pessimistic,
since there shouldn't be a problem if a "bad" load dominates all stores
in a component. But (a) we can't AFAIK use pcom_stmt_dominates_stmt_p
here and (b) the handling for that case would probably need to be
removed again if we handled more exotic cases in future.
2020-02-25 Richard Sandiford <richard.sandiford@arm.com>
gcc/
Backport from mainline
2020-01-28 Richard Sandiford <richard.sandiford@arm.com>
PR tree-optimization/93434
* tree-predcom.c (split_data_refs_to_components): Record which
components have had aliasing loads removed. Prevent store-store
commoning for all such components.
gcc/testsuite/
PR tree-optimization/93434
* gcc.c-torture/execute/pr93434.c: New test.
Check for bitwise identity when encoding VECTOR_CSTs [PR92768]
This PR shows that we weren't checking for bitwise-identical values
when trying to encode a VECTOR_CST, so -0.0 was treated the same as
0.0 for -fno-signed-zeros. The patch adds a new OEP flag to select
that behaviour.
2020-02-25 Richard Sandiford <richard.sandiford@arm.com>
gcc/
Backport from mainline
2019-12-05 Richard Sandiford <richard.sandiford@arm.com>
PR middle-end/92768
* tree-core.h (OEP_BITWISE): New flag.
* fold-const.c (operand_compare::operand_equal_p): Handle it.
* tree-vector-builder.h (tree_vector_builder::equal_p): Pass it.
gcc/testsuite/
PR middle-end/92768
* gcc.dg/pr92768.c: New test.
Fix SLP downward group access classification [PR92420]
This PR was caused by the SLP handling in get_group_load_store_type
returning VMAT_CONTIGUOUS rather than VMAT_CONTIGUOUS_REVERSE for
downward groups.
A more elaborate fix would be to try to combine the reverse permutation
into SLP_TREE_LOAD_PERMUTATION for loads, but that's really a follow-on
optimisation and not backport material. It might also not necessarily
be a win, if the target supports (say) reversing and odd/even swaps
as independent permutes but doesn't recognise the combined form.
2020-02-25 Richard Sandiford <richard.sandiford@arm.com>
gcc/
Backport from mainline
2019-11-11 Richard Sandiford <richard.sandiford@arm.com>
PR tree-optimization/92420
* tree-vect-stmts.c (get_negative_load_store_type): Move further
up file.
(get_group_load_store_type): Use it for reversed SLP accesses.
gcc/testsuite/
PR tree-optimization/92420
* gcc.dg/vect/pr92420.c: New test.
Jason Merrill [Mon, 24 Feb 2020 21:22:45 +0000 (16:22 -0500)]
c++: Fix constexpr vs. omitted aggregate init.
Value-initialization is importantly different from {}-initialization for
this testcase, where the former calls the deleted S constructor and the
latter initializes S happily.
gcc/cp/ChangeLog
2020-02-24 Jason Merrill <jason@redhat.com>
PR c++/90951
* constexpr.c (cxx_eval_array_reference): {}-initialize missing
elements instead of value-initializing them.
Jason Merrill [Mon, 24 Feb 2020 21:22:45 +0000 (16:22 -0500)]
cgraph: A COMDAT decl always has non-zero address.
We should be able to assume that a template instantiation or other COMDAT
has non-zero address even if MAKE_DECL_ONE_ONLY for the target sets
DECL_WEAK and we haven't yet decided to emit a definition in this
translation unit.
gcc/ChangeLog
2020-02-24 Jason Merrill <jason@redhat.com>
PR c++/92003
* symtab.c (symtab_node::nonzero_address): A DECL_COMDAT decl has
non-zero address even if weak and not yet defined.
Jason Merrill [Mon, 24 Feb 2020 21:22:45 +0000 (16:22 -0500)]
c++: Fix decltype of empty pack expansion of parm.
In unevaluated context, we only substitute a single PARM_DECL, not the
entire chain, but the handling of an empty pack expansion was missing that
check.
gcc/cp/ChangeLog
2020-02-24 Jason Merrill <jason@redhat.com>
PR c++/93140
* pt.c (tsubst_decl) [PARM_DECL]: Check cp_unevaluated_operand in
handling of TREE_CHAIN for empty pack.
Peter Bergner [Mon, 24 Feb 2020 04:04:44 +0000 (22:04 -0600)]
rs6000: Fix infinite loop building ghostscript and icu [PR93658]
Fix rs6000_legitimate_address_p(), which erroneously marks a valid Altivec
address as being invalid, which causes LRA's process_address() to go into
an infinite loop spilling the same address over and over again.
Include Mike's earlier commits that fix bugs this patch exposes.
Backport from master
2020-02-20 Peter Bergner <bergner@linux.ibm.com>
Michael Meissner [Mon, 24 Feb 2020 03:57:11 +0000 (21:57 -0600)]
Adjust how variable vector extraction is done.
Backport from master
2020-02-03 Michael Meissner <meissner@linux.ibm.com>
* config/rs6000/rs6000.c (get_vector_offset): New helper function
to calculate the offset in memory from the start of a vector of a
particular element. Add code to keep the element number in
bounds if the element number is variable.
(rs6000_adjust_vec_address): Move calculation of offset of the
vector element to get_vector_offset.
(rs6000_split_vec_extract_var): Do not do the initial AND of
element here, move the code to get_vector_offset.
Fix PR 93568 (thinko)
Backport from master
2020-02-05 Michael Meissner <meissner@linux.ibm.com>
PR target/93568
* config/rs6000/rs6000.c (get_vector_offset): Fix Q constraint assert
to use MEM.
Michael Meissner [Mon, 24 Feb 2020 00:39:57 +0000 (18:39 -0600)]
Fix bad code of vector extract of PC-relative address with variable element #.
Backport from master
2020-01-06 Michael Meissner <meissner@linux.ibm.com>
* config/rs6000/vsx.md (vsx_extract_<mode>_var, VSX_D iterator):
Use 'Q' for doing vector extract from memory.
(vsx_extract_v4sf_var): Use 'Q' for doing vector extract from
memory.
(vsx_extract_<mode>_var, VSX_EXTRACT_I iterator): Use 'Q' for
doing vector extract from memory.
(vsx_extract_<mode>_<VS_scalar>mode_var): Use 'Q' for doing vector
extract from memory.
Fix handling of floating-point homogeneous aggregates.
2020-02-21 John David Anglin <danglin@gcc.gnu.org>
* gcc/config/pa/pa.c (pa_function_value): Fix check for word and
double-word size when handling aggregate return values.
* gcc/config/pa/som.h (ASM_DECLARE_FUNCTION_NAME): Fix to indicate
that homogeneous SFmode and DFmode aggregates are passed and returned
in general registers.
Uros Bizjak [Thu, 20 Feb 2020 21:00:45 +0000 (22:00 +0100)]
i386: Fix *vec_extractv2sf_1 and *vec_extractv2sf_1 shufps alternative [PR93828]
shufps moves two of the four packed single-precision floating-point values
from *destination* operand (first operand) into the low quadword of the
destination operand. Match source operand to the destination.
PR target/93828
* config/i386/mmx.md (*vec_extractv2sf_1): Match source operand
to destination operand for shufps alternative.
(*vec_extractv2si_1): Ditto.
* collect2.c (tool_cleanup): Avoid calling not signal-safe
functions.
(maybe_run_lto_and_relink): Avoid possible signal handler
access to unintialzed memory (lto_o_files).
Mark Eggleston [Wed, 19 Feb 2020 10:30:38 +0000 (10:30 +0000)]
[fortran] ICE assign character pointer to non target PR93714
An ICE occurred if an attempt was made to assign a pointer to a
character variable that has an length incorrectly specified using
a real constant and does not have the target attribute.
Backported from mainline
2020-02-18 Mark Eggleston <markeggleston@gcc.gnu.org>
PR fortran/93714
* expr.c (gfc_check_pointer_assign): Move check for
matching character length to after checking the lvalue
attributes for target or pointer.
PR fortran/93714
* gfortran.dg/char_pointer_assign_6.f90: Look for no target
message instead of length mismatch.
* gfortran.dg/pr93714_1.f90
* gfortran.dg/pr93714_2.f90
Richard Biener [Wed, 22 Jan 2020 11:38:12 +0000 (12:38 +0100)]
tree-optimization/93381 fix integer offsetting in points-to analysis
We were incorrectly assuming a merge operation is conservative enough
for not explicitely handled operations but we also need to consider
offsetting within fields when field-sensitive analysis applies.
2020-01-22 Richard Biener <rguenther@suse.de>
PR tree-optimization/93381
* tree-ssa-structalias.c (find_func_aliases): Assume offsetting
throughout, handle all conversions the same.
Richard Biener [Fri, 14 Feb 2020 08:17:57 +0000 (09:17 +0100)]
debug/92763 keep DIEs that might be used in DW_TAG_inlined_subroutine
We were pruning type-local subroutine DIEs if their context is unused
despite us later needing those DIEs as abstract origins for inlines.
The patch makes code already present for -fvar-tracking-assignments
unconditional.
2020-02-14 Richard Biener <rguenther@suse.de>
Backport from mainline
2020-01-20 Richard Biener <rguenther@suse.de>
PR debug/92763
* dwarf2out.c (prune_unused_types): Unconditionally mark
called function DIEs.
Richard Biener [Fri, 14 Feb 2020 08:10:48 +0000 (09:10 +0100)]
middle-end/92674 delay purging EH edges when folding during inlining
2020-02-14 Richard Biener <rguenther@suse.de>
Backport from mainline
2019-11-27 Richard Biener <rguenther@suse.de>
PR middle-end/92674
* tree-inline.c (expand_call_inline): Delay purging EH/abnormal
edges and instead record blocks in bitmap.
(gimple_expand_calls_inline): Adjust.
(fold_marked_statements): Delay EH cleanup until all folding is
done.
(optimize_inline_calls): Do EH/abnormal cleanup for calls after
inlining finished.
Jakub Jelinek [Sat, 15 Feb 2020 11:53:44 +0000 (12:53 +0100)]
match.pd: Disallow side-effects in GENERIC for non-COND_EXPR to COND_EXPR simplifications [PR93744]
As the following testcases show (the first one reported, last two
found by code inspection), we need to disallow side-effects
in simplifications that turn some unconditional expression into conditional
one. From my little understanding of genmatch.c, it is able to
automatically disallow side effects if the same operand is used multiple
times in the match pattern, maybe if it is used multiple times in the
replacement pattern, and if it is used in conditional contexts in the match
pattern, could it be taught to handle this case too? If yes, perhaps
just the first hunk could be usable for 8/9 backports (+ the testcases).
2020-02-15 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/93744
* match.pd (((m1 >/</>=/<= m2) * d -> (m1 >/</>=/<= m2) ? d : 0,
A - ((A - B) & -(C cmp D)) -> (C cmp D) ? B : A,
A + ((B - A) & -(C cmp D)) -> (C cmp D) ? B : A): For GENERIC, make
sure @2 in the first and @1 in the other patterns has no side-effects.
* gcc.c-torture/execute/pr93744-1.c: New test.
* gcc.c-torture/execute/pr93744-2.c: New test.
* gcc.c-torture/execute/pr93744-3.c: New test.
Eric Botcazou [Fri, 14 Feb 2020 18:21:02 +0000 (19:21 +0100)]
Fix problematic TLS sequences for the Solaris linker
This is an old thinko pertaining to the interaction between TLS
sequences and delay slot filling: the compiler knows that it cannot
put instructions with TLS relocations into delay slots with the
original Sun TLS model, but it tests TARGET_SUN_TLS in this context,
which depends only on the assembler. So if the compiler is configured
with the GNU assembler and the Solaris linker, then TARGET_GNU_TLS is
set instead and the limitation is not enforced.
PR target/93704
* config/sparc/sparc.c (eligible_for_call_delay): Test HAVE_GNU_LD
in conjunction with TARGET_GNU_TLS in early return.
Jakub Jelinek [Fri, 14 Feb 2020 16:36:00 +0000 (17:36 +0100)]
c++: Fix thinko in enum_min_precision [PR61414]
When backporting the PR61414 fix to 8.4, I've noticed that the caching
of prec is actually broken, as it would fail to actually store the computed
precision into the hash_map's value and so next time we'd think the enum needs
0 bits.
2020-02-14 Jakub Jelinek <jakub@redhat.com>
PR c++/61414
* class.c (enum_min_precision): Change prec type from int to int &.
Jakub Jelinek [Fri, 14 Feb 2020 14:50:57 +0000 (15:50 +0100)]
c: Fix ICE with cast to VLA [93576]
The following testcase ICEs, because the PR84305 changes try to evaluate
the size earlier. If size has side-effects, that is desirable, and the
side-effects will actually be wrapped in a SAVE_EXPR. The problem on this
testcase is that there are no side-effects, and c_fully_fold doesn't fold
those COMPOUND_EXPRs to constant, and while before gimplification we unshare
trees found in the expressions, the unsharing doesn't involve TYPE_SIZE etc.
of used types. Gimplification is destructive though, so when we gimplify
the two nested COMPOUND_EXPRs and then try to gimplify it the second time
for the TYPE_SIZEs, we ICE.
Now, we could use unshare_expr in what we push to *expr, SAVE_EXPRs and
their operands in there aren't unshared, but I really don't see a point of
evaluating expressions that don't have side-effects before, so instead
this just pushes there expressions that do have side-effects.
2020-02-13 Jakub Jelinek <jakub@redhat.com>
PR c/93576
* c-decl.c (grokdeclarator): If this_size_varies, only push size into
*expr if it has side effects.
Jakub Jelinek [Fri, 14 Feb 2020 14:50:10 +0000 (15:50 +0100)]
i386: Fix up _mm*_mask_popcnt_epi* [PR93696]
As mentioned in the PR and as
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mask_popcnt_epi
also documents, _mm*_popcnt_epi* intrinsics are consistent with all other
unary AVX512* intrinsics regarding arguments, i.e. the
_mm*_whatever has just single argument (called a in the docs, and __A in the
GCC headers),
_mm*_mask_whatever has 3 arguments (called src, k, a in the docs and
_W, __U, __A in GCC headers) and
_mm*_maskz_whatever 2 arguments (called k, a in the docs and __U, __A in GCC
headers). Unfortunately, whomever implemented the _mm*_popcnt_epi*
intrinsics got it wrong for the _mm*_mask_popcnt_epi* ones, calling the
args __A, __U, __B and not passing them in the canonical order to the
builtins, making it API incompatible with ICC as well as clang (tested on
godbolts clang 7/8/9/trunk and ICC 19.0.{0,1}, older clang/ICC don't
understand those, so it isn't that it used to be broken even in other
compilers and got changed afterwards).
2020-02-13 Jakub Jelinek <jakub@redhat.com>
PR target/93696
* config/i386/avx512bitalgintrin.h (_mm512_mask_popcnt_epi8,
_mm512_mask_popcnt_epi16, _mm256_mask_popcnt_epi8,
_mm256_mask_popcnt_epi16, _mm_mask_popcnt_epi8,
_mm_mask_popcnt_epi16): Rename __B argument to __A and __A to __W,
pass __A to the builtin followed by __W instead of __A followed by
__B.
* config/i386/avx512vpopcntdqintrin.h (_mm512_mask_popcnt_epi32,
_mm512_mask_popcnt_epi64): Likewise.
* config/i386/avx512vpopcntdqvlintrin.h (_mm_mask_popcnt_epi32,
_mm256_mask_popcnt_epi32, _mm_mask_popcnt_epi64,
_mm256_mask_popcnt_epi64): Likewise.
Jakub Jelinek [Fri, 14 Feb 2020 14:49:32 +0000 (15:49 +0100)]
i386: Fix k*shift* intrinsics [PR93673]
As mentioned in the PR, the intrinsics allow counts from 0 to 255, but
we actually reject values from 128 to 255. That is because QImode
CONST_INTs can be only -128 to 127. Fixed by using const_0_to_255_operand
and dropping the modes for the operands with those predicates
(the IL actually contains the CONST_INT which has VOIDmode).
2020-02-13 Jakub Jelinek <jakub@redhat.com>
PR target/93673
* config/i386/sse.md (k<code><mode>): Drop mode from last operand and
use const_0_to_255_operand predicate instead of immediate_operand.
(avx512dq_fpclass<mode><mask_scalar_merge_name>,
avx512dq_vmfpclass<mode><mask_scalar_merge_name>,
vgf2p8affineinvqb_<mode><mask_name>,
vgf2p8affineqb_<mode><mask_name>): Drop mode from
const_0_to_255_operand predicated operands.
* gcc.target/i386/avx512f-pr93673.c: New test.
* gcc.target/i386/avx512dq-pr93673.c: New test.
* gcc.target/i386/avx512bw-pr93673.c: New test.
Jakub Jelinek [Fri, 14 Feb 2020 14:49:11 +0000 (15:49 +0100)]
i386: Fix up vec_extract_lo* patterns [PR93670]
The VEXTRACT* insns have way too many different CPUID feature flags (ATT
syntax)
vextractf128 $imm, %ymm, %xmm/mem AVX
vextracti128 $imm, %ymm, %xmm/mem AVX2
vextract{f,i}32x4 $imm, %ymm, %xmm/mem {k}{z} AVX512VL+AVX512F
vextract{f,i}32x4 $imm, %zmm, %xmm/mem {k}{z} AVX512F
vextract{f,i}64x2 $imm, %ymm, %xmm/mem {k}{z} AVX512VL+AVX512DQ
vextract{f,i}64x2 $imm, %zmm, %xmm/mem {k}{z} AVX512DQ
vextract{f,i}32x8 $imm, %zmm, %ymm/mem {k}{z} AVX512DQ
vextract{f,i}64x4 $imm, %zmm, %ymm/mem {k}{z} AVX512F
As the testcase shows and the patch too, we didn't get it right in all
cases.
The first hunk is about avx512vl_vextractf128v8s[if] incorrectly
requiring TARGET_AVX512DQ. The corresponding insn is the first
vextract{f,i}32x4 above, so it requires VL+F, and the builtins have it
correct (TARGET_AVX512VL implies TARGET_AVX512F):
BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vextractf128v8sf, "__builtin_ia32_extractf32x4_256_mask", IX86_BUILTIN_EXTRACTF32X4_256, UNKNOWN, (int) V4SF_FTYPE_V8SF_INT_V4SF_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vextractf128v8si, "__builtin_ia32_extracti32x4_256_mask", IX86_BUILTIN_EXTRACTI32X4_256, UNKNOWN, (int) V4SI_FTYPE_V8SI_INT_V4SI_UQI)
We only need TARGET_AVX512DQ for avx512vl_vextractf128v4d[if].
The second hunk is about vec_extract_lo_v16s[if]{,_mask}. These are using
the vextract{f,i}32x8 insns (AVX512DQ above), but we weren't requiring that,
but instead incorrectly && 1 for non-masked and && (64 == 64 && TARGET_AVX512VL)
for masked insns. This is extraction from ZMM, so it doesn't need VL for
anything. The hunk actually only requires TARGET_AVX512DQ when the insn
is masked, if it is not masked, when TARGET_AVX512DQ isn't available we can
use vextract{f,i}64x4 instead which is available already in TARGET_AVX512F
and does the same thing, extracts the low 256 bits from 512 bits vector
(often we split it into just nothing, but there are some special cases like
when using xmm16+ when we can't without AVX512VL).
The last hunk is about vec_extract_lo_v8s[if]{,_mask}. The non-_mask
suffixed ones are ok already and just split into nothing (lowpart subreg).
The masked ones were incorrectly requiring TARGET_AVX512VL and
TARGET_AVX512DQ, when we only need TARGET_AVX512VL.
2020-02-12 Jakub Jelinek <jakub@redhat.com>
PR target/93670
* config/i386/sse.md (VI48F_256_DQ): New mode iterator.
(avx512vl_vextractf128<mode>): Use it instead of VI48F_256. Remove
TARGET_AVX512DQ from condition.
(vec_extract_lo_<mode><mask_name>): Use <mask_avx512dq_condition>
instead of <mask_mode512bit_condition> in condition. If
TARGET_AVX512DQ is false, emit vextract*64x4 instead of
vextract*32x8.
(vec_extract_lo_<mode><mask_name>): Drop <mask_avx512dq_condition>
from condition.
Jakub Jelinek [Fri, 14 Feb 2020 14:48:42 +0000 (15:48 +0100)]
i386: Fix -mavx -mno-mavx2 ICE with VEC_COND_EXPR [PR93637]
As mentioned in the PR, for -mavx -mno-avx2 the backend does support
vcondv4div4df and vcondv8siv8sf optabs (while generally 32-byte vectors
aren't much supported in that case, it is performed using
vandps/vandnps/vorps). The problem is that after the last generic vector
lowering (where the VEC_COND_EXPR still compares two V4DF vectors and
has two V4DI last operands and V4DI result and so is considered ok) fre4
folds the condition into constant, at which point the middle-end during
expansion will try vcond_mask_optab and fall back to trying to expand it
as the constant vector < 0 vcondv4div4di, but neither of them is supported
for -mavx -mno-avx2 and thus we ICE.
So, the options I see is either what the following patch does, also support
vcond_mask_v4div4di and vcond_mask_v4siv4si already for TARGET_AVX, or
require for vcondv4div4df and vcondv8siv8sf TARGET_AVX2 rather than current
TARGET_AVX.
2020-02-10 Jakub Jelinek <jakub@redhat.com>
PR target/93637
* config/i386/sse.md (VI_256_AVX2): New mode iterator.
(vcond_mask_<mode><sseintvecmodelower>): Use it instead of VI_256.
Change condition from TARGET_AVX2 to TARGET_AVX.
Jakub Jelinek [Fri, 14 Feb 2020 14:47:55 +0000 (15:47 +0100)]
i386: Make xmm16-xmm31 call used even in ms ABI [PR65782]
On Tue, Feb 04, 2020 at 11:16:06AM +0100, Uros Bizjak wrote:
> I guess that Comment #9 patch form the PR should be trivially correct,
> but althouhg it looks obvious, I don't want to propose the patch since
> I have no means of testing it.
I don't have means of testing it either.
https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019
is quite explicit that [xyz]mm16-31 are call clobbered and only xmm6-15 (low
128-bits only) are call preserved.
We are talking e.g. about
/* { dg-options "-O2 -mabi=ms -mavx512vl" } */
typedef double V __attribute__((vector_size (16)));
void foo (void);
V bar (void);
void baz (V);
void
qux (void)
{
V c;
{
register V a __asm ("xmm18");
V b = bar ();
asm ("" : "=x" (a) : "0" (b));
c = a;
}
foo ();
{
register V d __asm ("xmm18");
V e;
d = c;
asm ("" : "=x" (e) : "0" (d));
baz (e);
}
}
where according to the MSDN doc gcc incorrectly holds the c value
in xmm18 register across the foo call; if foo is compiled by some Microsoft
compiler (or LLVM), then it could clobber %xmm18.
If all xmm18 occurrences are changed to say xmm15, then it is valid to hold
the 128-bit value across the foo call (though, surprisingly, LLVM saves it
into stack anyway).
The other parts are I guess mainly about SEH. Consider e.g.
void
foo (void)
{
register double x __asm ("xmm14");
register double y __asm ("xmm18");
asm ("" : "=x" (x));
asm ("" : "=v" (y));
x += y;
y += x;
asm ("" : : "x" (x));
asm ("" : : "v" (y));
}
looking at cross-compiler output, with -O2 -mavx512f this emits
.file "abcdeq.c"
.text
.align 16
.globl foo
.def foo; .scl 2; .type 32; .endef
.seh_proc foo
foo:
subq $40, %rsp
.seh_stackalloc 40
vmovaps %xmm14, (%rsp)
.seh_savexmm %xmm14, 0
vmovaps %xmm18, 16(%rsp)
.seh_savexmm %xmm18, 16
.seh_endprologue
vaddsd %xmm18, %xmm14, %xmm14
vaddsd %xmm18, %xmm14, %xmm18
vmovaps (%rsp), %xmm14
vmovaps 16(%rsp), %xmm18
addq $40, %rsp
ret
.seh_endproc
.ident "GCC: (GNU) 10.0.1 20200207 (experimental)"
Does whatever assembler mingw64 uses even assemble this (I mean the
.seh_savexmm %xmm16, 16 could be problematic)?
I can find e.g.
https://stackoverflow.com/questions/43152633/invalid-register-for-seh-savexmm-in-cygwin/43210527
which then links to
https://gcc.gnu.org/PR65782
2020-02-08 Uroš Bizjak <ubizjak@gmail.com>
Jakub Jelinek <jakub@redhat.com>
PR target/65782
* config/i386/i386.h (CALL_USED_REGISTERS): Make
xmm16-xmm31 call-used even in 64-bit ms-abi.
Jakub Jelinek [Fri, 14 Feb 2020 13:18:10 +0000 (14:18 +0100)]
openmp: Fix handling of non-addressable shared scalars in parallel nested inside of target [PR93515]
As the following testcase shows, we need to consider even target to be a construct
that forces not to use copy in/out for shared on parallel inside of the target.
E.g. for parallel nested inside another parallel or host teams, we already avoid
copy in/out and we need to treat target the same.
2020-02-06 Jakub Jelinek <jakub@redhat.com>
PR libgomp/93515
* omp-low.c (use_pointer_for_field): For nested constructs, also
look for map clauses on target construct.
(scan_omp_1_stmt) <case GIMPLE_OMP_TARGET>: Bump temporarily
taskreg_nesting_level.
* testsuite/libgomp.c-c++-common/pr93515.c: New test.
Jakub Jelinek [Fri, 14 Feb 2020 14:46:51 +0000 (15:46 +0100)]
openmp: Avoid ICEs with declare simd; declare simd inbranch [PR93555]
The testcases ICE because when processing the declare simd inbranch,
we don't create the i == 0 clone as it already exists, which means
clone_info->nargs is not adjusted, but we then rely on it being adjusted
when trying other clones.
2020-02-05 Jakub Jelinek <jakub@redhat.com>
PR middle-end/93555
* omp-simd-clone.c (expand_simd_clones): If simd_clone_mangle or
simd_clone_create failed when i == 0, adjust clone->nargs by
clone->inbranch.
* c-c++-common/gomp/pr93555-1.c: New test.
* c-c++-common/gomp/pr93555-2.c: New test.
* gfortran.dg/gomp/pr93555.f90: New test.
Jakub Jelinek [Fri, 14 Feb 2020 14:46:13 +0000 (15:46 +0100)]
combine: Punt on out of range rotate counts [PR93505]
What happens on this testcase is with the out of bounds rotate we get:
Trying 13 -> 16:
13: r129:SI=r132:DI#0<-<0x20
REG_DEAD r132:DI
16: r123:DI=r129:SI<0
REG_DEAD r129:SI
Successfully matched this instruction:
(set (reg/v:DI 123 [ <retval> ])
(const_int 0 [0]))
during combine. So, perhaps we could also change simplify-rtx.c to punt
if it is out of bounds rather than trying to optimize anything.
Or, but probably GCC11 material, if we decide that ROTATE/ROTATERT doesn't
have out of bounds counts or introduce targetm.rotate_truncation_mask,
we should truncate the argument instead of punting.
Punting is better for backports though.
2020-01-30 Jakub Jelinek <jakub@redhat.com>
PR middle-end/93505
* combine.c (simplify_comparison) <case ROTATE>: Punt on out of range
rotate counts.
Jakub Jelinek [Fri, 14 Feb 2020 14:44:23 +0000 (15:44 +0100)]
postreload: Fix up postreload combine [PR93402]
The following testcase is miscompiled, because the postreload pass changes:
-(insn 14 13 23 2 (parallel [
- (set (reg:DI 1 dx [94])
- (plus:DI (reg:DI 1 dx [95])
- (reg:DI 5 di [92])))
- (clobber (reg:CC 17 flags))
- ]) "pr93402.c":8:30 186 {*adddi_1}
- (expr_list:REG_EQUAL (plus:DI (reg:DI 5 di [92])
- (const_int 111111111111 [0x19debd01c7]))
- (nil)))
-(insn 23 14 25 2 (set (reg:SI 0 ax)
+(insn 23 13 25 2 (set (reg:SI 0 ax)
(const_int 0 [0])) "pr93402.c":10:1 67 {*movsi_internal}
(nil))
(insn 25 23 26 2 (use (reg:SI 0 ax)) "pr93402.c":10:1 -1
(nil))
-(insn 26 25 35 2 (use (reg:DI 1 dx)) "pr93402.c":10:1 -1
+(insn 26 25 35 2 (use (plus:DI (reg:DI 1 dx [95])
+ (reg:DI 5 di [92]))) "pr93402.c":10:1 -1
(nil))
A USE insn is not a normal insn and verify_changes called from
apply_change_group is happy about any changes into it.
The following patch avoids this optimization if we were to change
the USE operand (this routine only changes a reg into (plus reg reg2)).
2020-01-23 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/93402
* postreload.c (reload_combine_recognize_pattern): Don't try to adjust
USE insns.
Jakub Jelinek [Fri, 14 Feb 2020 14:43:47 +0000 (15:43 +0100)]
i386: Fix up -fdollars-in-identifiers with identifiers starting with $ in -masm=att [PR91298]
In AT&T syntax leading $ is special, so if we have identifiers that start
with dollar, we usually fail to assemble it (or assemble incorrectly).
As mentioned in the PR, what works is wrapping the identifiers inside of
parens, like:
movl $($a), %eax
leaq ($a)(,%rdi,4), %rax
movl ($a)(%rip), %eax
movl ($a)+16(%rip), %eax
.globl $a
.type $a, @object
.size $a, 72
$a:
.string "$a"
.quad ($a)
(this is x86_64 -fno-pic -O2). In some places ($a) is not accepted,
like as .globl operand, in .type, .size, so the patch overrides
ASM_OUTPUT_SYMBOL_REF rather than e.g. ASM_OUTPUT_LABELREF.
I didn't want to duplicate what assemble_name is doing (following
transparent aliases), so split assemble_name into two parts; just
mere looking at the first character of a name before calling assemble_name
wouldn't be good enough, a transparent alias could lead from a name
not starting with $ to one starting with it and vice versa.
2020-01-22 Jakub Jelinek <jakub@redhat.com>
PR target/91298
* output.h (assemble_name_resolve): Declare.
* varasm.c (assemble_name_resolve): New function.
(assemble_name): Use it.
* config/i386/i386.h (ASM_OUTPUT_SYMBOL_REF): Define.
* gcc.target/i386/pr91298-1.c: New test.
* gcc.target/i386/pr91298-2.c: New test.
Jakub Jelinek [Fri, 14 Feb 2020 14:41:59 +0000 (15:41 +0100)]
openmp: Teach omp_code_to_statement about rest of OpenMP statements
The omp_code_to_statement function added with the initial OpenACC support
only handled small subset of the OpenMP statements, leading to ICE if
any other OpenMP directive appeared inside of OpenACC directive.
Jakub Jelinek [Fri, 14 Feb 2020 14:41:22 +0000 (15:41 +0100)]
riscv: Fix up riscv_rtx_costs for RTL checking (PR target/93333)
As mentioned in the PR, during combine rtx_costs can be called sometimes
even on RTL that has not been validated yet and so can contain even operands
that aren't valid in any instruction.
2020-01-21 Jakub Jelinek <jakub@redhat.com>
PR target/93333
* config/riscv/riscv.c (riscv_rtx_costs) <case ZERO_EXTRACT>: Verify
the last two operands are CONST_INT_P before using them as such.
Jakub Jelinek [Fri, 14 Feb 2020 14:40:34 +0000 (15:40 +0100)]
powerpc: Fix ICE with fp conditional move (PR target/93073)
The following testcase ICEs, because for TFmode the particular subtraction
pattern (*subtf3) is not enabled with the given options. Using
expand_simple_binop instead of emitting the subtraction by hand just moves
the ICE one insn later, NEG of ABS is not then recognized, etc., but
ultimately the problem is that when rs6000_emit_cmove is called for floating
point operand mode (and earlier condition ensures that in that case
compare_mode is also floating point), the expander makes sure the
operand mode is SFDF, but for the comparison mode nothing checks it, yet
there is just one *fsel* pattern with 2 separate SFDF iterators.
The following patch fixes it by giving up if compare_mode is not SFmode or
DFmode.
2020-01-21 Jakub Jelinek <jakub@redhat.com>
PR target/93073
* config/rs6000/rs6000.c (rs6000_emit_cmove): If using fsel, punt for
compare_mode other than SFmode or DFmode.