Christophe Lyon [Wed, 19 Feb 2020 20:55:23 +0000 (20:55 +0000)]
ARM: Fix -mpure-code for v6m
When running the testsuite with -fdisable-rtl-fwprop2 and -mpure-code
for cortex-m0, I noticed that some testcases were failing because we
still generate "ldr rX, .LCY", which is what we want to avoid with
-mpure-code. This has been latent since a recent improvement in fwprop
(PR88833).
In this patch I change the thumb1_movsi_insn pattern so that it emits
the desired instruction sequence when arm_disable_literal_pool is set.
To achieve that, I introduce a new required_for_purecode attribute to
enable the corresponding alternative in thumb1_movsi_insn and take the
actual instruction sequence length into account.
Backport from mainline
2020-02-25 Christophe Lyon <christophe.lyon@linaro.org>
* config/arm/arm.md (required_for_purecode): New attribute.
(enabled): Handle required_for_purecode.
* config/arm/thumb1.md (thumb1_movsi_insn): Add alternative to
work with -mpure-code.
Christophe Lyon [Tue, 17 Dec 2019 15:43:07 +0000 (15:43 +0000)]
ARM: Add support for -mpure-code in thumb-1 (v6m)
This patch extends support for -mpure-code to all thumb-1 processors,
by removing the need for MOVT.
Symbol addresses are built using upper8_15, upper0_7, lower8_15 and
lower0_7 relocations, and constants are built using sequences of
movs/adds and lsls instructions.
The extension of the *thumb1_movhf pattern always uses the same size
(6) although it can emit a shorter sequence when possible. This is
similar to what *arm32_movhf already does.
CASE_VECTOR_PC_RELATIVE is now false with -mpure-code, to avoid
generating invalid assembly code with differences between symbols from
two different sections (the difference cannot be computed by the
assembler).
Tests pr45701-[12].c needed a small adjustment to avoid matching
upper8_15 when looking for the r8 register.
Test no-literal-pool.c is augmented with __fp16, so it now uses
-mfp16-format=ieee.
Test thumb1-Os-mult.c generates an inline code sequence with
-mpure-code and computes the multiplication by using a sequence of
add/shift rather than using the multiply instruction, so we skip it in
the presence of -mpure-code.
With -mcpu=cortex-m0, pure-code/no-literal-pool.c fails for code like:
static char *p = "Hello World";
char *
testchar ()
{
return p + 4;
}
By contrast, when using -mcpu=cortex-m4, the code looks like:
.section .rodata
.LC0:
.ascii "Hello World\000"
.data
p:
.word .LC0
testchar:
push {r7}
add r7, sp, #0
movw r3, #:lower16:p
movt r3, #:upper16:p
ldr r3, [r3]
adds r3, r3, #4
mov r0, r3
mov sp, r7
pop {r7}
bx lr
I haven't yet found how to make the cortex-m0 code apply upper/lower
relocations to "p" instead of .LC2. The current code looks functional,
but could be improved.
Jakub Jelinek [Tue, 25 Feb 2020 12:56:47 +0000 (13:56 +0100)]
combine: Fix find_split_point handling of constant store into ZERO_EXTRACT [PR93908]
git is miscompiled on s390x-linux with -O2 -march=zEC12 -mtune=z13.
I've managed to reduce it into the following testcase. The problem is that
during combine we see the s->k = -1; bitfield store and change the SET_SRC
from a pseudo into a constant:
(set (zero_extract:DI (mem/j:HI (plus:DI (reg/v/f:DI 60 [ s ])
(const_int 10 [0xa])) [0 +0 S2 A16])
(const_int 2 [0x2])
(const_int 7 [0x7]))
(const_int -1 [0xffffffffffffffff]))
This on s390x with the above option isn't recognized as valid instruction,
so find_split_point decides to handle it as IOR or IOR/AND.
src is -1, mask is 3 and pos is 7.
src != mask (this is also incorrect, we want to set all (both) bits in the
bitfield), so we go for IOR/AND, but instead of trying
mem = (mem & ~0x180) | ((-1 << 7) & 0x180)
we actually try
mem = (mem & ~0x180) | (-1 << 7)
and that is further simplified into:
mem = mem | (-1 << 7)
aka
mem = mem | 0xff80
which doesn't set just the 2-bit bitfield, but also many other bitfields
that shouldn't be touched.
We really should do:
mem = mem | 0x180
instead.
The problem is that we assume that no bits other than the low len (2 here)
will be set in the SET_SRC, but nothing guarantees that; we should just
ignore the other bits.
The following patch fixes it by masking src with mask, this way already
the src == mask test will DTRT, and as the code for or_mask uses
gen_int_mode, if the most significant bit is set after shifting it left by
pos, it will be properly sign-extended.
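To illustrate the arithmetic with a small, hypothetical C snippet (src, mask
and pos follow the description above; the real fix is of course in combine.c):

/* 2-bit bitfield at bit position 7, as in the RTL above.  */
unsigned int
set_bitfield_to_all_ones (unsigned int mem)
{
  unsigned int pos = 7, mask = 3;      /* 2-bit field at bit 7 */
  unsigned int src = -1;               /* value being stored */
  /* Buggy form: mem | (src << pos) would OR in bits 7..31 and clobber
     the neighbouring bitfields.  The correct form masks src first:  */
  return mem | ((src & mask) << pos);  /* ORs in only bits 7 and 8 (0x180) */
}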
2020-02-25 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/93908
* combine.c (find_split_point): For store into ZERO_EXTRACT, and src
with mask.
Eric Botcazou [Tue, 25 Feb 2020 11:34:00 +0000 (12:34 +0100)]
Fix link failure with debug info in LTO mode
This fixes a regression whereby the program fails to link with debug
info in LTO mode because of an undefined reference to a symbol coming
from the object files containing the early debug info.
* dwarf2out.c (dwarf2out_size_function): Run in early-DWARF mode.
Roman Zhuykov [Tue, 25 Feb 2020 11:32:42 +0000 (14:32 +0300)]
doc: backport proper description of --enable-checking behavior
This patch rewords the whole description to fix minor issues:
- documents 'gimple' and 'types' checks,
- clarifies what happens when the option is used without '=list',
- fixes inaccurate and wrong wording about release snapshots,
- describes that release checks can only be disabled explicitly.
Backport from master
2020-02-24 Roman Zhuykov <zhroma@ispras.ru>
* doc/install.texi (--enable-checking): Properly document current
behavior.
(--enable-stage1-checking): Minor clarification about bootstrap.
vect: Fix offset calculation for -ve strides [PR93767]
This PR is a regression caused by r256644, which added support for alias
checks involving variable strides. One of the changes in that commit
was to split the access size out of the segment length. The PR shows
that I hadn't done that correctly for the handling of negative strides
in vect_compile_time_alias. The old code was:
where vect_get_scalar_dr_size (a) was cancelling out the subtraction
of the access size inherent in "- const_length_a". Taking the access
size out of the segment length meant that the addition was no longer
needed/correct.
2020-02-24 Richard Sandiford <richard.sandiford@arm.com>
gcc/
Backport from mainline
2020-02-19 Richard Sandiford <richard.sandiford@arm.com>
PR tree-optimization/93767
* tree-vect-data-refs.c (vect_compile_time_alias): Remove the
access-size bias from the offset calculations for negative strides.
gcc/testsuite/
Backport from mainline
2020-02-19 Richard Sandiford <richard.sandiford@arm.com>
PR tree-optimization/93767
* gcc.dg/vect/pr93767.c: New test.
Mark Eggleston [Mon, 24 Feb 2020 15:53:24 +0000 (15:53 +0000)]
fortran: ICE using SHAPE with FINDLOC PR93835
Backported from mainline
2020-02-24 Mark Eggleston <markeggleston@gcc.gnu.org>
PR fortran/93835
* simplify.c (simplify_findloc_nodim) : Fix whitespace issues.
(gfc_simplify_shape) : Create and initialise one shape value
for the result expression. Set shape value with the rank of
the source array.
PR fortran/93835
* gfortran.dg/pr77351.f90 : Check for one error instead of two.
* gfortran.dg/pr93835.f08 : New test.
* collect2.c (tool_cleanup): Avoid calling functions that are not
signal-safe.
(maybe_run_lto_and_relink): Avoid possible signal handler
access to uninitialized memory (lto_o_files).
Peter Bergner [Mon, 24 Feb 2020 00:22:57 +0000 (18:22 -0600)]
rs6000: Fix infinite loop building ghostscript and icu [PR93658]
Fix rs6000_legitimate_address_p(), which erroneously marks a valid Altivec
address as being invalid, which causes LRA's process_address() to go into
an infinite loop spilling the same address over and over again.
Include Mike's earlier commits that fix bugs this patch exposes.
Backport from master
2020-02-20 Peter Bergner <bergner@linux.ibm.com>
Michael Meissner [Mon, 24 Feb 2020 00:17:12 +0000 (18:17 -0600)]
Adjust how variable vector extraction is done.
Backport from master
2020-02-03 Michael Meissner <meissner@linux.ibm.com>
* config/rs6000/rs6000.c (get_vector_offset): New helper function
to calculate the offset in memory from the start of a vector of a
particular element. Add code to keep the element number in
bounds if the element number is variable.
(rs6000_adjust_vec_address): Move calculation of offset of the
vector element to get_vector_offset.
(rs6000_split_vec_extract_var): Do not do the initial AND of
element here, move the code to get_vector_offset.
Fix PR 93568 (thinko)
Backport from master
2020-02-05 Michael Meissner <meissner@linux.ibm.com>
PR target/93568
* config/rs6000/rs6000.c (get_vector_offset): Fix Q constraint assert
to use MEM.
Michael Meissner [Mon, 24 Feb 2020 00:13:00 +0000 (18:13 -0600)]
Fix bad code of vector extract of PC-relative address with variable element #.
Backport from master
2020-01-06 Michael Meissner <meissner@linux.ibm.com>
* config/rs6000/vsx.md (vsx_extract_<mode>_var, VSX_D iterator):
Use 'Q' for doing vector extract from memory.
(vsx_extract_v4sf_var): Use 'Q' for doing vector extract from
memory.
(vsx_extract_<mode>_var, VSX_EXTRACT_I iterator): Use 'Q' for
doing vector extract from memory.
(vsx_extract_<mode>_<VS_scalar>mode_var): Use 'Q' for doing vector
extract from memory.
Thomas König [Thu, 13 Feb 2020 21:22:04 +0000 (22:22 +0100)]
Use au->lock exclusively for locking in async I/O.
2020-02-18 Thomas Koenig <tkoenig@gcc.gnu.org>
Backport from trunk
PR fortran/93599
* io/async.c (destroy_adv_cond): Do not destroy lock.
(async_io): Make sure au->lock is locked for finishing of thread.
Do not lock/unlock around signalling emptysignal. Unlock au->lock
before return.
(init_adv_cond): Do not initialize lock.
(enqueue_transfer): Unlock after signal.
(enqueue_done_id): Likewise.
(enqueue_done): Likewise.
(enqueue_close): Likewise.
(enqueue_data_transfer): Likewise.
(async_wait_id): Do not lock/unlock around signalling au->work.
(async_wait): Unlock after signal.
* io/async.h (SIGNAL): Add comment about needed au->lock.
Remove locking/unlocking of advcond->lock.
(WAIT_SIGNAL_MUTEX): Add comment. Remove locking/unlocking of
advcond->lock. Unlock mutex only at the end. Loop on
__gthread_cond_wait returning zero.
(REVOKE_SIGNAL): Add comment. Remove locking/unlocking of
advcond->lock.
(struct adv_cond): Remove mutex from struct.
Fix handling of floating-point homogeneous aggregates.
2020-02-21 John David Anglin <danglin@gcc.gnu.org>
* gcc/config/pa/pa.c (pa_function_value): Fix check for word and
double-word size when handling aggregate return values.
* gcc/config/pa/som.h (ASM_DECLARE_FUNCTION_NAME): Fix to indicate
that homogeneous SFmode and DFmode aggregates are passed and returned
in general registers.
Uros Bizjak [Thu, 20 Feb 2020 20:58:57 +0000 (21:58 +0100)]
i386: Fix *vec_extractv2sf_1 and *vec_extractv2si_1 shufps alternative [PR93828]
shufps moves two of the four packed single-precision floating-point values
from the *destination* operand (the first operand) into the low quadword of the
destination operand. Match the source operand to the destination.
PR target/93828
* config/i386/mmx.md (*vec_extractv2sf_1): Match source operand
to destination operand for shufps alternative.
(*vec_extractv2si_1): Ditto.
Mark Eggleston [Wed, 19 Feb 2020 09:36:42 +0000 (09:36 +0000)]
[Fortran] ICE assign character pointer to non target PR93714
An ICE occurred if an attempt was made to assign a pointer to a
character variable that has a length incorrectly specified using
a real constant and does not have the target attribute.
Backported from mainline
2020-02-18 Mark Eggleston <markeggleston@gcc.gnu.org>
PR fortran/93714
* expr.c (gfc_check_pointer_assign): Move check for
matching character length to after checking the lvalue
attributes for target or pointer.
PR fortran/93714
* gfortran.dg/char_pointer_assign_6.f90: Look for no target
message instead of length mismatch.
* gfortran.dg/pr93714_1.f90: New test.
* gfortran.dg/pr93714_2.f90: New test.
Check for bitwise identity when encoding VECTOR_CSTs [PR92768]
This PR shows that we weren't checking for bitwise-identical values
when trying to encode a VECTOR_CST, so -0.0 was treated the same as
0.0 for -fno-signed-zeros. The patch adds a new OEP flag to select
that behaviour.
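For illustration, a hypothetical vector constant of this kind (the committed
gcc.dg/pr92768.c testcase may differ): under -fno-signed-zeros its two
elements compare equal, so without a bitwise-identity check the VECTOR_CST
could be encoded as a single repeated 0.0, dropping the sign bit that
bit-level uses of the constant still depend on.

typedef double v2df __attribute__ ((vector_size (16)));

v2df
signed_zeros (void)
{
  /* Equal as values, but not bitwise identical.  */
  return (v2df) { 0.0, -0.0 };
}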
2020-02-18 Richard Sandiford <richard.sandiford@arm.com>
gcc/
Backport from mainline
2019-12-05 Richard Sandiford <richard.sandiford@arm.com>
PR middle-end/92768
* tree-core.h (OEP_BITWISE): New flag.
* fold-const.c (operand_compare::operand_equal_p): Handle it.
* tree-vector-builder.h (tree_vector_builder::equal_p): Pass it.
gcc/testsuite/
PR middle-end/92768
* gcc.dg/pr92768.c: New test.
Mark Eggleston [Tue, 18 Feb 2020 10:56:38 +0000 (10:56 +0000)]
[fortran] ICE in gfc_validate_kind(): Got bad kind [PR93580]
Caused by using invalid part_refs in kind specifications,
e.g. %re or %im on non-complex expressions and %len on
non-character expressions.
Check whether %re, %im and %len are valid when checking
kind specification.
The original patch from Steven G. Kargl <kargl@gcc.gnu.org> only
checked for %re and %im.
Backported from mainline
2020-02-18 Mark Eggleston <markeggleston@gcc.gnu.org>
PR fortran/93580
* primary.c (gfc_match_varspec): If the symbol following %
is re or im and the primary expression type is not BT_COMPLEX
issue an error. If the symbol is len and the primary
expression type is not BT_CHARACTER, issue an error.
PR fortran/93580
* gfortran.dg/pr93580.f90: New test.
predcom has the following code to stop one rogue load from
interfering with other store-load opportunities:
/* If A is read and B write or vice versa and there is unsuitable
dependence, instead of merging both components into a component
that will certainly not pass suitable_component_p, just put the
read into bad component, perhaps at least the write together with
all the other data refs in it's component will be optimizable. */
But when store-store commoning was added later, this had the effect
of ignoring loads that occur between two candidate stores.
There is code further up to handle loads and stores with unknown
dependences:
/* Don't do store elimination if there is any unknown dependence for
any store data reference. */
if ((DR_IS_WRITE (dra) || DR_IS_WRITE (drb))
&& (DDR_ARE_DEPENDENT (ddr) == chrec_dont_know
|| DDR_NUM_DIST_VECTS (ddr) == 0))
eliminate_store_p = false;
But the store-load code above skips loads for *known* dependences
if (a) the load has already been marked "bad" or (b) the data-ref
machinery knows the dependence distance, but determine_offsets
can't handle the combination.
(a) happens to be the problem in the testcase, but a different
sequence could have given (b) instead. We have writes to individual
fields of a structure and reads from the whole structure. Since
determine_offsets requires the types to be the same, it returns false
for each such read/write combination.
This patch records which components have had loads removed and
prevents store-store commoning for them. It's a bit too pessimistic,
since there shouldn't be a problem if a "bad" load dominates all stores
in a component. But (a) we can't AFAIK use pcom_stmt_dominates_stmt_p
here and (b) the handling for that case would probably need to be
removed again if we handled more exotic cases in future.
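For illustration, a sketch of that shape (not the actual
gcc.c-torture/execute/pr93434.c testcase): the per-field stores form a
store-store chain, while the whole-struct load has a different type, so
determine_offsets fails for the read/write pair and the load used to end up
ignored in the bad component.

struct s { int x; int y; };
struct s snap;

void
f (struct s *a, int n)
{
  for (int i = 0; i < n; i++)
    {
      a[i + 1].x = 2;   /* overwritten by the a[i].x store of the next
                           iteration, so it looks like a store-store
                           commoning candidate  */
      snap = a[i + 1];  /* whole-struct load in between: it must block that
                           elimination  */
      a[i].x = 1;
    }
}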
2020-02-18 Richard Sandiford <richard.sandiford@arm.com>
gcc/
Backport from mainline
2020-01-28 Richard Sandiford <richard.sandiford@arm.com>
PR tree-optimization/93434
* tree-predcom.c (split_data_refs_to_components): Record which
components have had aliasing loads removed. Prevent store-store
commoning for all such components.
gcc/testsuite/
PR tree-optimization/93434
* gcc.c-torture/execute/pr93434.c: New test.
Don't pass booleans as mask types to simd clones [PR92710]
In this PR we assigned a vector mask type to the result of a comparison
and then tried to pass that mask type to a simd clone, which expected
a normal (non-mask) type instead.
This patch simply punts on call arguments that have a mask type.
A better fix would be to pattern-match the comparison to a COND_EXPR,
like we would if the comparison was stored to memory, but doing that
isn't gcc 9 or 10 material.
Note that this doesn't affect x86_64-linux-gnu because the ABI promotes
bool arguments to ints.
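For illustration, a hypothetical kernel of this shape (names are ours, not
taken from the new testcase): the comparison result gets a vector mask type,
which must not be passed as-is to the simd clone, whose parameter is an
ordinary scalar.

#pragma omp declare simd
extern int use_flag (int flag);

void
f (int *restrict out, int *restrict a, int *restrict b, int n)
{
  #pragma omp simd
  for (int i = 0; i < n; i++)
    out[i] = use_flag (a[i] > b[i]);  /* comparison result as a clone argument */
}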
2020-02-18 Richard Sandiford <richard.sandiford@arm.com>
gcc/
Backport from mainline
2019-11-29 Richard Sandiford <richard.sandiford@arm.com>
Fix SLP downward group access classification [PR92420]
This PR was caused by the SLP handling in get_group_load_store_type
returning VMAT_CONTIGUOUS rather than VMAT_CONTIGUOUS_REVERSE for
downward groups.
A more elaborate fix would be to try to combine the reverse permutation
into SLP_TREE_LOAD_PERMUTATION for loads, but that's really a follow-on
optimisation and not backport material. It might also not necessarily
be a win, if the target supports (say) reversing and odd/even swaps
as independent permutes but doesn't recognise the combined form.
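For illustration, a sketch of a downward (negative step) SLP group of the
kind affected (the committed gcc.dg/vect/pr92420.c testcase may differ):

struct cplx { float re, im; };

void
conj_reverse (struct cplx *restrict out, const struct cplx *restrict in, int n)
{
  for (int i = 0; i < n; i++)
    {
      /* The two loads form an SLP group whose step through "in" is negative,
         so the access must be classified as VMAT_CONTIGUOUS_REVERSE.  */
      out[i].re = in[n - 1 - i].re;
      out[i].im = -in[n - 1 - i].im;
    }
}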
2020-02-18 Richard Sandiford <richard.sandiford@arm.com>
gcc/
Backport from mainline
2019-11-11 Richard Sandiford <richard.sandiford@arm.com>
PR tree-optimization/92420
* tree-vect-stmts.c (get_negative_load_store_type): Move further
up file.
(get_group_load_store_type): Use it for reversed SLP accesses.
gcc/testsuite/
PR tree-optimization/92420
* gcc.dg/vect/pr92420.c: New test.
Jakub Jelinek [Sat, 15 Feb 2020 11:53:44 +0000 (12:53 +0100)]
match.pd: Disallow side-effects in GENERIC for non-COND_EXPR to COND_EXPR simplifications [PR93744]
As the following testcases show (the first one reported, last two
found by code inspection), we need to disallow side-effects
in simplifications that turn some unconditional expression into a conditional
one. From my little understanding of genmatch.c, it is able to
automatically disallow side effects if the same operand is used multiple
times in the match pattern, maybe if it is used multiple times in the
replacement pattern, and if it is used in conditional contexts in the match
pattern; could it be taught to handle this case too? If yes, perhaps
just the first hunk could be usable for 8/9 backports (+ the testcases).
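For illustration, a hypothetical example for the first pattern (not one of
the new testcases): in GENERIC, rewriting (m1 < m2) * d into
(m1 < m2) ? d : 0 would skip evaluating d, and therefore its side effect,
whenever the comparison is false.

int calls;

int
d_with_side_effect (void)
{
  calls++;
  return 42;
}

int
f (int m1, int m2)
{
  /* Must increment "calls" exactly once, however the multiplication is
     simplified.  */
  return (m1 < m2) * d_with_side_effect ();
}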
2020-02-15 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/93744
* match.pd (((m1 >/</>=/<= m2) * d -> (m1 >/</>=/<= m2) ? d : 0,
A - ((A - B) & -(C cmp D)) -> (C cmp D) ? B : A,
A + ((B - A) & -(C cmp D)) -> (C cmp D) ? B : A): For GENERIC, make
sure @2 in the first and @1 in the other patterns has no side-effects.
* gcc.c-torture/execute/pr93744-1.c: New test.
* gcc.c-torture/execute/pr93744-2.c: New test.
* gcc.c-torture/execute/pr93744-3.c: New test.
Eric Botcazou [Fri, 14 Feb 2020 18:21:02 +0000 (19:21 +0100)]
Fix problematic TLS sequences for the Solaris linker
This is an old thinko pertaining to the interaction between TLS
sequences and delay slot filling: the compiler knows that it cannot
put instructions with TLS relocations into delay slots with the
original Sun TLS model, but it tests TARGET_SUN_TLS in this context,
which depends only on the assembler. So if the compiler is configured
with the GNU assembler and the Solaris linker, then TARGET_GNU_TLS is
set instead and the limitation is not enforced.
PR target/93704
* config/sparc/sparc.c (eligible_for_call_delay): Test HAVE_GNU_LD
in conjunction with TARGET_GNU_TLS in early return.
Jakub Jelinek [Fri, 14 Feb 2020 16:36:00 +0000 (17:36 +0100)]
c++: Fix thinko in enum_min_precision [PR61414]
When backporting the PR61414 fix to 8.4, I noticed that the caching
of prec is actually broken, as it would fail to store the computed
precision into the hash_map's value, and so the next time we'd think the
enum needs 0 bits.
2020-02-14 Jakub Jelinek <jakub@redhat.com>
PR c++/61414
* class.c (enum_min_precision): Change prec type from int to int &.
Richard Biener [Wed, 5 Feb 2020 13:04:29 +0000 (14:04 +0100)]
middle-end/90648 fend off builtin calls with not enough arguments from match
This adds guards to genmatch generated code before accessing call
expression or stmt arguments that might be out of bounds when
the user provided bogus prototypes for what we consider builtins.
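For illustration, a hypothetical example of such a bogus prototype (not the
PR's testcase): the function is declared and called with no arguments, yet
match.pd patterns for the corresponding builtin inspect its arguments, so the
generated matching code has to check the argument count first.

int memcmp ();   /* bogus user declaration of a function GCC treats as a builtin */

int
f (void)
{
  return memcmp ();   /* fewer arguments than the builtin's patterns expect */
}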
2020-02-05 Richard Biener <rguenther@suse.de>
PR middle-end/90648
* genmatch.c (dt_node::gen_kids_1): Emit number of argument
checks before matching calls.
Richard Biener [Wed, 22 Jan 2020 11:38:12 +0000 (12:38 +0100)]
tree-optimization/93381 fix integer offsetting in points-to analysis
We were incorrectly assuming a merge operation is conservative enough
for operations that are not explicitly handled, but we also need to consider
offsetting within fields when field-sensitive analysis applies.
2020-01-22 Richard Biener <rguenther@suse.de>
PR tree-optimization/93381
* tree-ssa-structalias.c (find_func_aliases): Assume offsetting
throughout, handle all conversions the same.
Richard Biener [Fri, 14 Feb 2020 08:23:06 +0000 (09:23 +0100)]
tree-optimization/93439 move clique bookkeeping to OMP expansion
Autopar was doing clique bookkeeping too early when creating destination
functions but then later introducing new cliques via versioning loops.
The following moves the bookkeeping to the actual outlining process.
2020-02-14 Richard Biener <rguenther@suse.de>
Backport from mainline
2020-01-28 Richard Biener <rguenther@suse.de>
PR tree-optimization/93439
* tree-parloops.c (create_loop_fn): Move clique bookkeeping...
* tree-cfg.c (move_sese_region_to_fn): ... here.
(verify_types_in_gimple_reference): Verify used cliques are
tracked.
Richard Biener [Fri, 14 Feb 2020 08:17:57 +0000 (09:17 +0100)]
debug/92763 keep DIEs that might be used in DW_TAG_inlined_subroutine
We were pruning type-local subroutine DIEs if their context is unused
despite us later needing those DIEs as abstract origins for inlines.
The patch makes code already present for -fvar-tracking-assignments
unconditional.
2020-02-14 Richard Biener <rguenther@suse.de>
Backport from mainline
2020-01-20 Richard Biener <rguenther@suse.de>
PR debug/92763
* dwarf2out.c (prune_unused_types): Unconditionally mark
called function DIEs.
Richard Biener [Fri, 14 Feb 2020 08:10:48 +0000 (09:10 +0100)]
middle-end/92674 delay purging EH edges when folding during inlining
2020-02-14 Richard Biener <rguenther@suse.de>
Backport from mainline
2019-11-27 Richard Biener <rguenther@suse.de>
PR middle-end/92674
* tree-inline.c (expand_call_inline): Delay purging EH/abnormal
edges and instead record blocks in bitmap.
(gimple_expand_calls_inline): Adjust.
(fold_marked_statements): Delay EH cleanup until all folding is
done.
(optimize_inline_calls): Do EH/abnormal cleanup for calls after
inlining finished.
Jakub Jelinek [Thu, 13 Feb 2020 20:00:09 +0000 (21:00 +0100)]
c: Fix ICE with cast to VLA [93576]
The following testcase ICEs, because the PR84305 changes try to evaluate
the size earlier. If size has side-effects, that is desirable, and the
side-effects will actually be wrapped in a SAVE_EXPR. The problem on this
testcase is that there are no side-effects, and c_fully_fold doesn't fold
those COMPOUND_EXPRs to constant, and while before gimplification we unshare
trees found in the expressions, the unsharing doesn't involve TYPE_SIZE etc.
of used types. Gimplification is destructive though, so when we gimplify
the two nested COMPOUND_EXPRs and then try to gimplify it the second time
for the TYPE_SIZEs, we ICE.
Now, we could use unshare_expr on what we push to *expr (SAVE_EXPRs and
their operands in there aren't unshared), but I really don't see a point in
evaluating expressions that don't have side-effects early, so instead
this just pushes into *expr only the expressions that do have side-effects.
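For illustration, a hypothetical shape of such a cast (the committed testcase
may differ): the array size of the cast's target type is a nested
COMPOUND_EXPR without side-effects, so after this change it is no longer
pushed into *expr and gimplified a second time.

void
f (void *p)
{
  (void) (char (*)[(0, (0, 1))]) p;   /* size is a side-effect-free COMPOUND_EXPR */
}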
2020-02-13 Jakub Jelinek <jakub@redhat.com>
PR c/93576
* c-decl.c (grokdeclarator): If this_size_varies, only push size into
*expr if it has side effects.
Jakub Jelinek [Thu, 13 Feb 2020 09:43:27 +0000 (10:43 +0100)]
i386: Fix up _mm*_mask_popcnt_epi* [PR93696]
As mentioned in the PR and as
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mask_popcnt_epi
also documents, _mm*_popcnt_epi* intrinsics are consistent with all other
unary AVX512* intrinsics regarding arguments, i.e. the
_mm*_whatever has just single argument (called a in the docs, and __A in the
GCC headers),
_mm*_mask_whatever has 3 arguments (called src, k, a in the docs and
_W, __U, __A in GCC headers) and
_mm*_maskz_whatever 2 arguments (called k, a in the docs and __U, __A in GCC
headers). Unfortunately, whoever implemented the _mm*_popcnt_epi*
intrinsics got it wrong for the _mm*_mask_popcnt_epi* ones, calling the
args __A, __U, __B and not passing them in the canonical order to the
builtins, making it API incompatible with ICC as well as clang (tested on
godbolt's clang 7/8/9/trunk and ICC 19.0.{0,1}; older clang/ICC don't
understand those, so it isn't that it used to be broken even in other
compilers and got changed afterwards).
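For reference, a small usage sketch of the canonical order after the fix
(src, mask, a), in line with the other unary AVX512 intrinsics; the wrapper
name is ours, and it needs -mavx512bitalg -mavx512vl.

#include <immintrin.h>

__m128i
masked_popcount (__m128i src, __mmask16 k, __m128i a)
{
  /* Lanes whose mask bit is zero are taken from src, the rest get the
     popcount of the corresponding byte of a.  */
  return _mm_mask_popcnt_epi8 (src, k, a);
}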
2020-02-13 Jakub Jelinek <jakub@redhat.com>
PR target/93696
* config/i386/avx512bitalgintrin.h (_mm512_mask_popcnt_epi8,
_mm512_mask_popcnt_epi16, _mm256_mask_popcnt_epi8,
_mm256_mask_popcnt_epi16, _mm_mask_popcnt_epi8,
_mm_mask_popcnt_epi16): Rename __B argument to __A and __A to __W,
pass __A to the builtin followed by __W instead of __A followed by
__B.
* config/i386/avx512vpopcntdqintrin.h (_mm512_mask_popcnt_epi32,
_mm512_mask_popcnt_epi64): Likewise.
* config/i386/avx512vpopcntdqvlintrin.h (_mm_mask_popcnt_epi32,
_mm256_mask_popcnt_epi32, _mm_mask_popcnt_epi64,
_mm256_mask_popcnt_epi64): Likewise.
Jakub Jelinek [Thu, 13 Feb 2020 07:17:07 +0000 (08:17 +0100)]
i386: Fix k*shift* intrinsics [PR93673]
As mentioned in the PR, the intrinsics allow counts from 0 to 255, but
we actually reject values from 128 to 255. That is because QImode
CONST_INTs can be only -128 to 127. Fixed by using const_0_to_255_operand
and dropping the modes for the operands with those predicates
(the IL actually contains the CONST_INT which has VOIDmode).
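For illustration, a count in the previously rejected 128..255 range (our own
snippet, not one of the new tests); counts greater than or equal to the mask
width simply yield 0. Needs -mavx512bw.

#include <immintrin.h>

__mmask64
shift_mask (__mmask64 k)
{
  return _kshiftli_mask64 (k, 130);
}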
2020-02-13 Jakub Jelinek <jakub@redhat.com>
PR target/93673
* config/i386/sse.md (k<code><mode>): Drop mode from last operand and
use const_0_to_255_operand predicate instead of immediate_operand.
(avx512dq_fpclass<mode><mask_scalar_merge_name>,
avx512dq_vmfpclass<mode><mask_scalar_merge_name>,
vgf2p8affineinvqb_<mode><mask_name>,
vgf2p8affineqb_<mode><mask_name>): Drop mode from
const_0_to_255_operand predicated operands.
* gcc.target/i386/avx512f-pr93673.c: New test.
* gcc.target/i386/avx512dq-pr93673.c: New test.
* gcc.target/i386/avx512bw-pr93673.c: New test.
Jakub Jelinek [Wed, 12 Feb 2020 10:58:35 +0000 (11:58 +0100)]
i386: Fix up vec_extract_lo* patterns [PR93670]
The VEXTRACT* insns have way too many different CPUID feature flags (AT&T
syntax):
vextractf128 $imm, %ymm, %xmm/mem AVX
vextracti128 $imm, %ymm, %xmm/mem AVX2
vextract{f,i}32x4 $imm, %ymm, %xmm/mem {k}{z} AVX512VL+AVX512F
vextract{f,i}32x4 $imm, %zmm, %xmm/mem {k}{z} AVX512F
vextract{f,i}64x2 $imm, %ymm, %xmm/mem {k}{z} AVX512VL+AVX512DQ
vextract{f,i}64x2 $imm, %zmm, %xmm/mem {k}{z} AVX512DQ
vextract{f,i}32x8 $imm, %zmm, %ymm/mem {k}{z} AVX512DQ
vextract{f,i}64x4 $imm, %zmm, %ymm/mem {k}{z} AVX512F
As the testcase and the patch show, we didn't get it right in all
cases.
The first hunk is about avx512vl_vextractf128v8s[if] incorrectly
requiring TARGET_AVX512DQ. The corresponding insn is the first
vextract{f,i}32x4 above, so it requires VL+F, and the builtins have it
correct (TARGET_AVX512VL implies TARGET_AVX512F):
BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vextractf128v8sf, "__builtin_ia32_extractf32x4_256_mask", IX86_BUILTIN_EXTRACTF32X4_256, UNKNOWN, (int) V4SF_FTYPE_V8SF_INT_V4SF_UQI)
BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vextractf128v8si, "__builtin_ia32_extracti32x4_256_mask", IX86_BUILTIN_EXTRACTI32X4_256, UNKNOWN, (int) V4SI_FTYPE_V8SI_INT_V4SI_UQI)
We only need TARGET_AVX512DQ for avx512vl_vextractf128v4d[if].
The second hunk is about vec_extract_lo_v16s[if]{,_mask}. These are using
the vextract{f,i}32x8 insns (AVX512DQ above), but we weren't requiring that,
but instead incorrectly && 1 for non-masked and && (64 == 64 && TARGET_AVX512VL)
for masked insns. This is extraction from ZMM, so it doesn't need VL for
anything. The hunk actually only requires TARGET_AVX512DQ when the insn
is masked; if it is not masked and TARGET_AVX512DQ isn't available, we can
use vextract{f,i}64x4 instead, which is already available in TARGET_AVX512F
and does the same thing: it extracts the low 256 bits from the 512-bit vector
(often we split it into just nothing, but there are some special cases, like
when using xmm16+, where we can't without AVX512VL).
The last hunk is about vec_extract_lo_v8s[if]{,_mask}. The non-_mask
suffixed ones are ok already and just split into nothing (lowpart subreg).
The masked ones were incorrectly requiring TARGET_AVX512VL and
TARGET_AVX512DQ, when we only need TARGET_AVX512VL.
2020-02-12 Jakub Jelinek <jakub@redhat.com>
PR target/93670
* config/i386/sse.md (VI48F_256_DQ): New mode iterator.
(avx512vl_vextractf128<mode>): Use it instead of VI48F_256. Remove
TARGET_AVX512DQ from condition.
(vec_extract_lo_<mode><mask_name>): Use <mask_avx512dq_condition>
instead of <mask_mode512bit_condition> in condition. If
TARGET_AVX512DQ is false, emit vextract*64x4 instead of
vextract*32x8.
(vec_extract_lo_<mode><mask_name>): Drop <mask_avx512dq_condition>
from condition.
Jakub Jelinek [Mon, 10 Feb 2020 21:44:40 +0000 (22:44 +0100)]
i386: Fix -mavx -mno-avx2 ICE with VEC_COND_EXPR [PR93637]
As mentioned in the PR, for -mavx -mno-avx2 the backend does support
vcondv4div4df and vcondv8siv8sf optabs (while generally 32-byte vectors
aren't much supported in that case, it is performed using
vandps/vandnps/vorps). The problem is that after the last generic vector
lowering (where the VEC_COND_EXPR still compares two V4DF vectors and
has two V4DI last operands and V4DI result and so is considered ok) fre4
folds the condition into constant, at which point the middle-end during
expansion will try vcond_mask_optab and fall back to trying to expand it
as the constant vector < 0 vcondv4div4di, but neither of them is supported
for -mavx -mno-avx2 and thus we ICE.
So, the options I see are either what the following patch does (also support
vcond_mask_v4div4di and vcond_mask_v4siv4si already for TARGET_AVX), or
require TARGET_AVX2 rather than the current TARGET_AVX for vcondv4div4df
and vcondv8siv8sf.
2020-02-10 Jakub Jelinek <jakub@redhat.com>
PR target/93637
* config/i386/sse.md (VI_256_AVX2): New mode iterator.
(vcond_mask_<mode><sseintvecmodelower>): Use it instead of VI_256.
Change condition from TARGET_AVX2 to TARGET_AVX.
Jakub Jelinek [Sat, 8 Feb 2020 09:59:40 +0000 (10:59 +0100)]
i386: Make xmm16-xmm31 call used even in ms ABI [PR65782]
On Tue, Feb 04, 2020 at 11:16:06AM +0100, Uros Bizjak wrote:
> I guess that the Comment #9 patch from the PR should be trivially correct,
> but although it looks obvious, I don't want to propose the patch since
> I have no means of testing it.
I don't have means of testing it either.
https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019
is quite explicit that [xyz]mm16-31 are call clobbered and only xmm6-15 (low
128-bits only) are call preserved.
We are talking e.g. about
/* { dg-options "-O2 -mabi=ms -mavx512vl" } */
typedef double V __attribute__((vector_size (16)));
void foo (void);
V bar (void);
void baz (V);
void
qux (void)
{
V c;
{
register V a __asm ("xmm18");
V b = bar ();
asm ("" : "=x" (a) : "0" (b));
c = a;
}
foo ();
{
register V d __asm ("xmm18");
V e;
d = c;
asm ("" : "=x" (e) : "0" (d));
baz (e);
}
}
where according to the MSDN doc gcc incorrectly holds the c value
in the xmm18 register across the foo call; if foo is compiled by some Microsoft
compiler (or LLVM), then it could clobber %xmm18.
If all xmm18 occurrences are changed to say xmm15, then it is valid to hold
the 128-bit value across the foo call (though, surprisingly, LLVM saves it
into stack anyway).
The other parts are I guess mainly about SEH. Consider e.g.
void
foo (void)
{
register double x __asm ("xmm14");
register double y __asm ("xmm18");
asm ("" : "=x" (x));
asm ("" : "=v" (y));
x += y;
y += x;
asm ("" : : "x" (x));
asm ("" : : "v" (y));
}
looking at cross-compiler output, with -O2 -mavx512f this emits
.file "abcdeq.c"
.text
.align 16
.globl foo
.def foo; .scl 2; .type 32; .endef
.seh_proc foo
foo:
subq $40, %rsp
.seh_stackalloc 40
vmovaps %xmm14, (%rsp)
.seh_savexmm %xmm14, 0
vmovaps %xmm18, 16(%rsp)
.seh_savexmm %xmm18, 16
.seh_endprologue
vaddsd %xmm18, %xmm14, %xmm14
vaddsd %xmm18, %xmm14, %xmm18
vmovaps (%rsp), %xmm14
vmovaps 16(%rsp), %xmm18
addq $40, %rsp
ret
.seh_endproc
.ident "GCC: (GNU) 10.0.1 20200207 (experimental)"
Does whatever assembler mingw64 uses even assemble this (I mean the
.seh_savexmm %xmm18, 16 could be problematic)?
I can find e.g.
https://stackoverflow.com/questions/43152633/invalid-register-for-seh-savexmm-in-cygwin/43210527
which then links to
https://gcc.gnu.org/PR65782
2020-02-08 Uroš Bizjak <ubizjak@gmail.com>
Jakub Jelinek <jakub@redhat.com>
PR target/65782
* config/i386/i386.h (CALL_USED_REGISTERS): Make
xmm16-xmm31 call-used even in 64-bit ms-abi.
Jakub Jelinek [Thu, 6 Feb 2020 08:19:08 +0000 (09:19 +0100)]
openmp: Fix handling of non-addressable shared scalars in parallel nested inside of target [PR93515]
As the following testcase shows, we need to consider even target to be a construct
that forces us not to use copy in/out for shared variables on a parallel inside the target.
E.g. for a parallel nested inside another parallel or host teams, we already avoid
copy in/out, and we need to treat target the same.
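For illustration, a sketch of that nesting (not the actual
libgomp.c-c++-common/pr93515.c testcase): the parallel is nested inside a
target region, so the shared scalar must not be handled with copy in/out.

int
f (void)
{
  int x = 0;
  #pragma omp target map(tofrom: x)
  {
    #pragma omp parallel shared(x)
    {
      #pragma omp atomic
      x++;
    }
  }
  return x;
}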
2020-02-06 Jakub Jelinek <jakub@redhat.com>
PR libgomp/93515
* omp-low.c (use_pointer_for_field): For nested constructs, also
look for map clauses on target construct.
(scan_omp_1_stmt) <case GIMPLE_OMP_TARGET>: Bump temporarily
taskreg_nesting_level.
* testsuite/libgomp.c-c++-common/pr93515.c: New test.
Jakub Jelinek [Thu, 6 Feb 2020 08:15:13 +0000 (09:15 +0100)]
openmp: Notice reduction decl in outer contexts after adding it to shared [PR93515]
If we call omp_add_variable, a following omp_notice_variable will already find it
on that construct and not go through outer constructs; the following patch fixes that.
Note, this still doesn't follow OpenMP 5.0 semantics for target combined with other
constructs with reduction/lastprivate/linear clauses; that will be handled for GCC11.
2020-02-06 Jakub Jelinek <jakub@redhat.com>
PR libgomp/93515
* gimplify.c (gimplify_scan_omp_clauses) <do_notice>: If adding
shared clause, call omp_notice_variable on outer context if any.
Jakub Jelinek [Wed, 5 Feb 2020 22:35:08 +0000 (23:35 +0100)]
c++: Mark __builtin_convertvector operand as read [PR93557]
In C++ we weren't calling mark_exp_read on the first argument of
__builtin_convertvector. I guess it could misbehave even with lambda implicit
captures.
Fixed by calling decay_conversion on the argument: we use the argument as an
rvalue, so we want the standard lvalue-to-rvalue conversions, but as the
argument must be a vector type, integral promotions etc. aren't really
needed.
2020-02-05 Jakub Jelinek <jakub@redhat.com>
PR c++/93557
* semantics.c (cp_build_vec_convert): Call decay_conversion on arg
prior to passing it to c_build_vec_convert.
Jakub Jelinek [Wed, 5 Feb 2020 10:32:37 +0000 (11:32 +0100)]
openmp: Avoid ICEs with declare simd; declare simd inbranch [PR93555]
The testcases ICE because when processing the declare simd inbranch,
we don't create the i == 0 clone as it already exists, which means
clone_info->nargs is not adjusted, but we then rely on it being adjusted
when trying other clones.
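For illustration, a sketch of the declaration pattern that triggers this
path (the committed c-c++-common/gomp/pr93555-*.c testcases may differ): the
plain clone already exists when the inbranch directive is processed.

#pragma omp declare simd
#pragma omp declare simd inbranch
int
add_one (int x)
{
  return x + 1;
}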
2020-02-05 Jakub Jelinek <jakub@redhat.com>
PR middle-end/93555
* omp-simd-clone.c (expand_simd_clones): If simd_clone_mangle or
simd_clone_create failed when i == 0, adjust clone->nargs by
clone->inbranch.
* c-c++-common/gomp/pr93555-1.c: New test.
* c-c++-common/gomp/pr93555-2.c: New test.
* gfortran.dg/gomp/pr93555.f90: New test.
Jakub Jelinek [Thu, 30 Jan 2020 20:28:17 +0000 (21:28 +0100)]
combine: Punt on out of range rotate counts [PR93505]
What happens on this testcase is that with the out-of-bounds rotate we get:
Trying 13 -> 16:
13: r129:SI=r132:DI#0<-<0x20
REG_DEAD r132:DI
16: r123:DI=r129:SI<0
REG_DEAD r129:SI
Successfully matched this instruction:
(set (reg/v:DI 123 [ <retval> ])
(const_int 0 [0]))
during combine. So, perhaps we could also change simplify-rtx.c to punt
if it is out of bounds rather than trying to optimize anything.
Or (probably GCC11 material), if we decide that ROTATE/ROTATERT can't
have out-of-bounds counts, or introduce targetm.rotate_truncation_mask,
we should truncate the argument instead of punting.
Punting is better for backports though.
2020-01-30 Jakub Jelinek <jakub@redhat.com>
PR middle-end/93505
* combine.c (simplify_comparison) <case ROTATE>: Punt on out of range
rotate counts.
Jakub Jelinek [Wed, 29 Jan 2020 08:41:42 +0000 (09:41 +0100)]
openmp: c++: Consider typeinfo decls to be predetermined shared [PR91118]
If typeinfo decls appear in OpenMP default(none) regions, they are diagnosed
as errors now that we no longer predetermine const variables with no mutable
members as shared, but they aren't something the users can actually provide
explicit sharing for in the clauses.
2020-01-29 Jakub Jelinek <jakub@redhat.com>
PR c++/91118
* cp-gimplify.c (cxx_omp_predetermined_sharing): Return
OMP_CLAUSE_DEFAULT_SHARED for typeinfo decls.
* g++.dg/gomp/pr91118-1.C: New test.
* g++.dg/gomp/pr91118-2.C: New test.
The following testcase is miscompiled, because the first operand of the
variable left shift, { -1, -1, -1, -1 }, is represented as a VECTOR_CST with
VECTOR_CST_NPATTERNS 1 and VECTOR_CST_NELTS_PER_PATTERN 1, so when
we call builder.new_unary_operation, builder.encoded_nelts () will be just 1
and thus we encode the resulting vector as if all the elements were the
same.
For non-masked is_vshift, we could perhaps call builder.new_binary_operation
(TREE_TYPE (args[0]), args[0], args[1], false), but then there are masked
shifts, for non-is_vshift we could perhaps call it too but with args[2]
instead of args[1], but there is no builder.new_ternary_operation.
All this stuff is primarily for aarch64 anyway, on x86 we don't have any
variable length vectors, and it is not a big deal to compute all elements
and just let builder.finalize () find the most efficient VECTOR_CST
representation of the vector. So, instead of doing too much, this just
keeps using new_unary_operation only if only one VECTOR_CST is involved
(i.e. non-masked shift by constant) and for the rest just compute all elts.
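For illustration, the kind of folding involved (our own snippet, not the PR
testcase; needs -mavx2): the shifted operand is the uniform constant
{ -1, -1, -1, -1 }, but the per-element counts differ, so the folded result
is not uniform and must not be encoded as if it were.

#include <immintrin.h>

__m128i
f (void)
{
  __m128i ones = _mm_set1_epi32 (-1);
  __m128i counts = _mm_setr_epi32 (0, 1, 2, 3);
  return _mm_sllv_epi32 (ones, counts);   /* { -1, -2, -4, -8 } when folded correctly */
}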
2020-01-28 Jakub Jelinek <jakub@redhat.com>
PR target/93418
* config/i386/i386.c (ix86_fold_builtin) <do_shift>: If mask is not
-1 or is_vshift is true, use new_vector with number of elts npatterns
rather than new_unary_operation.
Jakub Jelinek [Thu, 23 Jan 2020 19:08:22 +0000 (20:08 +0100)]
postreload: Fix up postreload combine [PR93402]
The following testcase is miscompiled, because the postreload pass changes:
-(insn 14 13 23 2 (parallel [
- (set (reg:DI 1 dx [94])
- (plus:DI (reg:DI 1 dx [95])
- (reg:DI 5 di [92])))
- (clobber (reg:CC 17 flags))
- ]) "pr93402.c":8:30 186 {*adddi_1}
- (expr_list:REG_EQUAL (plus:DI (reg:DI 5 di [92])
- (const_int 111111111111 [0x19debd01c7]))
- (nil)))
-(insn 23 14 25 2 (set (reg:SI 0 ax)
+(insn 23 13 25 2 (set (reg:SI 0 ax)
(const_int 0 [0])) "pr93402.c":10:1 67 {*movsi_internal}
(nil))
(insn 25 23 26 2 (use (reg:SI 0 ax)) "pr93402.c":10:1 -1
(nil))
-(insn 26 25 35 2 (use (reg:DI 1 dx)) "pr93402.c":10:1 -1
+(insn 26 25 35 2 (use (plus:DI (reg:DI 1 dx [95])
+ (reg:DI 5 di [92]))) "pr93402.c":10:1 -1
(nil))
A USE insn is not a normal insn, and verify_changes called from
apply_change_group is happy with any changes to it.
The following patch avoids this optimization if we were to change
the USE operand (this routine only changes a reg into (plus reg reg2)).
2020-01-23 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/93402
* postreload.c (reload_combine_recognize_pattern): Don't try to adjust
USE insns.
This fixes a fall-out from a patch I had submitted two years ago which started
allowing simplify-rtx to fold a logical right shift by a followed by one by b
into >> (a + b).
However this can generate inefficient code when the resulting shift count ends
up being the same as the size of the shift mode, which creates undefined
behavior on most platforms.
This patch changes the code to truncate to 0 if the shift amount goes out of
range. Before my older patch this used to happen in combine when it saw the
two shifts. However, since we now combine them here, combine never gets a
chance to truncate them.
The issue mostly affects GCC 8 and 9, since on 10 the back end knows how to
deal with this shift constant, but it's better to do the right thing in
simplify-rtx.
Note that this doesn't take care of the arithmetic shift, where you could
replace the constant with MODE_BITS (mode) - 1, but that's not a regression,
so it is punted on here.
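For illustration, a hypothetical C-level analogue (the committed test is
g++.dg/opt/pr91838.C): the two right shifts below can be merged at the RTL
level into a single shift by 32, which equals the width of the mode; the fix
folds such a logical right shift to 0 instead of emitting the out-of-range
shift.

unsigned int
f (unsigned int x)
{
  unsigned int t = x >> 16;
  return t >> 16;   /* may be combined into x >> 32 in RTL */
}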
gcc/ChangeLog:
Backport from mainline
2020-01-31 Tamar Christina <tamar.christina@arm.com>
PR rtl-optimization/91838
* simplify-rtx.c (simplify_binary_operation_1): Update LSHIFTRT case
to truncate if allowed or reject combination.
gcc/testsuite/ChangeLog:
Backport from mainline
2020-01-31 Tamar Christina <tamar.christina@arm.com>
Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/91838
* g++.dg/opt/pr91838.C: New test.
H.J. Lu [Mon, 10 Feb 2020 15:58:45 +0000 (07:58 -0800)]
i386: Properly pop restore token in signal frame
Linux CET kernel places a restore token on shadow stack for signal
handler to enhance security. The restore token is 8 byte and aligned
to 8 bytes. It is usually transparent to user programs since kernel
will pop the restore token when signal handler returns. But when an
exception is thrown from a signal handler, now we need to pop the
restore token from shadow stack. For x86-64, we just need to treat
the signal frame as normal frame. For i386, we need to search for
the restore token to check if the original shadow stack is 8 byte
aligned. If the original shadow stack is 8 byte aligned, we just
need to pop 2 slots, one restore token, from shadow stack. Otherwise,
we need to pop 3 slots, one restore token + 4 byte padding, from
shadow stack.
This patch also includes 2 tests, one has a restore token with 4 byte
padding and one without.
Tested on Linux/x86-64 CET machine with and without -m32.
libgcc/
Backport from mainline
PR libgcc/85334
* config/i386/shadow-stack-unwind.h (_Unwind_Frames_Increment):
New.
gcc/testsuite/
Backport from mainline
PR libgcc/85334
* g++.target/i386/pr85334-1.C: New test.
* g++.target/i386/pr85334-2.C: Likewise.
Backport from mainline
PR target/85667
* config/i386/i386.c (function_arg_ms_64): Add a type argument.
Don't return aggregates with only SFmode and DFmode in SSE
register.
(ix86_function_arg): Pass type to function_arg_ms_64.
gcc/testsuite/
Backport from mainline
PR target/85667
* gcc.target/i386/pr85667-10.c: New test.
* gcc.target/i386/pr85667-7.c: Likewise.
* gcc.target/i386/pr85667-8.c: Likewise.
* gcc.target/i386/pr85667-9.c: Likewise.
Jason Merrill [Wed, 29 Jan 2020 22:16:12 +0000 (17:16 -0500)]
c++: Drop alignas restriction for stack variables.
Since expand_stack_vars and such know how to deal with variables aligned
beyond MAX_SUPPORTED_STACK_ALIGNMENT, we shouldn't reject alignas of large
alignments. And if we don't do that, there's no point in having
check_cxx_fundamental_alignment_constraints at all, since
check_user_alignment already enforces MAX_OFILE_ALIGNMENT.
Szabolcs Nagy [Wed, 15 Jan 2020 12:23:40 +0000 (12:23 +0000)]
[AArch64] PR92424: Fix -fpatchable-function-entry=N,M with BTI
This is a workaround that emits a BTI after the function label if that
is followed by a patch area. We try to remove the BTI that follows the
patch area (this may fail e.g. if the first instruction is a PACIASP).
So before this commit -fpatchable-function-entry=3,1 with bti generates
.section __patchable_function_entries
.8byte .LPFE
.text
.LPFE:
nop
foo:
nop
nop
bti c // or paciasp
...
and after this commit
.section __patchable_function_entries
.8byte .LPFE
.text
.LPFE:
nop
foo:
bti c
nop
nop
// may be paciasp
...
and with -fpatchable-function-entry=1 (M=0) the code now is
foo:
bti c
.section __patchable_function_entries
.8byte .LPFE
.text
.LPFE:
nop
// may be paciasp
...
There is a new bti insn in the middle of the patchable area that users need
to be aware of, unless M=0 (patch area is after the new bti) or M=N
(patch area is before the label, no new bti). Note: bti is not added to
all functions consistently (it can be turned off per function using a
target attribute or the compiler may detect that the function is never
called indirectly), so if bti is inserted in the middle of a patch area
then user code needs to deal with detecting it.
Tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
PR target/92424
* config/aarch64/aarch64.c (aarch64_declare_function_name): Set
cfun->machine->label_is_assembled.
(aarch64_print_patchable_function_entry): New.
(TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY): Define.
* config/aarch64/aarch64.h (struct machine_function): New field,
label_is_assembled.
gcc/testsuite/ChangeLog:
PR target/92424
* gcc.target/aarch64/pr92424-2.c: New test.
* gcc.target/aarch64/pr92424-3.c: New test.
Jason Merrill [Tue, 28 Jan 2020 17:26:10 +0000 (12:26 -0500)]
c++: Allow template rvalue-ref conv to bind to lvalue ref.
When I implemented the [over.match.ref] rule that a reference conversion
function needs to match the l/rvalue-ness of the target reference type, it
changed our handling of this testcase. It seems to me that our current
behavior is what
the standard says, but it doesn't seem desirable, and all the other
compilers have our old behavior. So let's limit the change to non-templates
until there's some clarification from the committee.
PR c++/90546
* call.c (build_user_type_conversion_1): Allow a template conversion
returning an rvalue reference to bind directly to an lvalue.
Jason Merrill [Mon, 27 Jan 2020 03:19:47 +0000 (22:19 -0500)]
c++: Fix array of char typedef in template (PR90966).
Since Martin Sebor's patch for PR 71625 to change braced array initializers
to STRING_CST in some cases, we need to be ready for STRING_CST with types
that are changed by tsubst. fold_convert doesn't know how to deal with
STRING_CST, which is reasonable; we really shouldn't expect it to here. So
let's handle STRING_CST separately.
PR c++/90966
* pt.c (tsubst_copy) [STRING_CST]: Don't use fold_convert.
Jason Merrill [Fri, 24 Jan 2020 23:20:56 +0000 (18:20 -0500)]
c++: Fix ICE with lambda in member operator (PR93279)
Here the problem was that we were remembering the lookup in template scope,
and then trying to reuse that lookup in the instantiation without
substituting into it at all. The simplest solution is to not try to
remember a lookup that finds a class-scope declaration, as in that case
doing the normal lookup again at instantiation time will always find the
right declarations.
PR c++/93279 - ICE with lambda in member operator.
* name-lookup.c (maybe_save_operator_binding): Don't remember
class-scope bindings.
Nathan Sidwell [Mon, 27 Jan 2020 13:49:43 +0000 (05:49 -0800)]
c++: Bogus error using namespace alias [PR91826]
My changes to is_nested_namespace broke is_ancestor's use where a namespace
alias might be passed in. This changes is_ancestor to look through the alias.
PR c++/91826
* name-lookup.c (is_ancestor): Allow CHILD to be a namespace alias.
Wilco Dijkstra [Fri, 17 Jan 2020 13:17:21 +0000 (13:17 +0000)]
[AArch64] Fix shrinkwrapping interactions with atomics (PR92692)
The separate shrinkwrapping pass may insert stores in the middle
of atomics loops which can cause issues on some implementations.
Avoid this by delaying splitting atomics patterns until after
prolog/epilog generation.
gcc/
PR target/92692
* config/aarch64/aarch64.c (aarch64_split_compare_and_swap):
Add assert to ensure prolog has been emitted.
(aarch64_split_atomic_op): Likewise.
* config/aarch64/atomics.md (aarch64_compare_and_swap<mode>):
Use epilogue_completed rather than reload_completed.
(aarch64_atomic_exchange<mode>): Likewise.
(aarch64_atomic_<atomic_optab><mode>): Likewise.
(atomic_nand<mode>): Likewise.
(aarch64_atomic_fetch_<atomic_optab><mode>): Likewise.
(atomic_fetch_nand<mode>): Likewise.
(aarch64_atomic_<atomic_optab>_fetch<mode>): Likewise.
(atomic_nand_fetch<mode>): Likewise.