Jakub Jelinek [Wed, 30 Aug 2023 09:21:45 +0000 (11:21 +0200)]
tree-ssa-strlen: Fix up handling of conditionally zero memcpy [PR110914]
The following testcase is miscompiled since r279392 aka r10-5451-gef29b12cfbb4979.
The strlen pass has an adjust_last_stmt function, which mainly performs strcat
or strcat-like optimizations (say strcpy (x, "abcd"); strcat (x, p);
or equivalent memcpy (x, "abcd", strlen ("abcd") + 1); char *q = strchr (x, 0);
memcpy (x, p, strlen (p)); etc.), where the first stmt stores a '\0' character
at the end but the next one immediately overwrites it, so the first memcpy can
be adjusted to store one byte fewer.  handle_builtin_memcpy called this function
in two spots, the first one guarded like:
if (olddsi != NULL
&& tree_fits_uhwi_p (len)
&& !integer_zerop (len))
adjust_last_stmt (olddsi, stmt, false);
i.e. only for a constant non-zero length.  The other spot can call it even
for a non-constant length, but in that case we punt earlier unless that length
is the length of some string + 1, so it is again non-zero.
The r279392 change, I assume, wanted to add some warning diagnostics and
changed it like this:
if (olddsi != NULL
- && tree_fits_uhwi_p (len)
&& !integer_zerop (len))
- adjust_last_stmt (olddsi, stmt, false);
+ {
+ maybe_warn_overflow (stmt, len, rvals, olddsi, false, true);
+ adjust_last_stmt (olddsi, stmt, false);
+ }
While maybe_warn_overflow possibly handles a non-constant length fine,
adjust_last_stmt really relies on the length being non-zero, which
!integer_zerop (len) alone doesn't guarantee.  While we could, for
len being an SSA_NAME, ask the ranger or tree_expr_nonzero_p, I think
adjust_last_stmt would not benefit much from it, so the following patch
just restores the above condition/previous behavior for the adjust_last_stmt
call only.
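As a minimal sketch of the shape of code involved (hypothetical; the actual
PR110914 testcase differs), the second memcpy may copy zero bytes at run time,
in which case the terminating '\0' written by the first memcpy must survive:

  #include <string.h>
  #include <stdio.h>

  char buf[16];

  __attribute__((noipa)) void
  copy_tail (const char *p)
  {
    memcpy (buf, "abcd", 5);          /* stores a terminating '\0' */
    memcpy (buf + 4, p, strlen (p));  /* copies 0 bytes when *p == '\0' */
  }

  int
  main (void)
  {
    copy_tail ("");
    printf ("%s\n", buf);             /* must print "abcd" */
    return 0;
  }

Shortening the first memcpy to 4 bytes is only valid when the second store is
known to write at least one byte.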
2023-08-30 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/110914
* tree-ssa-strlen.c (strlen_pass::handle_builtin_memcpy): Don't call
adjust_last_stmt unless len is a known constant.
Jakub Jelinek [Wed, 30 Aug 2023 08:47:21 +0000 (10:47 +0200)]
store-merging: Fix up >= 64 bit insertion [PR111015]
The following testcase shows that we mishandle bit insertion for
info->bitsize >= 64.  The problem is in using an unsigned HOST_WIDE_INT
shift + subtraction + build_int_cst to compute the mask: the shift invokes
UB at compile time for info->bitsize of 64 and larger, and on the testcase,
with a large info->bitsize, it happens to compute a mask of 0x3f rather than
0x3f'ffffffff'ffffffff.
The patch fixes that by using wide_int wi::mask + wide_int_to_tree, so it
handles masks in any precision (up to WIDE_INT_MAX_PRECISION ;) ).
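For illustration (standalone code, not the GCC source), the old way of forming
the mask is undefined once the bit count reaches the width of the type; hosts
that reduce the shift count modulo 64 would turn a bitsize of 70 into
(1 << 6) - 1 == 0x3f instead of a 70-bit mask:

  #include <stdio.h>

  /* Undefined behaviour for bitsize >= 64; wi::mask has no such limit.  */
  static unsigned long long
  bad_low_mask (unsigned int bitsize)
  {
    return (1ULL << bitsize) - 1;
  }

  int
  main (void)
  {
    printf ("%#llx\n", bad_low_mask (6));   /* well-defined: prints 0x3f */
    return 0;
  }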
2023-08-30 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/111015
* gimple-ssa-store-merging.c
(imm_store_chain_info::output_merged_store): Use wi::mask and
wide_int_to_tree instead of unsigned HOST_WIDE_INT shift and
build_int_cst to build BIT_AND_EXPR mask.
liuhongt [Thu, 10 Aug 2023 03:41:39 +0000 (11:41 +0800)]
Software mitigation: Disable gather generation in vectorization for GDS-affected Intel processors.
For more details of GDS (Gather Data Sampling), refer to
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/gather-data-sampling.html
After the microcode update there is a performance regression.  To avoid that,
the patch disables gather generation in autovectorization and uses scalar
emulation of gathers instead.
gcc/ChangeLog:
* config/i386/i386-options.c (m_GDS): New macro.
* config/i386/x86-tune.def (X86_TUNE_USE_GATHER): Don't enable
for m_GDS.
Patrick Palka [Tue, 9 May 2023 19:06:34 +0000 (15:06 -0400)]
c++: noexcept-spec from nested class confusion [PR109761]
When late processing a noexcept-spec from a nested class after completion
of the outer class (since it's a complete-class context), we pass the wrong
class context to noexcept_override_late_checks -- the outer class type
instead of the nested class type -- which leads to bogus errors in the
below test.
This patch fixes this by making noexcept_override_late_checks obtain the
class context directly via DECL_CONTEXT instead of via an additional
parameter.
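A small hypothetical example of the shape involved (the actual PR109761
testcase may differ): the noexcept-spec of the override in the nested class is
parsed only once the outer class is complete, and the late checks must use
Inner, not Outer, as the class context.

  struct Base
  {
    virtual void f () noexcept { }
  };

  struct Outer
  {
    struct Inner : Base
    {
      // Deferred-parsed noexcept-spec in a complete-class context.
      void f () noexcept (noexcept (g ())) override { }
      static void g () noexcept { }
    };
  };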
PR c++/109761
gcc/cp/ChangeLog:
* parser.c (cp_parser_class_specifier): Don't pass a class
context to noexcept_override_late_checks.
(noexcept_override_late_checks): Remove 'type' parameter
and use DECL_CONTEXT of 'fndecl' instead.
liuhongt [Fri, 4 Aug 2023 01:27:39 +0000 (09:27 +0800)]
Workaround possible CPUID bug in Sandy Bridge.
Don't access leaf 7 subleaf 1 unless subleaf 0 says it is
supported via EAX.
Intel documentation says invalid subleaves return 0.  We had been
relying on that behavior instead of checking the maximum subleaf number.
It appears that some Sandy Bridge CPUs return at least the subleaf 0
EDX value for subleaf 1. Best guess is that this is a bug in a
microcode patch since all of the bits we're seeing set in EDX were
introduced after Sandy Bridge was originally released.
This is causing avxvnniint16 to be incorrectly enabled with
-march=native on these CPUs.
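A rough sketch of the guarded query being described, using the <cpuid.h>
helper macros (simplified; the function name here is illustrative, not the
actual cpuinfo.h code):

  #include <cpuid.h>

  static void
  check_leaf7_subleaf1 (void)
  {
    unsigned int eax, ebx, ecx, edx;

    __cpuid_count (7, 0, eax, ebx, ecx, edx);
    unsigned int max_subleaf_level = eax;  /* subleaf 0 EAX = max subleaf */

    /* Only inspect subleaf 1 if subleaf 0 says it exists; affected CPUs
       may otherwise echo stale register values for invalid subleaves.  */
    if (max_subleaf_level >= 1)
      __cpuid_count (7, 1, eax, ebx, ecx, edx);
  }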
gcc/ChangeLog:
* common/config/i386/cpuinfo.h (get_available_features): Check
max_subleaf_level for a valid subleaf before using CPUID.
Jakub Jelinek [Mon, 19 Dec 2022 10:24:55 +0000 (11:24 +0100)]
testsuite: Fix up pr107397.f90 test [PR107397]
The pr107397.f90 test FAILs for me.  One problem was that the
added diagnostic has an indefinite article before BOZ, but
the test's dg-error didn't.  The other problem was that the
other dg-error had no space between the string and the closing
}, so it was completely ignored and the error was reported as an
excess error.
2022-12-19 Jakub Jelinek <jakub@redhat.com>
PR fortran/107397
* gfortran.dg/pr107397.f90: Adjust expected diagnostic wording and
add space between dg-error string and closing }.
Kewen Lin [Wed, 26 Jul 2023 08:42:29 +0000 (03:42 -0500)]
rs6000: Correct vsx operands output for xxeval [PR110741]
PR110741 exposes an issue where we didn't use the correct
modifier character for VSX operands in output operand substitution;
consequently the output can map to the wrong registers, which hold
some unexpected values.
PR target/110741
gcc/ChangeLog:
* config/rs6000/altivec.md (define_insn xxeval): Correct vsx
operands output with "x".
Harald Anlauf [Sun, 16 Jul 2023 20:17:27 +0000 (22:17 +0200)]
Fortran: intrinsics and deferred-length character arguments [PR95947,PR110658]
gcc/fortran/ChangeLog:
PR fortran/95947
PR fortran/110658
* trans-expr.c (gfc_conv_procedure_call): For intrinsic procedures
whose result characteristics depend on the first argument and which
can be of type character, the character length will not be deferred.
gcc/testsuite/ChangeLog:
PR fortran/95947
PR fortran/110658
* gfortran.dg/deferred_character_37.f90: New test.
testsuite: Require vectors of doubles for pr97428.c
The pr97428.c test assumes support for vectors of doubles, but some
targets only support vectors of floats, causing this test to fail on
such targets.  Therefore limit this test to targets that support vectors
of doubles.
gcc/testsuite/
* gcc.dg/vect/pr97428.c: Limit to `vect_double' targets.
Harald Anlauf [Tue, 11 Jul 2023 19:21:25 +0000 (21:21 +0200)]
Fortran: formal symbol attributes for intrinsic procedures [PR110288]
gcc/fortran/ChangeLog:
PR fortran/110288
* symbol.c (gfc_copy_formal_args_intr): When deriving the formal
argument attributes from the actual ones for intrinsic procedure
calls, take special care of CHARACTER arguments so that we do not
wrongly treat them formally as deferred-length.
gcc/testsuite/ChangeLog:
PR fortran/110288
* gfortran.dg/findloc_10.f90: New test.
Jonathan Wakely [Wed, 12 Jul 2023 13:40:19 +0000 (14:40 +0100)]
libstdc++: Check conversion from filesystem::path to wide strings [PR95048]
The testcase added for this bug only checks conversion from wide strings
on construction, but the fix also covered conversion to wide strings via
path::wstring().  Add checks for that, and for u16string() and u32string().
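A minimal sketch of the round trip the new checks exercise (illustrative, not
the actual testcase):

  #include <filesystem>
  #include <cassert>

  int
  main ()
  {
    std::filesystem::path p (L"filename");    // construct from a wide string
    assert (p.wstring ()   == L"filename");   // ... and convert back out
    assert (p.u16string () == u"filename");
    assert (p.u32string () == U"filename");
  }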
Jonathan Wakely [Fri, 11 Nov 2022 15:22:02 +0000 (15:22 +0000)]
libstdc++: Fix wstring conversions in filesystem::path [PR95048]
In commit r9-7381-g91756c4abc1757 I changed filesystem::path to use
std::codecvt<CharT, char, mbstate_t> for conversions from all wide
strings to UTF-8, instead of using std::codecvt_utf8<CharT>. This was
done because for 16-bit wchar_t, std::codecvt_utf8<wchar_t> only
supports UCS-2 and not UTF-16. The rationale for the change was sound,
but the actual fix was not. It's OK to use std::codecvt for char16_t or
char32_t, because the specializations for those types always use UTF-8,
but std::codecvt<wchar_t, char, mbstate_t> uses the current locale's
encodings, and the narrow encoding is probably ASCII and can't support
non-ASCII characters.
The correct fix is to use std::codecvt only for char16_t and char32_t.
For 32-bit wchar_t we could have continued using std::codecvt_utf8
because that uses UTF-32, which is fine; switching to std::codecvt broke
non-Windows targets with 32-bit wchar_t.  For 16-bit wchar_t we did need
to change, but should have changed to std::codecvt_utf8_utf16<wchar_t>
instead, as that always uses UTF-16 not UCS-2. I actually noted that in
the commit message for r9-7381-g91756c4abc1757 but didn't use that
option. Oops.
This replaces the unconditional std::codecvt<CharT, char, mbstate_t>
with a type defined via template specialization, so it can vary
depending on the wide character type. The code is also simplified to
remove some of the mess of #ifdef and if-constexpr conditions.
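A rough sketch of that selection (names and structure are illustrative, not
the actual libstdc++ implementation):

  #include <locale>
  #include <codecvt>
  #include <type_traits>

  // char16_t/char32_t codecvt specializations always convert to/from UTF-8;
  // for wchar_t, pick a UTF facet according to its width.
  template<typename CharT>
  struct WideCodecvt
  { using type = std::codecvt<CharT, char, std::mbstate_t>; };

  template<>
  struct WideCodecvt<wchar_t>
  {
    using type = std::conditional_t<sizeof(wchar_t) == 2,
                                    std::codecvt_utf8_utf16<wchar_t>, // UTF-16
                                    std::codecvt_utf8<wchar_t>>;      // UTF-32
  };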
libstdc++-v3/ChangeLog:
PR libstdc++/95048
* include/bits/fs_path.h (path::_Codecvt): New class template
that selects the kind of code conversion done.
(path::_Codecvt<wchar_t>): Select based on sizeof(wchar_t).
(_GLIBCXX_CONV_FROM_UTF8): New macro to allow the same code to
be used for Windows and POSIX.
(path::_S_convert(const EcharT*, const EcharT*)): Simplify by
using _Codecvt and _GLIBCXX_CONV_FROM_UTF8 abstractions.
(path::_S_str_convert(basic_string_view<value_type>, const A&)):
Simplify nested conditions.
* include/experimental/bits/fs_path.h (path::_Cvt): Define
nested typedef controlling type of code conversion done.
(path::_Cvt::_S_wconvert): Use new typedef.
(path::string(const A&)): Likewise.
* testsuite/27_io/filesystem/path/construct/95048.cc: New test.
* testsuite/experimental/filesystem/path/construct/95048.cc: New
test.
d: Fix PR 108842: Cannot use enum array with -fno-druntime
Restrict the generation of CONST_DECLs for D manifest constants to just
scalars without pointers. It shouldn't happen that a reference to a
manifest constant has not been expanded within a function body during
codegen, but it has been found to occur in older versions of the D
front-end (PR98277), so if the decl of a non-scalar constant is
requested, just return its initializer as an expression.
PR d/108842
gcc/d/ChangeLog:
* decl.cc (DeclVisitor::visit (VarDeclaration *)): Only emit scalar
manifest constants.
(get_symbol_decl): Don't generate CONST_DECL for non-scalar manifest
constants.
* imports.cc (ImportVisitor::visit (VarDeclaration *)): New method.
gcc/testsuite/ChangeLog:
* gdc.dg/pr98277.d: Add more tests.
* gdc.dg/pr108842.d: New test.
Fix power10 fusion bug with prefixed loads, PR target/105325
This change fixes PR target/105325.  PR target/105325 is a bug where an
invalid lwa instruction is generated due to power10 fusion of a load
instruction to a GPR with a compare immediate instruction whose immediate
is -1, 0, or 1.
In some cases, the GCC compiler would generate a load instruction with an
offset that was too large to fit into the normal (non-prefixed) load
instruction.
In particular, loads from the stack might originally have a small offset, so
that the load is not a prefixed load. However, after the stack is set up, and
register allocation has been done, the offset now is large enough that we would
have to use a prefixed load instruction.
The support for prefixed loads did not consider that patterns with a fused load
and compare might have a prefixed address. Without this support, the proper
prefixed load won't be generated.
In the original code, when the split2 pass is run after reload has finished,
the ds_form_mem_operand predicate that was used for lwa and ld no longer returns
true. When the pattern was created, ds_form_mem_operand recognized the insn as
being valid since the offset was small. But after register allocation,
ds_form_mem_operand did not return true. Because it didn't return true, the
insn could not be split. Since the insn was not split and the prefix support
did not indicate that a prefixed instruction was used, the wrong load was generated.
The solution involves:
1) Don't use ds_form_mem_operand for ld and lwa, always use
non_update_memory_operand.
2) Delete ds_form_mem_operand since it is no longer used.
3) Use the "YZ" constraints for ld/lwa instead of "m".
4) If we don't need to sign extend the lwa, convert it to lwz, and use
cmpwi instead of cmpdi. Adjust the insn name to reflect the code
generated.
5) Ensure that the insn using lwa will be recognized as having a prefixed
operand (and hence the insn length will be 16 bytes instead of 8
bytes).
5a) Set the prefixed and maybe_prefix attributes to know that
fused_load_cmpi are also load insns;
5b) In the case where we are just setting CC and not using the memory
afterward, set the clobber to use a DI register, and put an
explicit sign_extend operation in the split;
5c) Set the sign_extend attribute to "yes" for lwa.
5d) 5a-5c are the things that prefixed_load_p in rs6000.cc checks to
ensure that lwa is treated as a ds-form instruction and not as
a d-form instruction (i.e. lwz).
6) Add a new test case for this case.
7) Adjust the insn counts in fusion-p10-ldcmpi.c. Because we are no
longer using ds_form_mem_operand, the ld and lwa instructions will fuse
x-form (reg+reg) addresses in addition to ds-form (reg+offset or reg) ones.
2023-06-23 Michael Meissner <meissner@linux.ibm.com>
gcc/
PR target/105325
* config/rs6000/genfusion.pl (gen_ld_cmpi_p10_one): Fix problems that
allowed prefixed lwa to be generated.
* config/rs6000/fusion.md: Regenerate.
* config/rs6000/predicates.md (ds_form_mem_operand): Delete.
* config/rs6000/rs6000.md (prefixed attribute): Add support for load
plus compare immediate fused insns.
(maybe_prefixed): Likewise.
This makes the code more readable, more digestible, more maintainable,
more extensible. That kind of thing. It does that by pulling things
apart a bit, but also by making what stays together into more cohesive lumps.
The original function was a bunch of loops and early-outs, and then
quite a bit of stuff done per iteration, with the iterations essentially
independent of each other. This patch moves the stuff done for one
iteration to a new _one function.
The second big thing is that the stuff printed to the .md file is done in
"here documents" now, which is a lot more readable than having to quote
and escape and double-escape pieces of text. Whitespace inside the
here-document is significant (will be printed as-is), which is a bit
awkward sometimes, or might take some getting used to, but it is also
one of the benefits of using them.
Local variables are declared at first use (or close to first use).
There also shouldn't be many at all; often you can write code that is easier
to read and manage by not naming something that is hard to name
in the first place.
Finally some things are done in more typical, more modern, and tighter
Perl style, for example REs in "if"s or "qw" for lists of constants.
d: Fix core.volatile.volatileLoad discarded if result is unused
The first pass of code generation in the D front-end splits up all
compound expressions and discards expressions that have no side effects.
This included calls to the `volatileLoad' intrinsic if its result was
not used, causing such calls to be eliminated from the program.
We already set TREE_THIS_VOLATILE on the expression; however, the tree
documentation says that if this bit is set in an expression, TREE_SIDE_EFFECTS
should be set as well.  So set TREE_SIDE_EFFECTS on the expression too.
This prevents any early discarding from occurring.
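A hypothetical C analogue of the semantics being preserved: a read through a
volatile lvalue is itself a side effect and must be emitted even when its
value is discarded.

  #define STATUS_REG (*(volatile unsigned int *) 0x40000000)

  void
  poll_status (void)
  {
    STATUS_REG;   /* the load is the required side effect */
  }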
PR d/110516
gcc/d/ChangeLog:
* intrinsics.cc (expand_volatile_load): Set TREE_SIDE_EFFECTS on the
expanded expression.
(expand_volatile_store): Likewise.
gcc/testsuite/ChangeLog:
* gdc.dg/torture/pr110516a.d: New test.
* gdc.dg/torture/pr110516b.d: New test.
d: Fix ICE in setValue, at d/dmd/dinterpret.c:7013
Backports an ICE fix from upstream.  When casting null to an integer or real
type, instead of painting the type on the NullExp, we emplace an
IntegerExp/RealExp with the value zero, the same as when casting from
NullExp to bool.
Paul E. Murphy [Thu, 22 Jun 2023 22:53:46 +0000 (17:53 -0500)]
go: Update usage of TARGET_AIX to TARGET_AIX_OS
TARGET_AIX is defined to a non-zero value on linux and maybe other
powerpc64le targets. This leads to unexpected behavior such as
dropping the .go_export section when linking a shared library
on linux/powerpc64le.
Instead, use TARGET_AIX_OS to toggle AIX-specific behavior.
Fixes golang/go#60798.
2023-06-22 Paul E. Murphy <murphyp@linux.ibm.com>
gcc/go/
* go-backend.c [TARGET_AIX]: Rename and update usage to TARGET_AIX_OS.
* go-lang.c: Likewise.
If mem_addr points to a memory region with fewer than whole-vector-size
bytes of accessible memory, and k is a mask that would prevent reading
the inaccessible bytes from mem_addr, add UNSPEC_MASKMOV to prevent
it from being transformed into any other whole-memory access instruction.
gcc/ChangeLog:
PR rtl-optimization/110237
* config/i386/sse.md (<avx512>_store<mode>_mask): Refine with
UNSPEC_MASKMOV.
(maskstore<mode><avx512fmaskmodelower>): Ditto.
(*<avx512>_store<mode>_mask): New define_insn, it's renamed
from original <avx512>_store<mode>_mask.
liuhongt [Tue, 20 Jun 2023 07:41:00 +0000 (15:41 +0800)]
Refine maskloadmn pattern with UNSPEC_MASKLOAD.
If mem_addr points to a memory region with fewer than whole-vector-size
bytes of accessible memory, and k is a mask that would prevent reading
the inaccessible bytes from mem_addr, add UNSPEC_MASKLOAD to prevent
it from being transformed into vpblendd.
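At the intrinsics level the concern looks like this (an AVX2 illustration for
brevity, compiled with -mavx2; the patch itself adjusts the AVX512 mask
load/store patterns):

  #include <immintrin.h>

  /* Only p[0..2] need to be accessible; the mask keeps the hardware from
     touching the fourth element, so the compiler must not rewrite this
     into a full-width unmasked load followed by a blend.  */
  int
  load_first_three (const int *p)
  {
    __m128i mask = _mm_setr_epi32 (-1, -1, -1, 0);
    __m128i v = _mm_maskload_epi32 (p, mask);
    return _mm_cvtsi128_si32 (v);
  }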
Thomas Schwinge [Mon, 15 May 2023 18:00:07 +0000 (20:00 +0200)]
Support parallel testing in libgomp: fallback Perl 'flock' [PR66005]
Follow-up to commit 6c3b30ef9e0578509bdaf59c13da4a212fe6c2ba
"Support parallel testing in libgomp, part II [PR66005]"
("..., and enable if 'flock' is available for serializing execution testing"),
where we saw:
> On my Dell Precision 7530 laptop:
>
> $ uname -srvi
> Linux 5.15.0-71-generic #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 x86_64
> $ grep '^model name' < /proc/cpuinfo | uniq -c
> 12 model name : Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
> $ nvidia-smi -L
> GPU 0: Quadro P1000 (UUID: GPU-e043973b-b52a-d02b-c066-a8fdbf64e8ea)
>
> ... [...]: case (c) standard configuration, no offloading
> configured, [...]
PR testsuite/66005
gcc/
* doc/install.texi: Document (optional) Perl usage for parallel
testing of libgomp.
libgomp/
* testsuite/lib/libgomp.exp: 'flock' through stdout.
* testsuite/flock: New.
* configure.ac (FLOCK): Point to that if no 'flock' available, but
'perl' is.
* configure: Regenerate.
Thomas Schwinge [Tue, 25 Apr 2023 21:53:12 +0000 (23:53 +0200)]
Support parallel testing in libgomp, part II [PR66005]
..., and enable if 'flock' is available for serializing execution testing.
Regarding the default of 19 parallel slots, this turned out to be a local
minimum for wall time when testing this on:
$ uname -srvi
Linux 4.2.0-42-generic #49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016 x86_64
$ grep '^model name' < /proc/cpuinfo | uniq -c
32 model name : Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz
... in two configurations: case (a) standard configuration, no offloading
configured, case (b) offloading for GCN and nvptx configured but no devices
available. For both cases, default plus '-m32' variant.
$ \time make check-target-libgomp RUNTESTFLAGS="--target_board=unix\{,-m32\}"
$ uname -srvi
Linux 5.15.0-71-generic #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 x86_64
$ grep '^model name' < /proc/cpuinfo | uniq -c
12 model name : Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
$ nvidia-smi -L
GPU 0: Quadro P1000 (UUID: GPU-e043973b-b52a-d02b-c066-a8fdbf64e8ea)
... in two configurations: case (c) standard configuration, no offloading
configured, case (d) offloading for nvptx configured and device available.
For both cases, only default variant, no '-m32'.
$ \time make check-target-libgomp
Case (c), baseline; roughly half of case (a) (just one variant):
Worth noting is that with nvptx offloading, there is one execution test case
that times out ('libgomp.fortran/reverse-offload-5.f90'). This effectively
stalls progress for almost 5 min: other execution test cases quickly queue up
on the lock for all parallel slots. That's working as expected; just noting
this as it accordingly does skew the wall time numbers.