Uros Bizjak [Sun, 18 Jun 2023 20:09:38 +0000 (22:09 +0200)]
RTL: Change return type of predicate and callback functions from int to bool
gcc/ChangeLog:
* rtl.h (*rtx_equal_p_callback_function):
Change return type from int to bool.
(rtx_equal_p): Ditto.
(*hash_rtx_callback_function): Ditto.
* rtl.cc (rtx_equal_p): Change return type from int to bool
and adjust function body accordingly.
* early-remat.cc (scratch_equal): Ditto.
* sel-sched-ir.cc (skip_unspecs_callback): Ditto.
(hash_with_unspec_callback): Ditto.
Jeff Law [Sun, 18 Jun 2023 17:25:12 +0000 (11:25 -0600)]
Fix arc assumption that insns are not re-recognized
Testing the V2 version of Manolis's fold-mem-offsets patch exposed a minor bug
in the arc backend.
The movsf_insn pattern has constraints which allow storing certain constants
to memory. reload/lra will target those alternatives under the right
circumstances. However the insn's condition requires that one of the two
operands must be a register.
Thus if a pass were to force re-recognition of the pattern we can get an
unrecognized insn failure.
This patch adjusts the conditions to more closely match movsi_insn. More
specifically it allows storing a constant into a limited set of memory
operands (as defined by the Usc constraint). movqi_insn has the same
core problem and gets the same solution.
Committed after the tester validated there are no regressions.
gcc/
* config/arc/arc.md (movqi_insn): Allow certain constants to
be stored into memory in the pattern's condition.
(movsf_insn): Similarly.
Honza [Sun, 18 Jun 2023 16:58:26 +0000 (18:58 +0200)]
Analyze SRA candidates in ipa-fnsummary
This patch extends ipa-fnsummary to anticipate statements that will be removed
by SRA. This is done by looking for calls passing addresses of automatic
variables. In the function body we look for dereferences through pointers to such
variables and mark them with the new not_sra_candidate condition.
This is just a first step, which is overly optimistic. We do not try to prove that
given automatic variable will not be SRAed even after inlining. We now also
optimistically assume that the transformation will always happen. I will restrict
this in a followup patch, but I think it is useful to gather some data on how
much code is affected by this.
This is motivated by PR109849 where we fail to fully inline push_back.
The patch alone does not solve the problem even for -O3, but improves
analysis in this case.
gcc/ChangeLog:
PR tree-optimization/109849
* ipa-fnsummary.cc (evaluate_conditions_for_known_args): Add new parameter
ES; handle ipa_predicate::not_sra_candidate.
(evaluate_properties_for_edge): Pass es to
evaluate_conditions_for_known_args.
(ipa_fn_summary_t::duplicate): Handle sra candidates.
(dump_ipa_call_summary): Dump points_to_possible_sra_candidate.
(load_or_store_of_ptr_parameter): New function.
(points_to_possible_sra_candidate_p): New function.
(analyze_function_body): Initialize points_to_possible_sra_candidate;
determine sra predicates.
(estimate_ipcp_clone_size_and_time): Update call of
evaluate_conditions_for_known_args.
(remap_edge_params): Update points_to_possible_sra_candidate.
(read_ipa_call_summary): Stream points_to_possible_sra_candidate
(write_ipa_call_summary): Likewise.
* ipa-predicate.cc (ipa_predicate::add_clause): Handle not_sra_candidate.
(dump_condition): Dump it.
* ipa-predicate.h (struct inline_param_summary): Add
points_to_possible_sra_candidate.
Roger Sayle [Sun, 18 Jun 2023 16:39:13 +0000 (17:39 +0100)]
i386: Refactor new ix86_expand_carry to set the carry flag.
This patch refactors the three places in the i386.md backend that we
set the carry flag into a new ix86_expand_carry helper function, that
allows Jakub's recently added uaddc<mode>5 and usubc<mode>5 expanders
to take advantage of the recently added support for the stc instruction.
2023-06-18 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_carry): New helper
function for setting the carry flag.
(ix86_expand_builtin) <handlecarry>: Use it here.
* config/i386/i386-protos.h (ix86_expand_carry): Prototype here.
* config/i386/i386.md (uaddc<mode>5): Use ix86_expand_carry.
(usubc<mode>5): Likewise.
* gcc.target/riscv/rvv/base/tuple-28.c: New test.
* gcc.target/riscv/rvv/base/tuple-29.c: New test.
* gcc.target/riscv/rvv/base/tuple-30.c: New test.
* gcc.target/riscv/rvv/base/tuple-31.c: New test.
* gcc.target/riscv/rvv/base/tuple-32.c: New test.
[contrib] validate_failures.py: Don't consider summary line in wrong place
When parsing a summary or manifest file, if we're not either after a tool
line (e.g. "=== gdb tests ===") or before a summary line (e.g.,
"=== gdb Summary ===") then the current line can't be a valid result line
so ignore it.
This addresses a problem we're seeing when running the GDB testsuite in
our CI environment where it produces a valid summary file, but then after
the "=== gdb Summary ===" section it outputs a series of Tcl errors that
match _VALID_TEST_RESULTS_REX and thus confuse the parsing logic:
05: 14:32 .sum file seems to be broken: tool="None", exp="None", summary_line="ERROR: -------------------------------------------"
05: 14:32 Traceback (most recent call last):
05: 14:32 File "/path/to/gcc/contrib/testsuite-management/validate_failures.py", line 706, in <module>
05: 14:32 retval = Main(sys.argv)
05: 14:32 File "/path/to/gcc/contrib/testsuite-management/validate_failures.py", line 697, in Main
05: 14:32 retval = CheckExpectedResults()
05: 14:32 File "/path/to/gcc/contrib/testsuite-management/validate_failures.py", line 572, in CheckExpectedResults
05: 14:32 actual = GetResults(sum_files)
05: 14:32 File "/path/to/gcc/contrib/testsuite-management/validate_failures.py", line 447, in GetResults
05: 14:32 build_results.update(ParseSummary(sum_fname))
05: 14:32 File "/path/to/gcc/contrib/testsuite-management/validate_failures.py", line 389, in ParseSummary
05: 14:32 result = result_set.MakeTestResult(line, ordinal)
05: 14:32 File "/path/to/gcc/contrib/testsuite-management/validate_failures.py", line 236, in MakeTestResult
05: 14:32 return TestResult(summary_line, ordinal,
05: 14:32 File "/path/to/gcc/contrib/testsuite-management/validate_failures.py", line 148, in __init__
05: 14:32 raise
contrib/ChangeLog:
* testsuite-management/validate_failures.py (IsInterestingResult):
Add result_set argument and use it. Adjust callers.
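The rule described above (a result line only counts between a tool line
and the following summary line) can be sketched as below. This is a minimal
illustration in C, not the Python code in validate_failures.py: the names
sum_state, ends_with and is_interesting_line are hypothetical, and the
prefix list is a simplified stand-in for _VALID_TEST_RESULTS_REX.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parser state: true only between a tool line
   ("=== gdb tests ===") and the next summary line.  */
struct sum_state
{
  bool in_tool_section;
};

/* Return true if LINE ends with SUFFIX.  */
static bool
ends_with (const char *line, const char *suffix)
{
  size_t n = strlen (line), m = strlen (suffix);
  return n >= m && strcmp (line + n - m, suffix) == 0;
}

/* Return true if LINE (without trailing newline) should be parsed as a
   test result.  Section markers update ST and are never results.  */
static bool
is_interesting_line (struct sum_state *st, const char *line)
{
  /* Simplified stand-in for the result regex; an assumption.  */
  static const char *const prefixes[]
    = { "PASS:", "FAIL:", "XPASS:", "XFAIL:", "UNRESOLVED:", "ERROR:" };

  /* Section markers may be indented, so skip leading blanks.  */
  while (isspace ((unsigned char) *line))
    line++;

  if (strncmp (line, "=== ", 4) == 0)
    {
      if (ends_with (line, " tests ==="))
        st->in_tool_section = true;
      else if (ends_with (line, " Summary ==="))
        st->in_tool_section = false;
      return false;
    }

  /* Outside a tool section nothing is a valid result, even if it
     starts with a result keyword such as "ERROR:".  */
  if (!st->in_tool_section)
    return false;

  for (size_t i = 0; i < sizeof prefixes / sizeof prefixes[0]; i++)
    if (strncmp (line, prefixes[i], strlen (prefixes[i])) == 0)
      return true;
  return false;
}
```

With this state machine, the Tcl "ERROR: ---..." lines printed after the
summary section are ignored instead of being handed to the result parser.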
Roger Sayle [Sat, 17 Jun 2023 21:28:40 +0000 (22:28 +0100)]
i386: Two minor tweaks to ix86_expand_move.
This patch splits out two (independent) minor changes to i386-expand.cc's
ix86_expand_move from a larger patch, given that it's better to review
and commit these independent pieces separately from a more complex patch.
The first change is to test for CONST_WIDE_INT_P before calling
ix86_convert_const_wide_int_to_broadcast. Whilst stepping through
this function in gdb, I was surprised that the code was continually
jumping into this function with operands that obviously weren't
appropriate.
The second change is to generalize the optimization for efficiently
moving a TImode value to V1TImode (via V2DImode), to cover all 128-bit
vector modes.
The new test case is unimaginatively called sse2-v1ti-mov-2.c given
the original test case just for V1TI mode was called sse2-v1ti-mov-1.c.
2023-06-17 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_move): Check that OP1 is
CONST_WIDE_INT_P before calling ix86_convert_const_wide_int_to_broadcast.
Generalize special case for converting TImode to V1TImode to handle
all 128-bit vector conversions.
gcc/testsuite/ChangeLog
* gcc.target/i386/sse2-v1ti-mov-2.c: New test case.
Preparatory refactoring that simplifies by eliminating
some duplicated code, before trying to fix 77576.
I believe this stands on its own regardless of the PR.
It also saves a nargv element when we have a plugin and
three when not.
gcc/
* gcc-ar.cc (main): Refactor to slightly reduce code
duplication. Avoid unnecessary elements in nargv.
Pan Li [Fri, 16 Jun 2023 07:01:46 +0000 (15:01 +0800)]
RISC-V: Bugfix for RVV integer reduction in ZVE32/64.
The rvv integer reduction has 3 different patterns for zve128+, zve64
and zve32. They take the same iterator with different attributes.
However, we need the generated function code_for_reduc (code, mode1, mode2).
The implementation of code_for_reduc may look like below.
if (code == max && mode1 == VNx1QI && mode2 == VNx1QI)
return CODE_FOR_pred_reduc_maxvnx1qivnx8qi; // ZVE64
if (code == max && mode1 == VNx1QI && mode2 == VNx1QI)
return CODE_FOR_pred_reduc_maxvnx1qivnx4qi; // ZVE32
}
Thus there will be a problem here. For example, for zve32 we will have
code_for_reduc (max, VNx1QI, VNx1QI), which will return the code of
the ZVE128+ pattern instead of the ZVE32 one.
This patch merges the 3 patterns into one pattern and passes both the
input_vector and the ret_vector to code_for_reduc. For example, ZVE32 will be
code_for_reduc (max, VNx1QI, VNx8QI), so the correct code for ZVE32
will be returned as expected.
Please note both GCC 13 and 14 are impacted by this issue.
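The ambiguity can be illustrated with a small first-match lookup table.
This is only a sketch: the enums, the entry struct and lookup are
hypothetical stand-ins for the generated code_for_reduc, and the two insn
codes are named after the pattern suffixes in the pseudocode above.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the RTL code, machine modes and insn codes.  */
enum op { OP_MAX };
enum mode { VNx1QI, VNx4QI, VNx8QI };
enum icode { CODE_NONE, CODE_VNX1QI_VNX8QI, CODE_VNX1QI_VNX4QI };

struct entry
{
  enum op code;
  enum mode mode1;
  enum mode mode2;
  enum icode insn;
};

/* Before: both patterns are keyed on the same (code, mode, mode)
   triple, so the second entry can never be selected.  */
static const struct entry ambiguous[] = {
  { OP_MAX, VNx1QI, VNx1QI, CODE_VNX1QI_VNX8QI },
  { OP_MAX, VNx1QI, VNx1QI, CODE_VNX1QI_VNX4QI },
};

/* After: the second key is the result vector mode, which differs
   between the patterns, so every entry is reachable.  */
static const struct entry distinct[] = {
  { OP_MAX, VNx1QI, VNx8QI, CODE_VNX1QI_VNX8QI },
  { OP_MAX, VNx1QI, VNx4QI, CODE_VNX1QI_VNX4QI },
};

/* Return the insn code of the first entry matching all three keys.  */
static enum icode
lookup (const struct entry *tab, size_t n,
        enum op code, enum mode mode1, enum mode mode2)
{
  for (size_t i = 0; i < n; i++)
    if (tab[i].code == code && tab[i].mode1 == mode1
        && tab[i].mode2 == mode2)
      return tab[i].insn;
  return CODE_NONE;
}
```

With the ambiguous table the lookup always yields the first entry; with
the distinct table the caller selects the pattern through the second mode.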
Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
PR target/110265
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc: Add ret_mode for
integer reduction expand.
* config/riscv/vector-iterators.md: Add VQI, VHI, VSI and VDI,
and the LMUL1 attr respectively.
* config/riscv/vector.md
(@pred_reduc_<reduc><mode><vlmul1>): Removed.
(@pred_reduc_<reduc><mode><vlmul1_zve64>): Likewise.
(@pred_reduc_<reduc><mode><vlmul1_zve32>): Likewise.
(@pred_reduc_<reduc><VQI:mode><VQI_LMUL1:mode>): New pattern.
(@pred_reduc_<reduc><VHI:mode><VHI_LMUL1:mode>): Likewise.
(@pred_reduc_<reduc><VSI:mode><VSI_LMUL1:mode>): Likewise.
(@pred_reduc_<reduc><VDI:mode><VDI_LMUL1:mode>): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/pr110265-1.c: New test.
* gcc.target/riscv/rvv/base/pr110265-1.h: New test.
* gcc.target/riscv/rvv/base/pr110265-2.c: New test.
* gcc.target/riscv/rvv/base/pr110265-2.h: New test.
* gcc.target/riscv/rvv/base/pr110265-3.c: New test.
Ian Lance Taylor [Fri, 16 Jun 2023 17:45:15 +0000 (10:45 -0700)]
libgo/testsuite: add benchmarks and examples to list
In CL 384695 I simplified the code that built lists of benchmarks,
examples, and fuzz tests, and managed to break it. This CL corrects
the code to once again make the benchmarks available, and to run
the examples with output and the fuzz targets.
Doing this revealed a test failure in internal/fuzz on 32-bit x86:
a signalling NaN is turned into a quiet NaN on the 387 floating-point
stack that GCC uses by default. This CL skips the test.
Jakub Jelinek [Fri, 16 Jun 2023 17:47:28 +0000 (19:47 +0200)]
builtins: Add support for clang compatible __builtin_{add,sub}c{,l,ll} [PR79173]
While the design of these builtins in clang is questionable,
rather than being say
unsigned __builtin_addc (unsigned, unsigned, bool, bool *)
so that it is clear they add two [0, 0xffffffff] range numbers
plus one [0, 1] range carry in and give [0, 0xffffffff] range
return plus [0, 1] range carry out, they actually instead
add 3 [0, 0xffffffff] values together but the carry out
isn't then the expected [0, 2] value because
0xffffffffULL + 0xffffffff + 0xffffffff is 0x2fffffffd,
but just [0, 1] whether there was any overflow at all.
It is something used in the wild and shorter to write than the
corresponding
#define __builtin_addc(a,b,carry_in,carry_out) \
({ unsigned _s; \
unsigned _c1 = __builtin_uadd_overflow (a, b, &_s); \
unsigned _c2 = __builtin_uadd_overflow (_s, carry_in, &_s); \
*(carry_out) = (_c1 | _c2); \
_s; })
and so a canned builtin for something people could often use.
It isn't that hard to maintain on the GCC side, as we just lower
it to two .ADD_OVERFLOW calls early, and the already committed
pattern recognition code can then make .UADDC/.USUBC calls out of
that if the carry in is in [0, 1] range and the corresponding
optab is supported by the target.
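As an illustration of the semantics described above, the macro can be
written as an ordinary function and chained across limbs. This is a sketch
only: addc_sketch and add_limbs are hypothetical names, the code assumes a
32-bit unsigned int, and it uses the existing __builtin_uadd_overflow
rather than the new builtin so that it also compiles with older compilers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper mirroring the macro above: add A, B and CARRY_IN
   and report in *CARRY_OUT whether either partial addition wrapped.  */
static unsigned
addc_sketch (unsigned a, unsigned b, unsigned carry_in, unsigned *carry_out)
{
  unsigned s;
  bool c1 = __builtin_uadd_overflow (a, b, &s);
  bool c2 = __builtin_uadd_overflow (s, carry_in, &s);
  *carry_out = c1 | c2;
  return s;
}

/* Add two N-limb numbers stored least significant limb first and
   return the carry out of the most significant limb.  */
static unsigned
add_limbs (unsigned *dst, const unsigned *x, const unsigned *y, size_t n)
{
  unsigned carry = 0;
  for (size_t i = 0; i < n; i++)
    dst[i] = addc_sketch (x[i], y[i], carry, &carry);
  return carry;
}
```

When the carry in stays in the [0, 1] range, as in add_limbs, at most one
of the two partial additions can wrap, which is the case the pattern
recognition is meant to turn into .UADDC.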
Jakub Jelinek [Fri, 16 Jun 2023 17:46:36 +0000 (19:46 +0200)]
tree-ssa-math-opts: Fix up uaddc/usubc pattern matching [PR110271]
The following testcase ICEs, because I misremembered what the return value
from match_arith_overflow is. It isn't true if __builtin_*_overflow was
matched, but it is true only in the BIT_NOT_EXPR case if stmt was removed.
So, if match_arith_overflow matches something, gsi_stmt (gsi) will not
be stmt and match_uaddc_usubc will be confused and can ICE.
The following patch fixes it by checking if gsi_stmt (gsi) == stmt,
in that case we know it is still a PLUS_EXPR/MINUS_EXPR and we can try to
pattern match it further as UADDC/USUBC.
2023-06-16 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/110271
* tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children)
<case PLUS_EXPR>: Ignore return value from match_arith_overflow,
instead call match_uaddc_usubc only if gsi_stmt (gsi) is still stmt.
Martin Jambor [Fri, 16 Jun 2023 16:10:21 +0000 (18:10 +0200)]
Regenerate some autotools generated files
As discussed in
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621976.html this
should put the autotools generated files in sync to what they were
generated from (and make an automated checker happy).
Tested by bootstrapping on top of a revision from only a few commits ago.
Tobias Burnus [Fri, 16 Jun 2023 15:21:59 +0000 (17:21 +0200)]
libgomp: Fix OMP_TARGET_OFFLOAD=mandatory
It turned out that gomp_init_targets_once() was not run when directly
calling 'omp target' or 'omp target (enter/exit) data' causing an
abort with OMP_TARGET_OFFLOAD=mandatory wrongly claiming that no
device is available. It was called a tiny bit later but a few lines too
late for updating the default-device-var.
libgomp/ChangeLog:
* target.c (resolve_device): Call gomp_get_num_devices early to ensure
gomp_init_targets_once was called before using default-device-var.
* testsuite/libgomp.c/target-55.c: New test.
* testsuite/libgomp.c/target-55a.c: New test.
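A program of the following shape exercises the affected path: its first
OpenMP construct is a plain 'omp target', so nothing has initialized the
devices beforehand. This is a hypothetical sketch, not the contents of the
new tests, and first_target_region is an illustrative name; it assumes
compilation with -fopenmp (without it the pragma is ignored and the block
simply runs on the host).

```c
/* Run a trivial target region as the very first OpenMP construct.
   X is mapped to the device and back, so the caller sees the value
   written inside the region.  */
int
first_target_region (void)
{
  int x = 0;
#pragma omp target map(tofrom: x)
  {
    x = 42;
  }
  return x;
}
```

Running such a program with OMP_TARGET_OFFLOAD=mandatory on a system that
does have an offload device is the case that used to abort.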
Roger Sayle [Fri, 16 Jun 2023 15:18:27 +0000 (16:18 +0100)]
PR target/31985: Improve memory operand use with doubleword add.
This patch addresses the last remaining issue with PR target/31985, that
GCC could make better use of memory addressing modes when implementing
double word addition. This is achieved by adding a define_insn_and_split
that combines an *add<dwi>3_doubleword with a *concat<mode><dwi>3, so
that the components of the concat can be used directly, without first
being loaded into a double word register.
2023-06-16 Roger Sayle <roger@nextmovesoftware.com>
Uros Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
PR target/31985
* config/i386/i386.md (*add<dwi>3_doubleword_concat): New
define_insn_and_split combine *add<dwi>3_doubleword with
a *concat<mode><dwi>3 for more efficient lowering after reload.
gcc/testsuite/ChangeLog
PR target/31985
* gcc.target/i386/pr31985.c: New test case.
RA: Ignore conflicts for some pseudos from insns throwing a final exception
IRA adds conflicts to the pseudos from insns that can throw exceptions
internally even if the exception code is final for the function and
the pseudo value is not used in the exception code. This results in
spilling a pseudo in a loop (see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215).
The following patch fixes the problem.
PR rtl-optimization/110215
gcc/ChangeLog:
* ira-lives.cc: Include except.h.
(process_bb_node_lives): Ignore conflicts from cleanup exceptions
when the pseudo does not live at the exception landing pad.
Alex Coplan [Fri, 16 Jun 2023 14:18:40 +0000 (15:18 +0100)]
c++: Accept elaborated-enum-base with pedwarn
macOS SDK headers using the CF_ENUM macro can expand to invalid C++ code
of the form:
typedef enum T : BaseType T;
i.e. an elaborated-type-specifier with an additional enum-base.
Upstream LLVM can be made to accept the above construct with
-Wno-error=elaborated-enum-base.
This patch adds the -Welaborated-enum-base warning to GCC and adjusts
the C++ parser to emit this warning instead of rejecting this code
outright.
The macro expansion in the macOS headers occurs in the case that the
compiler declares support for enums with underlying type using
__has_feature, see
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618450.html
GCC rejecting this construct outright means that GCC fails to bootstrap
on Darwin in the case that it (correctly) implements __has_feature and
declares support for C++ enums with underlying type.
With this patch, GCC can bootstrap on Darwin in combination with the
(WIP) __has_feature patch posted at:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617878.html
Kyrylo Tkachov [Wed, 7 Jun 2023 14:24:46 +0000 (15:24 +0100)]
aarch64: Handle ASHIFTRT in patterns for shrn2
Similar to the low-half patterns, we want to match both ashiftrt and
lshiftrt with the truncate for SHRN2. We reuse the SHIFTRT iterator
and the AARCH64_VALID_SHRN_OP check to help, but because we expand the
high-half patterns by their gen_* names we need to disambiguate all the
different trunc+shift combinations in the pattern name, which leads to a
slight renaming of the builtins. The AARCH64_VALID_SHRN_OP check on the
expander and the define_insns ensures that no invalid combination ends
up getting matched.
Bootstrapped and tested on aarch64-none-linux-gnu and
aarch64_be-none-elf.
Kyrylo Tkachov [Wed, 7 Jun 2023 10:20:01 +0000 (11:20 +0100)]
aarch64: [US]Q(R)SHR(U)N2 refactoring
This patch is large in lines of code, but it is a fairly regular
extension of the first patch as it converts the high-half patterns
to standard RTL codes in the same fashion as the first patch did for the
low-half ones.
This now allows us to remove the unspec codes for these instructions as
there are no more uses of them left.
Bootstrapped and tested on aarch64-none-linux-gnu and
aarch64_be-none-elf.
gcc/ChangeLog:
* config/aarch64/aarch64-simd-builtins.def (shrn2): Rename builtins to...
(shrn2_n): ... This.
(rshrn2): Rename builtins to...
(rshrn2_n): ... This.
* config/aarch64/arm_neon.h (vrshrn_high_n_s16): Adjust for the above.
(vrshrn_high_n_s32): Likewise.
(vrshrn_high_n_s64): Likewise.
(vrshrn_high_n_u16): Likewise.
(vrshrn_high_n_u32): Likewise.
(vrshrn_high_n_u64): Likewise.
(vshrn_high_n_s16): Likewise.
(vshrn_high_n_s32): Likewise.
(vshrn_high_n_s64): Likewise.
(vshrn_high_n_u16): Likewise.
(vshrn_high_n_u32): Likewise.
(vshrn_high_n_u64): Likewise.
* config/aarch64/aarch64-simd.md (*aarch64_<srn_op>shrn<mode>2_vect_le):
Delete.
(*aarch64_<srn_op>shrn<mode>2_vect_be): Likewise.
(aarch64_shrn2<mode>_insn_le): Likewise.
(aarch64_shrn2<mode>_insn_be): Likewise.
(aarch64_shrn2<mode>): Likewise.
(aarch64_rshrn2<mode>_insn_le): Likewise.
(aarch64_rshrn2<mode>_insn_be): Likewise.
(aarch64_rshrn2<mode>): Likewise.
(aarch64_<sur>q<r>shr<u>n2_n<mode>_insn_le): Likewise.
(aarch64_<shrn_op>shrn2_n<mode>_insn_le): New define_insn.
(aarch64_<sur>q<r>shr<u>n2_n<mode>_insn_be): Delete.
(aarch64_<shrn_op>shrn2_n<mode>_insn_be): New define_insn.
(aarch64_<sur>q<r>shr<u>n2_n<mode>): Delete.
(aarch64_<shrn_op>shrn2_n<mode>): New define_expand.
(aarch64_<shrn_op>rshrn2_n<mode>_insn_le): New define_insn.
(aarch64_<shrn_op>rshrn2_n<mode>_insn_be): New define_insn.
(aarch64_<shrn_op>rshrn2_n<mode>): New define_expand.
(aarch64_sqshrun2_n<mode>_insn_le): New define_insn.
(aarch64_sqshrun2_n<mode>_insn_be): New define_insn.
(aarch64_sqshrun2_n<mode>): New define_expand.
(aarch64_sqrshrun2_n<mode>_insn_le): New define_insn.
(aarch64_sqrshrun2_n<mode>_insn_be): New define_insn.
(aarch64_sqrshrun2_n<mode>): New define_expand.
* config/aarch64/iterators.md (UNSPEC_SQSHRUN, UNSPEC_SQRSHRUN,
UNSPEC_SQSHRN, UNSPEC_UQSHRN, UNSPEC_SQRSHRN, UNSPEC_UQRSHRN):
Delete unspec values.
(VQSHRN_N): Delete int iterator.
Kyrylo Tkachov [Tue, 6 Jun 2023 22:42:48 +0000 (23:42 +0100)]
aarch64: Add ASHIFTRT handling for shrn pattern
The first patch in the series has some fallout in the testsuite,
particularly gcc.target/aarch64/shrn-combine-2.c.
Our previous patterns for SHRN matched both
(truncate (ashiftrt (x) (N))) and (truncate (lshiftrt (x) (N)))
as these are equivalent for the shift amounts involved.
In our refactoring, however, we mapped shrn to truncate+lshiftrt.
The fix here is to iterate over ashiftrt,lshiftrt in the pattern for it.
However, we don't want to allow ashiftrt for us_truncate or lshiftrt for
ss_truncate from the ALL_TRUNC iterator.
This patch adds an AARCH64_VALID_SHRN_OP helper to gate the valid
combinations of truncations and shifts.
Bootstrapped and tested on aarch64-none-linux-gnu and
aarch64_be-none-elf.
gcc/ChangeLog:
* config/aarch64/aarch64.h (AARCH64_VALID_SHRN_OP): Define.
* config/aarch64/aarch64-simd.md
(*aarch64_<shrn_op>shrn_n<mode>_insn<vczle><vczbe>): Rename to...
(*aarch64_<shrn_op><shrn_s>shrn_n<mode>_insn<vczle><vczbe>): ... This.
Use SHIFTRT iterator and add AARCH64_VALID_SHRN_OP to condition.
* config/aarch64/iterators.md (shrn_s): New code attribute.
Kyrylo Tkachov [Tue, 6 Jun 2023 22:35:52 +0000 (23:35 +0100)]
aarch64: [US]Q(R)SHR(U)N scalar forms refactoring
Some instructions from the previous patch have scalar forms:
SQSHRN,SQRSHRN,UQSHRN,UQRSHRN,SQSHRUN,SQRSHRUN.
This patch converts the patterns for these to use standard RTL codes.
Their MD patterns deviate slightly from the vector forms mostly due to
things like operands being scalar rather than vectors.
One nuance is in the SQSHRUN,SQRSHRUN patterns. These end in a truncate
to the scalar narrow mode e.g. SI -> QI. This gets simplified by the
RTL passes to a subreg rather than keeping it as a truncate.
So we end up representing these without the truncate and in the expander
read the narrow subreg in order to comply with the expected width of the
intrinsic.
Bootstrapped and tested on aarch64-none-linux-gnu and
aarch64_be-none-elf.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_<sur>q<r>shr<u>n_n<mode>):
Rename to...
(aarch64_<shrn_op>shrn_n<mode>): ... This. Reimplement with RTL codes.
(*aarch64_<shrn_op>rshrn_n<mode>_insn): New define_insn.
(aarch64_sqrshrun_n<mode>_insn): Likewise.
(aarch64_sqshrun_n<mode>_insn): Likewise.
(aarch64_<shrn_op>rshrn_n<mode>): New define_expand.
(aarch64_sqshrun_n<mode>): Likewise.
(aarch64_sqrshrun_n<mode>): Likewise.
* config/aarch64/iterators.md (V2XWIDE): Add HI and SI modes.
Kyrylo Tkachov [Tue, 6 Jun 2023 21:37:46 +0000 (22:37 +0100)]
aarch64: Reimplement [US]Q(R)SHR(U)N patterns with RTL codes
This patch reimplements the MD patterns for the instructions that
perform narrowing right shifts with optional rounding and saturation
using standard RTL codes rather than unspecs.
There are four groups of patterns involved:
* Simple narrowing shifts with optional signed or unsigned truncation:
SHRN, SQSHRN, UQSHRN. These are expressed as a truncation operation of
a right shift. The matrix of valid combinations looks like this:
            | ashiftrt | lshiftrt |
------------------------------------------
ss_truncate |  SQSHRN  |    X     |
us_truncate |    X     |  UQSHRN  |
truncate    |    X     |   SHRN   |
------------------------------------------
* Narrowing shifts with rounding with optional signed or unsigned
truncation: RSHRN, SQRSHRN, UQRSHRN. These follow the same
combinations of truncation and shift codes as above, but also perform
intermediate widening of the results in order to represent the addition
of the rounding constant. This group also corrects an existing
inaccuracy for RSHRN where we don't currently model the intermediate
widening for rounding.
* The somewhat special "Signed saturating Shift Right Unsigned Narrow":
SQSHRUN. Similar to the SQXTUN instructions, these perform a
saturating truncation that isn't represented by US_TRUNCATE or
SS_TRUNCATE but needs to use a clamping operation followed by a
TRUNCATE.
* The rounding version of the above: SQRSHRUN. It needs the special
clamping truncate representation but with an intermediate widening and
rounding addition.
Besides using standard RTL codes for all of the above instructions, this
patch allows us to get rid of the explicit define_insns and
define_expands for SHRN and RSHRN.
Bootstrapped and tested on aarch64-none-linux-gnu and
aarch64_be-none-elf. We've got pretty thorough execute tests in
advsimd-intrinsics.exp that exercise these and many instances of these
instructions get constant-folded away during optimisation and the
validation still passes (during development where I was figuring out the
details of the semantics they were discovering failures), so I'm fairly
confident in the representation.
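The four groups above can be modelled on a single 16-bit element narrowed
to 8 bits. This is a scalar sketch for illustration only, not the RTL in
the patch: the *_model and clamp helpers are hypothetical, the shift
amount N is assumed to be in [1, 8], and right shifts of negative values
are assumed to be arithmetic (as GCC implements them).

```c
#include <stdint.h>

/* Clamp V to [LO, HI]; stands in for the saturating truncations.  */
static int32_t
clamp (int32_t v, int32_t lo, int32_t hi)
{
  return v < lo ? lo : v > hi ? hi : v;
}

/* SHRN: truncate (lshiftrt x n).  */
static uint8_t
shrn_model (uint16_t x, unsigned n)
{
  return (uint8_t) (x >> n);
}

/* SQSHRN: ss_truncate (ashiftrt x n).  */
static int8_t
sqshrn_model (int16_t x, unsigned n)
{
  return (int8_t) clamp (x >> n, INT8_MIN, INT8_MAX);
}

/* UQSHRN: us_truncate (lshiftrt x n).  */
static uint8_t
uqshrn_model (uint16_t x, unsigned n)
{
  return (uint8_t) clamp (x >> n, 0, UINT8_MAX);
}

/* SQRSHRN: widen first so that adding the rounding constant cannot
   wrap, then shift and saturate.  */
static int8_t
sqrshrn_model (int16_t x, unsigned n)
{
  int32_t wide = (int32_t) x + ((int32_t) 1 << (n - 1));
  return (int8_t) clamp (wide >> n, INT8_MIN, INT8_MAX);
}

/* SQSHRUN: signed shift, clamp to the unsigned narrow range, then a
   plain truncate.  */
static uint8_t
sqshrun_model (int16_t x, unsigned n)
{
  return (uint8_t) clamp (x >> n, 0, UINT8_MAX);
}
```

The widening matters for inputs such as 32767 with N = 1: the rounding
addition produces 32768, which does not fit in the original 16-bit
element.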
Simon Dardis [Tue, 6 Jun 2023 06:53:36 +0000 (14:53 +0800)]
MIPS16: Implement `code_readable` function attribute.
Support for __attribute__ ((code_readable)). Takes up to one argument of
"yes", "no", "pcrel". This will change the code readability setting for just
that function. If no argument is supplied, then the setting is 'yes'.
* gcc.target/mips/code-readable-attr-1.c: New test.
* gcc.target/mips/code-readable-attr-2.c: New test.
* gcc.target/mips/code-readable-attr-3.c: New test.
* gcc.target/mips/code-readable-attr-4.c: New test.
* gcc.target/mips/code-readable-attr-5.c: New test.
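A use of the attribute might look like the sketch below. The string
literal argument form is an assumption based on the quoted values in the
description above, and the CODE_READABLE wrapper macro, the table and the
function names are hypothetical; the macro expands to nothing on non-MIPS
targets so that the example compiles elsewhere.

```c
/* Apply the attribute only when targeting MIPS; elsewhere the macro
   expands to nothing.  The string argument syntax is assumed.  */
#if defined (__mips__)
# define CODE_READABLE(setting) __attribute__ ((code_readable (setting)))
#else
# define CODE_READABLE(setting)
#endif

static const int table[4] = { 10, 20, 30, 40 };

/* Request the "no" setting for this function only.  */
CODE_READABLE ("no") int
lookup_no (unsigned i)
{
  return table[i & 3];
}

/* Request the "pcrel" setting for this function only.  */
CODE_READABLE ("pcrel") int
lookup_pcrel (unsigned i)
{
  return table[i & 3];
}
```

Both functions compute the same result; only the code readability
setting the compiler applies when generating MIPS16 code would differ.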
YunQiang Su [Mon, 5 Jun 2023 01:36:01 +0000 (09:36 +0800)]
MAINTAINERS: move Matthew Fortune to Write After Approval
In 4fe6e12204535545edf7f035d4dc79c1404058cf, I should have added
Matthew Fortune to the Write After Approval section, while replacing
the MIPS Maintainer position.
ChangeLog:
* MAINTAINERS (Write After Approval): move Matthew Fortune
to Write After Approval.
Alexandre Oliva [Fri, 16 Jun 2023 06:23:47 +0000 (03:23 -0300)]
[libstdc++] [testsuite] xfail dbl from_chars for aarch64 rtems ldbl
rtems, like vxworks, uses fast-float doubles for from_chars even for
long double, so it loses precision, so expect the long double bits to
fail on aarch64.
for libstdc++-v3/ChangeLog
* testsuite/20_util/from_chars/4.cc: Skip long double on
aarch64-rtems.
Joel Brobecker [Fri, 16 Jun 2023 06:23:44 +0000 (03:23 -0300)]
libstdc++-v3: do not duplicate some math functions when using newlib
When running the libstdc++ testsuite on AArch64 RTEMS, we noticed
that about 25 tests are failing during the link, due to the "sqrtl"
function being defined twice:
- once inside RTEMS' libm;
- once inside our libstdc++.
One test that fails, for instance, would be 26_numerics/complex/13450.cc.
In comparing libm and libstdc++, we found that libstdc++ also
duplicates "hypotf", and "hypotl".
For "sqrtl" and "hypotl", the symbols come from a unit called
math_stubs_long_double.cc, while "hypotf" comes from
the equivalent unit for the float version, called math_stubs_float.cc.
Those units are always compiled in libstdc++ and provide our own
version of various math routines when those are missing from
the target system. The definition of those symbols is predicated
on the existence of various macros provided by c++config.h, which
themselves are predicated by the corresponding HAVE_xxx macros
in config.h.
One key element behind what's happening, here, is that the target
uses newlib, and therefore GCC was configured --with-newlib.
The section of libstdc++-v3's configure script that handles which math
functions are available has a newlib-specific section, and that
section provides a hardcoded list of symbols.
For "hypotf", this commit fixes the issue by doing the same
as for the other routines already declared in that section.
I verified by inspection in the newlib code that this function
should always be present, so hardcoding it in our configure
script should not be an issue.
For the math routines handling long doubles ("sqrtl" and "hypotl"),
however, I do not believe we can assume that newlib's libm
will always provide them. Therefore, this commit fixes that
part of the issue by adding a compile-check for "sqrtl" and "hypotl".
And while at it, we also include checks for all the other math
functions that math_stubs_long_double.cc re-implements, allowing
us to be resilient to future newlib enhancements adding support
for more functions.
libstdc++-v3/ChangeLog:
* configure.ac ["x${with_newlib}" = "xyes"]: Define
HAVE_HYPOTF. Add compile-checks for various long double
math functions as well.
* configure: Regenerate.
Marek Polacek [Wed, 3 May 2023 21:06:13 +0000 (17:06 -0400)]
configure: Implement --enable-host-pie
[ This is my third attempt to add this configure option. The first
version was approved but it came too late in the development cycle.
The second version was also approved, but I had to revert it:
<https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607082.html>.
I've fixed the problem (by moving $(PICFLAG) from INTERNAL_CFLAGS to
ALL_COMPILERFLAGS). Another change is that since r13-4536 I no longer
need to touch Makefile.def, so this patch is simplified. ]
This patch implements the --enable-host-pie configure option which
makes the compiler executables PIE. This can be used to enhance
protection against ROP attacks, and can be viewed as part of a wider
trend to harden binaries.
It is similar to the option --enable-host-shared, except that --e-h-s
won't add -shared to the linker flags whereas --e-h-p will add -pie.
It is different from --enable-default-pie because that option just
adds an implicit -fPIE/-pie when the compiler is invoked, but the
compiler itself isn't PIE.
Since r12-5768-gfe7c3ecf, PCH works well with PIE, so there are no PCH
regressions.
When building the compiler, the build process may use various in-tree
libraries; these need to be built with -fPIE so that it's possible to
use them when building a PIE. For instance, when --with-included-gettext
is in effect, intl object files must be compiled with -fPIE. Similarly,
when building in-tree gmp, isl, mpfr and mpc, they must be compiled with
-fPIE.
With this patch and --enable-host-pie used to configure gcc:
$ file gcc/cc1{,plus,obj,gm2} gcc/f951 gcc/lto1 gcc/cpp gcc/go1 gcc/rust1 gcc/gnat1
gcc/cc1: ELF 64-bit LSB pie executable, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=98e22cde129d304aa6f33e61b1c39e144aeb135e, for GNU/Linux 3.2.0, with debug_info, not stripped
gcc/cc1plus: ELF 64-bit LSB pie executable, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=859d1ea37e43dfe50c18fd4e3dd9a34bb1db8f77, for GNU/Linux 3.2.0, with debug_info, not stripped
gcc/cc1obj: ELF 64-bit LSB pie executable, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=1964f8ecee6163182bc26134e2ac1f324816e434, for GNU/Linux 3.2.0, with debug_info, not stripped
gcc/cc1gm2: ELF 64-bit LSB pie executable, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=a396672c7ff913d21855829202e7b02ecf42ff4c, for GNU/Linux 3.2.0, with debug_info, not stripped
gcc/f951: ELF 64-bit LSB pie executable, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=59c523db893186547ac75c7a71f48be0a461c06b, for GNU/Linux 3.2.0, with debug_info, not stripped
gcc/lto1: ELF 64-bit LSB pie executable, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=084a7b77df7be2d63c2d4c655b5bbc3fcdb6038d, for GNU/Linux 3.2.0, with debug_info, not stripped
gcc/cpp: ELF 64-bit LSB pie executable, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=3503bf8390d219a10d6653b8560aa21158132168, for GNU/Linux 3.2.0, with debug_info, not stripped
gcc/go1: ELF 64-bit LSB pie executable, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=988cc673af4fba5dcb482f4b34957b99050a68c5, for GNU/Linux 3.2.0, with debug_info, not stripped
gcc/rust1: ELF 64-bit LSB pie executable, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=b6a5d3d514446c4dcdee0707f086ab9b274a8a3c, for GNU/Linux 3.2.0, with debug_info, not stripped
gcc/gnat1: ELF 64-bit LSB pie executable, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=bb11ccdc2c366fe3fe0980476bcd8ca19b67f9dc, for GNU/Linux 3.2.0, with debug_info, not stripped
I plan to add an option to link with -Wl,-z,now.
Bootstrapped on x86_64-pc-linux-gnu with --with-included-gettext
--enable-host-pie as well as without --enable-host-pie. Also tested
on a Debian system where the system gcc was configured with
--enable-default-pie.
Co-Authored by: Iain Sandoe <iain@sandoe.co.uk>
ChangeLog:
* configure.ac (--enable-host-pie): New check. Set PICFLAG after this
check.
* configure: Regenerate.
c++tools/ChangeLog:
* Makefile.in: Rename PIEFLAG to PICFLAG. Set LD_PICFLAG. Use it.
Use pic/libiberty.a if PICFLAG is set.
* configure.ac (--enable-default-pie): Set PICFLAG instead of PIEFLAG.
(--enable-host-pie): New check.
* configure: Regenerate.
fixincludes/ChangeLog:
* Makefile.in: Set and use PICFLAG and LD_PICFLAG. Use the "pic"
build of libiberty if PICFLAG is set.
* configure.ac:
* configure: Regenerate.
gcc/ChangeLog:
* Makefile.in: Set LD_PICFLAG. Use it. Set enable_host_pie.
Remove NO_PIE_CFLAGS and NO_PIE_FLAG. Pass LD_PICFLAG to
ALL_LINKERFLAGS. Use the "pic" build of libiberty if --enable-host-pie.
* configure.ac (--enable-host-shared): Don't set PICFLAG here.
(--enable-host-pie): New check. Set PICFLAG and LD_PICFLAG after this
check.
* configure: Regenerate.
* doc/install.texi: Document --enable-host-pie.
gcc/ada/ChangeLog:
* gcc-interface/Make-lang.in (ALL_ADAFLAGS): Remove NO_PIE_CFLAGS. Add
PICFLAG. Use PICFLAG when building ada/b_gnat1.o and ada/b_gnatb.o.
* gcc-interface/Makefile.in: Use pic/libiberty.a if PICFLAG is set.
Remove NO_PIE_FLAG.
gcc/m2/ChangeLog:
* Make-lang.in: New var, GM2_PICFLAGS. Use it.
gcc/d/ChangeLog:
* Make-lang.in: Remove NO_PIE_CFLAGS.
intl/ChangeLog:
* Makefile.in: Use @PICFLAG@ in COMPILE as well.
* configure.ac (--enable-host-shared): Don't set PICFLAG here.
(--enable-host-pie): New check. Set PICFLAG after this check.
* configure: Regenerate.
libcody/ChangeLog:
* Makefile.in: Pass LD_PICFLAG to LDFLAGS.
* configure.ac (--enable-host-shared): Don't set PICFLAG here.
(--enable-host-pie): New check. Set PICFLAG and LD_PICFLAG after this
check.
* configure: Regenerate.
libcpp/ChangeLog:
* configure.ac (--enable-host-shared): Don't set PICFLAG here.
(--enable-host-pie): New check. Set PICFLAG after this check.
* configure: Regenerate.
libdecnumber/ChangeLog:
* configure.ac (--enable-host-shared): Don't set PICFLAG here.
(--enable-host-pie): New check. Set PICFLAG after this check.
* configure: Regenerate.
libiberty/ChangeLog:
* configure.ac: Also set shared when enable_host_pie.
* configure: Regenerate.
zlib/ChangeLog:
* configure.ac (--enable-host-shared): Don't set PICFLAG here.
(--enable-host-pie): New check. Set PICFLAG after this check.
* configure: Regenerate.
Manolis Tsamis [Thu, 25 May 2023 11:44:41 +0000 (13:44 +0200)]
cprop_hardreg: Enable propagation of the stack pointer if possible
Propagation of the stack pointer in cprop_hardreg is currently
forbidden in all cases, due to maybe_mode_change returning NULL.
Relax this restriction and allow propagation when no mode change is
requested.
Andrew Pinski [Thu, 15 Jun 2023 17:49:40 +0000 (17:49 +0000)]
Add another testcase for PR 110266
Since the combining of sin/cos into cexpi is dependent
on the target, this adds another testcase which had failed (earlier, in
evrp rather than vrp2) and which will fail on all targets rather than
only on ones which have sincos or C99 math functions.
Committed as obvious after a quick test.
gcc/testsuite/ChangeLog:
PR tree-optimization/110266
* gcc.c-torture/compile/pr110266.c: New test.
Andrew MacLeod [Thu, 15 Jun 2023 15:59:55 +0000 (11:59 -0400)]
Check for integer only complex.
With the expanded capabilities of range-op dispatch, floating point
complex objects can appear when folding, which they couldn't before.
In the processing for extracting integers from complex ints, make sure it
is an integer complex.
Jakub Jelinek [Thu, 15 Jun 2023 12:16:17 +0000 (14:16 +0200)]
libcpp: Diagnose #include after failed __has_include [PR80753]
As can be seen in the testcase, we don't diagnose #include/#include_next
of a non-existent header if __has_include/__has_include_next is done for
that header first.
The problem is that we normally error the first time some header is not
found, but in the _cpp_FFK_HAS_INCLUDE case we obviously don't want to diagnose
it, just expand it to 0. And libcpp caches both successful includes and
unsuccessful ones.
The following patch fixes that by remembering that we haven't diagnosed
error when using __has_include* on it, and diagnosing it when using the
cache entry in normal mode the first time.
I think _cpp_FFK_NORMAL is the only mode in which we normally diagnose
errors, for _cpp_FFK_PRE_INCLUDE that open_file_failed isn't reached
and for _cpp_FFK_FAKE neither.
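The caching behaviour described above can be modelled with a small stand-alone C sketch. Everything in it (the file_entry struct, the find_kind enum, the lookup function) is hypothetical and only mirrors the idea of a deferred error flag; it is not the libcpp code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the libcpp lookup kinds.  */
enum find_kind { KIND_NORMAL, KIND_HAS_INCLUDE };

/* Hypothetical cache entry for one header lookup.  */
struct file_entry
{
  bool cached;          /* A lookup result has been recorded.  */
  bool found;           /* The header exists.  */
  bool deferred_error;  /* Missing header not yet diagnosed.  */
};

/* Look up a header; EXISTS says whether it is on disk.  Returns the
   number of diagnostics emitted by this call (0 or 1).  */
static int
lookup (struct file_entry *e, bool exists, enum find_kind kind)
{
  int diagnosed = 0;

  assert (e != NULL);
  if (!e->cached)
    {
      e->cached = true;
      e->found = exists;
      /* Every failed lookup starts out undiagnosed.  */
      e->deferred_error = !exists;
    }

  /* Only a normal #include reports the error, and only once.  */
  if (kind == KIND_NORMAL && e->deferred_error)
    {
      fprintf (stderr, "fatal error: no such file or directory\n");
      e->deferred_error = false;
      diagnosed = 1;
    }
  return diagnosed;
}
```

In this model a __has_include-style lookup of a missing header caches the failure silently, and the first later normal lookup of the same entry still emits the diagnostic.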
2023-06-15 Jakub Jelinek <jakub@redhat.com>
PR preprocessor/80753
libcpp/
* files.cc (struct _cpp_file): Add deferred_error bitfield.
(_cpp_find_file): When finding a file in cache with deferred_error
set in _cpp_FFK_NORMAL mode, call open_file_failed and clear the flag.
Set deferred_error in _cpp_FFK_HAS_INCLUDE mode if open_file_failed
hasn't been called.
gcc/testsuite/
* c-c++-common/missing-header-5.c: New test.
Tobias Burnus [Thu, 15 Jun 2023 10:55:58 +0000 (12:55 +0200)]
libgomp: Extend OMP_ALLOCATOR, add affinity env var doc
Support OpenMP 5.1's syntax for OMP_ALLOCATOR as well,
which permits besides predefined allocators also
predefined memspaces optionally followed by traits.
Additionally, this commit adds the previously lacking
documentation for OMP_ALLOCATOR, OMP_AFFINITY_FORMAT
and OMP_DISPLAY_AFFINITY.
libgomp/ChangeLog:
* env.c (gomp_def_allocator_envvar): New var.
(parse_allocator): Handle OpenMP 5.1 syntax.
(cleanup_env): New.
(omp_display_env): Output gomp_def_allocator_envvar
for an allocator with traits.
* libgomp.texi (OMP_ALLOCATOR, OMP_AFFINITY_FORMAT,
OMP_DISPLAY_AFFINITY): New.
* testsuite/libgomp.c/allocator-1.c: New test.
* testsuite/libgomp.c/allocator-2.c: New test.
* testsuite/libgomp.c/allocator-3.c: New test.
* testsuite/libgomp.c/allocator-4.c: New test.
* testsuite/libgomp.c/allocator-5.c: New test.
* testsuite/libgomp.c/allocator-6.c: New test.
Jan Beulich [Thu, 15 Jun 2023 08:52:35 +0000 (10:52 +0200)]
x86/AVX512: use VMOVDDUP for broadcast to V2DF
Like is already the case for the AVX/AVX2 form, VMOVDDUP - acting on
double precision floating values - is more appropriate to use here, and
it can also result in shorter insn encodings when source is memory or
%xmm0...%xmm7, and no masking is applied (in allowing a 2-byte VEX
prefix then instead of a 3-byte one).
gcc/
* config/i386/sse.md (<avx512>_vec_dup<mode><mask_name>): Use
vmovddup.
Lulu Cheng [Wed, 7 Jun 2023 02:21:58 +0000 (10:21 +0800)]
LoongArch: Avoid non-returning indirect jumps through $ra [PR110136]
Micro-architecture unconditionally treats a "jr $ra" as "return from subroutine",
hence doing "jr $ra" would interfere with both subroutine return prediction and
the more general indirect branch prediction.
Therefore, a problem like PR110136 can cause a significant increase in the branch
misprediction rate and affect performance. The same problem exists with "indirect_jump".
gcc/ChangeLog:
PR target/110136
* config/loongarch/loongarch.md: Modify the register constraints for template
"jumptable" and "indirect_jump" from "r" to "e".
Co-authored-by: Andrew Pinski <apinski@marvell.com>
Eric Botcazou [Wed, 10 May 2023 16:00:36 +0000 (18:00 +0200)]
ada: Fix wrong finalization for double subtype of bounded vector
The special handling of temporaries created for return values and subject
to a renaming needs to be restricted to the top level, where it is needed
to prevent dangling references to the frame of the elaboration routine from
being created, because, at a lower level, the front-end may create implicit
renamings of objects as these temporaries, so a copy is not allowed.
gcc/ada/
* gcc-interface/decl.cc (gnat_to_gnu_entity) <E_Variable>: Restrict
the special handling of temporaries created for return values and
subject to a renaming to the top level.
Eric Botcazou [Fri, 19 May 2023 23:23:20 +0000 (01:23 +0200)]
ada: Fix too small secondary stack allocation for returned conversion
The previous fix did not address a latent issue whereby the allocation
would be made using the (static) subtype of the conversion instead of
the (dynamic) subtype of the return object, so this change rewrites the
code responsible for determining the type used for the allocation, and
also contains a small improvement to the Has_Tag_Of_Type predicate.
gcc/ada/
* exp_ch3.adb (Make_Allocator_For_Return): Rewrite the logic that
determines the type used for the allocation and add assertions.
* exp_util.adb (Has_Tag_Of_Type): Also return true for extension
aggregates.
Eric Botcazou [Wed, 17 May 2023 15:05:14 +0000 (17:05 +0200)]
ada: Fix internal error on loop iterator filter with -gnatVa
The problem is that the condition of the iterator filter is expanded early,
before it is integrated into an if statement of the loop body, so there is
no place to attach the actions generated by this expansion.
This happens only for simple loops, i.e. with a parameter specification, so
the fix uses the same approach for them as for loops based on iterators.
gcc/ada/
* sinfo.ads (Iterator_Filter): Document field.
* sem_ch5.adb (Analyze_Iterator_Specification): Move comment around.
(Analyze_Loop_Parameter_Specification): Only preanalyze the iterator
filter, if any.
* exp_ch5.adb (Expand_N_Loop_Statement): Analyze the new list built
when an iterator filter is present.
Eric Botcazou [Tue, 16 May 2023 09:35:23 +0000 (11:35 +0200)]
ada: Revert latest change to Find_Hook_Context
The issue is that, if an aggregate is both below a conditional expression
and above another conditional expression in the tree, we have currently no
place to put the finalization actions generated by the innermost expression
in the context of the aggregate before it is expanded, so they end up being
placed after the outermost expression.
But it is not clear whether that's really problematic because this does not
seem to happen for array aggregates with multiple or others choices: in this
case the aggregate is expanded first and the code path is not taken.
Eric Botcazou [Sat, 13 May 2023 08:55:44 +0000 (10:55 +0200)]
ada: Fix too small secondary stack allocation for returned aggregate
This restores the specific treatment of aggregates that are returned through
an extended return statement in a function returning a class-wide type, and
which was incorrectly dropped in an earlier change.
gcc/ada/
* exp_ch3.adb (Make_Allocator_For_Return): Deal again specifically
with an aggregate returned through an object of a class-wide type.
Eric Botcazou [Sun, 14 May 2023 22:07:01 +0000 (00:07 +0200)]
ada: Remove dead code in Expand_Iterator_Loop_Over_Container
The Condition_Actions field can only be populated for while loops.
gcc/ada/
* exp_ch5.adb (Expand_Iterator_Loop_Over_Container): Do not insert
an always empty list. Remove unused parameter Isc.
(Expand_Iterator_Loop): Adjust call to above procedure.
Before this patch, the fact that Restrictions pragmas had to fit on
a single line in system.ads was difficult to reconcile with the
80-character line limit that is enforced in that file.
The special rules for pragmas in system.ads made it impossible to use
the Style_Checks pragma to allow long Restrictions pragmas. This patch
relaxes those rules so the Style_Checks pragma can be used in
system.ads.
gcc/ada/
* targparm.adb: Allow pragma Style_Checks in some forms.
* targparm.ads: Document new pragma permission.
Eric Botcazou [Sun, 14 May 2023 09:49:09 +0000 (11:49 +0200)]
ada: Fix missing finalization for aggregates nested in conditional expressions
The finalization actions for the components of the aggregates are blocked
by Expand_Ctrl_Function_Call, which sets Is_Ignored_Transient on all the
temporaries generated from within a conditional expression whatever the
intermediate constructs. Now aggregates and their expansion in the form
of block and loop statements are "impenetrable" as far as temporaries are
concerned, i.e. the lifetime of temporaries generated within them does
not extend beyond them, so their finalization must not be blocked there.
gcc/ada/
* exp_util.ads (Within_Case_Or_If_Expression): Adjust description.
* exp_util.adb (Find_Hook_Context): Stop the search for the topmost
conditional expression, if within one, at contexts where temporaries
may be contained.
(Within_Case_Or_If_Expression): Return false upon first encountering
contexts where temporaries may be contained.
ada: Adjust QNX Ada priorities to match QNX system priorities
The Ada priority range of the QNX runtime started from 0, differing from
the QNX system priorities range starting from 1. As this may cause
confusion, especially if used in a mixed language environment, the Ada
priority range now starts at 1.
The default priority of Ada tasks as mandated is the middle of the
priority range. On QNX this means the default priority of Ada tasks is
30. This is much higher than the default QNX priority of 10 and may
cause unexpected system interruptions when Ada tasks take a lot of CPU time.
gcc/ada/
* libgnarl/s-osinte__qnx.adb: Adjust priority conversion function.
* libgnat/system-qnx-arm.ads: Adjust priority range and default
priority.
This patch removes a few dangling references to the late front-end
implementation of exceptions from the comments of targparm.ads, and
also fixes a thinko there.
gcc/ada/
* targparm.ads: Remove references to front-end-based exceptions. Fix
thinko.
Piotr Trojanek [Fri, 12 May 2023 12:06:07 +0000 (14:06 +0200)]
ada: Accept aspect Always_Terminates on packages
The recently added aspect Always_Terminates is now allowed on packages
and generic packages, but only when it has no arguments. The intuitive
meaning is that all subprograms declared in such a package are always
terminating.
gcc/ada/
* contracts.adb (Add_Contract_Item): Add pragma Always_Terminates to
package contract.
* sem_prag.adb (Analyze_Pragma): Accept pragma Always_Terminates on
packages and generic packages, but only when it has no arguments.
Eric Botcazou [Fri, 12 May 2023 18:20:16 +0000 (20:20 +0200)]
ada: Fix missing error on function call returning incomplete view
Testing for the presence of Non_Limited_View is not sufficient to detect
whether the nonlimited view has been analyzed because Build_Limited_Views
always sets the field on the limited view. Instead the discriminant is
whether this nonlimited view is itself an incomplete type.
gcc/ada/
* sem_ch4.adb (Analyze_Call): Adjust the test to detect the presence
of an incomplete view of a type on a function call.
Eric Botcazou [Wed, 10 May 2023 21:48:18 +0000 (23:48 +0200)]
ada: Remove Ttypes.Max_Unaligned_Field
This constant has been unused for ages. The corresponding getter function
is also removed from the Get_Targ package, but the corresponding constant
declared in Set_Targ is preserved for the sake of backward compatibility
of the target file format.
gcc/ada/
* get_targ.ads (Get_Max_Unaligned_Field): Delete.
* ada_get_targ.adb (Get_Max_Unaligned_Field): Likewise.
* get_targ.adb (Get_Max_Unaligned_Field): Likewise.
* set_targ.ads (Max_Unaligned_Field): Adjust comment.
* set_targ.adb: Set Max_Unaligned_Field to 1 during elaboration.
* ttypes.ads (Max_Unaligned_Field): Delete.
Piotr Trojanek [Tue, 9 May 2023 12:14:57 +0000 (14:14 +0200)]
ada: Accept aspect Always_Terminates without expression
The recently added aspect Always_Terminates is now accepted without
explicit boolean expression, where a missing expression implicitly means
True, similar to aspects Async_Readers, Async_Writers, etc.
gcc/ada/
* aspects.adb
(Base_Aspect): Fix layout.
* aspects.ads
(Aspect_Argument): Expression for Always_Terminates is optional.
* sem_prag.adb
(Analyze_Always_Terminates_In_Decl_Part): Only analyze expression when
pragma argument is present.
(Analyze_Pragma): Argument for Always_Terminates is optional; fix
whitespace for Async_Readers.
Eric Botcazou [Mon, 8 May 2023 14:17:33 +0000 (16:17 +0200)]
ada: Fix aspect Linker_Section ignored on subprogram body
The compiler is waiting for the freeze node of the body, but it is never
generated since the freezing of the body is not delayed. The change also
removes an obsolete piece of code.
gcc/ada/
* sem_ch13.adb (Analyze_Aspect_Specifications): Add missing items
in the list of aspects handled by means of Insert_Pragma.
<Aspect_Linker_Section>: Remove obsolete code. Do not delay the
processing of the aspect if the entity is already frozen.
Xi Ruoyao [Wed, 14 Jun 2023 00:24:05 +0000 (08:24 +0800)]
LoongArch: Set default alignment for functions and labels with -mtune
The LA464 micro-architecture is sensitive to alignment of code. The
Loongson team has benchmarked various combinations of function and label
alignment; the results [1] show that 16-byte label alignment together with
32-byte function alignment gives the best results in terms of SPEC score.
Add a mtune-based table-driven mechanism to set the default of
-falign-{functions,labels}. As LA464 is the first (and the only for
now) uarch supported by GCC, the same setting is also used for
the "generic" -mtune=loongarch64. In the future we may set different
settings for LA{2,3,6}64 once we add the support for them.
Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk?
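A table-driven default of this kind might look roughly like the sketch below. The enum, struct, field and function names are invented for illustration and are not taken from the patch; only the 32-byte function and 16-byte label values come from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning identifiers for this sketch.  */
enum tune_id { TUNE_LOONGARCH64, TUNE_LA464, N_TUNE };

/* Default alignments for one -mtune target, kept as option strings.  */
struct align_defaults
{
  const char *function;  /* Default for -falign-functions=.  */
  const char *label;     /* Default for -falign-labels=.  */
};

/* Per the description, the generic tuning reuses the LA464 values.  */
static const struct align_defaults cpu_align[N_TUNE] = {
  [TUNE_LOONGARCH64] = { "32", "16" },
  [TUNE_LA464]       = { "32", "16" },
};

/* An explicit user value wins; otherwise fall back to the table.  */
static const char *
pick_function_align (enum tune_id tune, const char *user_value)
{
  assert (tune < N_TUNE);
  return user_value != NULL ? user_value : cpu_align[tune].function;
}

/* Same rule for label alignment.  */
static const char *
pick_label_align (enum tune_id tune, const char *user_value)
{
  assert (tune < N_TUNE);
  return user_value != NULL ? user_value : cpu_align[tune].label;
}
```

Adding a future micro-architecture would then only need a new row in the table.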
gcc/ChangeLog:
* config/loongarch/loongarch-tune.h (loongarch_align): New
struct.
* config/loongarch/loongarch-def.h (loongarch_cpu_align): New
array.
* config/loongarch/loongarch-def.c (loongarch_cpu_align): Define
the array.
* config/loongarch/loongarch.cc
(loongarch_option_override_internal): Set the value of
-falign-functions= if -falign-functions is enabled but no value
is given. Likewise for -falign-labels=.
Thomas Schwinge [Wed, 7 Jun 2023 15:12:01 +0000 (17:12 +0200)]
Fix 'dg-warning' in 'c-c++-common/Wfree-nonheap-object-3.c' for C++
[...]/c-c++-common/Wfree-nonheap-object-3.c:57:24: warning: 'malloc (dealloc_float)' attribute ignored with deallocation functions declared 'inline' [-Wattributes]
[...]/c-c++-common/Wfree-nonheap-object-3.c:51:1: note: deallocation function declared here
[...]/c-c++-common/Wfree-nonheap-object-3.c: In function 'void test_nowarn_int(int)':
[...]/c-c++-common/Wfree-nonheap-object-3.c:25:20: warning: 'void __builtin_free(void*)' called on pointer 'p' with nonzero offset 4 [-Wfree-nonheap-object]
[...]/c-c++-common/Wfree-nonheap-object-3.c:24:24: note: returned from 'int* alloc_int(int)'
[...]/c-c++-common/Wfree-nonheap-object-3.c: In function 'void test_nowarn_long(int)':
[...]/c-c++-common/Wfree-nonheap-object-3.c:45:18: warning: 'void dealloc_long(long int*)' called on pointer '<unknown>' with nonzero offset 8 [-Wfree-nonheap-object]
[...]/c-c++-common/Wfree-nonheap-object-3.c:44:26: note: returned from 'long int* alloc_long(int)'
In function 'void dealloc_float(float*)',
inlined from 'void test_nowarn_float(int)' at [...]/c-c++-common/Wfree-nonheap-object-3.c:68:19:
[...]/c-c++-common/Wfree-nonheap-object-3.c:53:18: warning: 'void __builtin_free(void*)' called on pointer '<unknown>' with nonzero offset 8 [-Wfree-nonheap-object]
[...]/c-c++-common/Wfree-nonheap-object-3.c: In function 'void test_nowarn_float(int)':
[...]/c-c++-common/Wfree-nonheap-object-3.c:67:28: note: returned from 'float* alloc_float(int)'
PASS: c-c++-common/Wfree-nonheap-object-3.c -std=gnu++98 (test for warnings, line 25)
FAIL: c-c++-common/Wfree-nonheap-object-3.c -std=gnu++98 (test for warnings, line 45)
PASS: c-c++-common/Wfree-nonheap-object-3.c -std=gnu++98 (test for warnings, line 51)
PASS: c-c++-common/Wfree-nonheap-object-3.c -std=gnu++98 (test for warnings, line 53)
PASS: c-c++-common/Wfree-nonheap-object-3.c -std=gnu++98 (test for warnings, line 57)
FAIL: c-c++-common/Wfree-nonheap-object-3.c -std=gnu++98 (test for excess errors)
Excess errors:
[...]/c-c++-common/Wfree-nonheap-object-3.c:45:18: warning: 'void dealloc_long(long int*)' called on pointer '<unknown>' with nonzero offset 8 [-Wfree-nonheap-object]
..., that is: decorated 'void dealloc_long(long int*)' instead of plain
'dealloc_long' -- similar to how all the other 'dg-warning's allow for the
decorated function signature in addition to the plain one.
This issue was latent since the test case was added in
commit fe7f75cf16783589eedbab597e6d0b8d35d7e470
"Correct/improve maybe_emit_free_warning (PR middle-end/98166, PR c++/57111, PR middle-end/98160)",
and was finally exposed by my recent
commit 9c03391ba447ff86038d6a34c90ae737c3915b5f
"Tighten 'dg-warning' alternatives in 'c-c++-common/Wfree-nonheap-object{,-2,-3}.c'".
gcc/testsuite/
* c-c++-common/Wfree-nonheap-object-3.c: Fix 'dg-warning' for C++.
Jakub Jelinek [Thu, 15 Jun 2023 07:12:40 +0000 (09:12 +0200)]
middle-end, i386: Pattern recognize add/subtract with carry [PR79173]
The following patch introduces {add,sub}c5_optab and pattern recognizes
various forms of add with carry and subtract with carry/borrow, see
pr79173-{1,2,3,4,5,6}.c tests on what is matched.
Primarily forms with 2 __builtin_add_overflow or __builtin_sub_overflow
calls per limb (with just one for the least significant one), for
add with carry even when it is hand written in C (for subtraction
reassoc seems to change it too much so that the pattern recognition
doesn't work). __builtin_{add,sub}_overflow are standardized in C23
under ckd_{add,sub} names, so it isn't any longer a GNU only extension.
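For illustration, the "two __builtin_add_overflow calls per limb" shape can be sketched as below. The helper names uaddc and add_limbs are chosen for this example, and whether a particular compiler version turns the loop into add-with-carry instructions depends on the target and options.

```c
#include <assert.h>

/* One limb of add-with-carry: two overflow checks, carries summed.  */
static unsigned long
uaddc (unsigned long x, unsigned long y, unsigned long carry_in,
       unsigned long *carry_out)
{
  unsigned long r;
  unsigned long c1 = __builtin_add_overflow (x, y, &r);
  unsigned long c2 = __builtin_add_overflow (r, carry_in, &r);
  /* With carry_in <= 1, c1 and c2 cannot both be set.  */
  *carry_out = c1 + c2;
  return r;
}

/* Add two N-limb numbers, least significant limb first.
   Returns the carry out of the most significant limb.  */
static unsigned long
add_limbs (unsigned long *r, const unsigned long *a,
           const unsigned long *b, int n)
{
  unsigned long carry;

  assert (n > 0);
  /* The least significant limb has no carry in, so one check suffices.  */
  carry = __builtin_add_overflow (a[0], b[0], &r[0]);
  for (int i = 1; i < n; i++)
    r[i] = uaddc (a[i], b[i], carry, &carry);
  return carry;
}
```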
Note, clang has for these (IMHO badly designed)
__builtin_{add,sub}c{b,s,,l,ll} builtins which don't add/subtract just
a single bit of carry, but basically add 3 unsigned values or
subtract 2 unsigned values from one, and result in carry out of 0, 1, or 2
because of that. If we wanted to introduce those for clang compatibility,
we could and lower them early to just two __builtin_{add,sub}_overflow
calls and let the pattern matching in this patch recognize it later.
I've added expanders for this on ix86 and in addition to that
added various peephole2s (in preparation patches for this patch) to make
sure we get nice (and small) code for the common cases. I think there are
other PRs which request that e.g. for the _{addcarry,subborrow}_u{32,64}
intrinsics, which the patch also improves.
Would be nice if support for these optabs was added to many other targets,
arm/aarch64 and powerpc* certainly have such instructions, I'd expect
in fact that most targets do.
The _BitInt support I'm working on will also need this to emit reasonable
code.
2023-06-15 Jakub Jelinek <jakub@redhat.com>
PR middle-end/79173
* internal-fn.def (UADDC, USUBC): New internal functions.
* internal-fn.cc (expand_UADDC, expand_USUBC): New functions.
(commutative_ternary_fn_p): Return true also for IFN_UADDC.
* optabs.def (uaddc5_optab, usubc5_optab): New optabs.
* tree-ssa-math-opts.cc (uaddc_cast, uaddc_ne0, uaddc_is_cplxpart,
match_uaddc_usubc): New functions.
(math_opts_dom_walker::after_dom_children): Call match_uaddc_usubc
for PLUS_EXPR, MINUS_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR unless
other optimizations have been successful for those.
* gimple-fold.cc (gimple_fold_call): Handle IFN_UADDC and IFN_USUBC.
* fold-const-call.cc (fold_const_call): Likewise.
* gimple-range-fold.cc (adjust_imagpart_expr): Likewise.
* tree-ssa-dce.cc (eliminate_unnecessary_stmts): Likewise.
* doc/md.texi (uaddc<mode>5, usubc<mode>5): Document new named
patterns.
* config/i386/i386.md (uaddc<mode>5, usubc<mode>5): New
define_expand patterns.
(*setcc_qi_addqi3_cconly_overflow_1_<mode>, *setccc): Split
into NOTE_INSN_DELETED note rather than nop instruction.
(*setcc_qi_negqi_ccc_1_<mode>, *setcc_qi_negqi_ccc_2_<mode>):
Likewise.
* gcc.target/i386/pr79173-1.c: New test.
* gcc.target/i386/pr79173-2.c: New test.
* gcc.target/i386/pr79173-3.c: New test.
* gcc.target/i386/pr79173-4.c: New test.
* gcc.target/i386/pr79173-5.c: New test.
* gcc.target/i386/pr79173-6.c: New test.
* gcc.target/i386/pr79173-7.c: New test.
* gcc.target/i386/pr79173-8.c: New test.
* gcc.target/i386/pr79173-9.c: New test.
* gcc.target/i386/pr79173-10.c: New test.
Jakub Jelinek [Thu, 15 Jun 2023 07:08:37 +0000 (09:08 +0200)]
i386: Add peephole2 patterns to improve subtract with borrow with memory destination [PR79173]
This patch adds subborrow<mode> alternative so that it can have memory
destination and adds various peephole2s which help to match it.
2023-06-15 Jakub Jelinek <jakub@redhat.com>
PR middle-end/79173
* config/i386/i386.md (subborrow<mode>): Add alternative with
memory destination and add for it define_peephole2
TARGET_READ_MODIFY_WRITE/-Os patterns to prefer using memory
destination in these patterns.
Jakub Jelinek [Thu, 15 Jun 2023 06:49:27 +0000 (08:49 +0200)]
middle-end: Move constant args folding of .UBSAN_CHECK_* and .*_OVERFLOW into fold-const-call.cc
Here is an incremental patch to handle constant folding of these
in fold-const-call.cc rather than gimple-fold.cc.
Not really sure if that is the way to go because it is replacing 28
lines of former code with 65 of new code, for the overall benefit that say
int
foo (long long *p)
{
  int one = 1;
  long long max = __LONG_LONG_MAX__;
  return __builtin_add_overflow (one, max, p);
}
can be now fully folded already in ccp1 pass while before it was only
cleaned up in forwprop1 pass right after it.
On Wed, Jun 14, 2023 at 12:25:46PM +0000, Richard Biener wrote:
> I think that's still very much desirable so this followup looks OK.
> Maybe you can re-base it as prerequesite though?
Rebased then (of course with the UADDC/USUBC handling removed from this
first patch, will be added in the second one).
2023-06-15 Jakub Jelinek <jakub@redhat.com>
* gimple-fold.cc (gimple_fold_call): Move handling of arg0
as well as arg1 INTEGER_CSTs for .UBSAN_CHECK_{ADD,SUB,MUL}
and .{ADD,SUB,MUL}_OVERFLOW calls from here...
* fold-const-call.cc (fold_const_call): ... here.
This patch adds new RTL and tests for sabd and uabd
PR tree-optimization/109156
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_<su>abd<mode>):
Rename to <su>abd<mode>3.
* config/aarch64/aarch64-sve.md (<su>abd<mode>_3): Rename
to <su>abd<mode>3.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/abd.h: New file.
* gcc.target/aarch64/abd_2.c: New test.
* gcc.target/aarch64/abd_3.c: New test.
* gcc.target/aarch64/abd_4.c: New test.
* gcc.target/aarch64/abd_none_2.c: New test.
* gcc.target/aarch64/abd_none_3.c: New test.
* gcc.target/aarch64/abd_none_4.c: New test.
* gcc.target/aarch64/abd_run_1.c: New test.
* gcc.target/aarch64/sve/abd_1.c: New test.
* gcc.target/aarch64/sve/abd_none_1.c: New test.
* gcc.target/aarch64/sve/abd_2.c: New test.
* gcc.target/aarch64/sve/abd_none_2.c: New test.
This adds a recognition pattern for the non-widening
absolute difference (ABD).
gcc/ChangeLog:
* doc/md.texi (sabd, uabd): Document them.
* internal-fn.def (ABD): Use new optab.
* optabs.def (sabd_optab, uabd_optab): New optabs.
* tree-vect-patterns.cc (vect_recog_absolute_difference):
Recognize the following idiom abs (a - b).
(vect_recog_sad_pattern): Refactor to use
vect_recog_absolute_difference.
(vect_recog_abd_pattern): Use patterns found by
vect_recog_absolute_difference to build a new ABD
internal call.
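As a hedged illustration, the abs (a - b) idiom in source form looks like the loop below. The function name is made up, and whether a given loop actually becomes an ABD internal call depends on the element types, the target and the options used.

```c
#include <assert.h>
#include <stdlib.h>

/* Element-wise absolute difference: out[i] = abs (a[i] - b[i]).
   Assumes a[i] - b[i] does not overflow int.  */
static void
abs_diff (int *restrict out, const int *restrict a,
          const int *restrict b, int n)
{
  assert (n >= 0);
  for (int i = 0; i < n; i++)
    out[i] = abs (a[i] - b[i]);
}
```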
chenxiaolong [Thu, 15 Jun 2023 02:46:24 +0000 (02:46 +0000)]
LoongArch: Change the default value of LARCH_CALL_RATIO to 6.
During regression testing of GCC for the LoongArch architecture, the tests in
pr90883.C were found to fail. Investigation showed that the failure was caused
by setting the macro LARCH_CALL_RATIO to too large a value. Taking the actual
LoongArch architecture into account, the different thresholds that satisfy the
test conditions were then benchmarked (SPEC CPU 2006), and the results showed
that the optimal threshold is 6.
gcc/ChangeLog:
* config/loongarch/loongarch.h (LARCH_CALL_RATIO): Modify the value
of macro LARCH_CALL_RATIO on LoongArch to make it perform optimally.
For this selector, we can use vmsltu + vmerge to optimize the codegen.
Before this patch:
merge0:
        addi a5,sp,16
        vl1re8.v v3,0(a5)
        li a5,31
        vsetivli zero,16,e8,m1,ta,mu
        vmv.v.x v2,a5
        lui a5,%hi(.LANCHOR0)
        addi a5,a5,%lo(.LANCHOR0)
        vl1re8.v v1,0(a5)
        vl1re8.v v4,0(sp)
        vand.vv v1,v1,v2
        vmsgeu.vi v0,v1,16
        vrgather.vv v2,v4,v1
        vadd.vi v1,v1,-16
        vrgather.vv v2,v3,v1,v0.t
        vs1r.v v2,0(a0)
        ret
After this patch:
merge0:
        addi a5,sp,16
        vl1re8.v v1,0(a5)
        lui a5,%hi(.LANCHOR0)
        addi a5,a5,%lo(.LANCHOR0)
        vsetivli zero,16,e8,m1,ta,ma
        vl1re8.v v0,0(a5)
        vl1re8.v v2,0(sp)
        vmsltu.vi v0,v0,16
        vmerge.vvm v1,v1,v2,v0
        vs1r.v v1,0(a0)
        ret
The key of this optimization is that:
1. mask = vmsltu (selector, nunits)
2. result = vmerge (op0, op1, mask)
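The two steps above can be modelled in scalar C as follows. This is only a sketch of the semantics under the assumption that lane i of the selector is either i (take op0) or i + nunits (take op1); the function names are invented and this is not the code in riscv-v.cc.

```c
#include <assert.h>
#include <stdbool.h>

/* A two-input permutation is a merge when lane i takes element i of
   either op0 (selector i) or op1 (selector i + nunits).  */
static bool
is_merge_selector (const unsigned *sel, unsigned nunits)
{
  for (unsigned i = 0; i < nunits; i++)
    if (sel[i] != i && sel[i] != i + nunits)
      return false;
  return true;
}

/* Scalar model of steps 1 and 2: build the mask, then select per lane.  */
static void
merge_model (int *result, const int *op0, const int *op1,
             const unsigned *sel, unsigned nunits)
{
  assert (is_merge_selector (sel, nunits));
  for (unsigned i = 0; i < nunits; i++)
    {
      bool mask = sel[i] < nunits;         /* Step 1: vmsltu.  */
      result[i] = mask ? op0[i] : op1[i];  /* Step 2: vmerge.  */
    }
}
```

Because every lane stays in place, no gather is needed; a compare plus a lane-wise select is enough.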
gcc/ChangeLog:
* config/riscv/riscv-v.cc (shuffle_merge_patterns): New pattern.
(expand_vec_perm_const_1): Add merge optimization.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-7.c: New test.
Lehua Ding [Wed, 14 Jun 2023 11:56:11 +0000 (19:56 +0800)]
RISC-V: Ensure vector args and return use function stack to pass [PR110119]
The V2 patch addresses comments from Juzhe, thanks.
Hi,
The reason for this bug is that in the case where the vector register is set
to a fixed length (with `--param=riscv-autovec-preference=fixed-vlmax` option),
TARGET_PASS_BY_REFERENCE thinks that variables of type vint32m1 can be passed
through two scalar registers, but when GCC calls FUNCTION_VALUE (call function
riscv_get_arg_info inside) it returns NULL_RTX. These two functions are not
unified. The current treatment is to pass all vector arguments and returns
through the function stack, and a new calling convention for vector registers
will be added in the future.
Pan Li [Tue, 13 Jun 2023 15:19:14 +0000 (23:19 +0800)]
RISC-V: Bugfix for vec_init repeating auto vectorization in RV32
When constructing a vector mask from individual elements we wrongly
assumed that we can broadcast BITS_PER_WORD (i.e. XLEN). The maximum is
actually the vector element length (i.e. ELEN). This patch fixes this.
After this patch, below failures on RV32 will be fixed.
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution test
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/repeat_run-3.c -std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax execution test
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv-v.cc (rvv_builder::get_merge_scalar_mask):
Take elen instead of scalar BITS_PER_WORD.
(expand_vector_init_merge_repeating_sequence): Use inner_bits_size
instead of scalar BITS_PER_WORD.
Maxim Kuvyrkov [Fri, 2 Jun 2023 14:51:40 +0000 (14:51 +0000)]
[contrib] validate_failures.py: Ignore stray filesystem paths in results
This patch simplifies comparison of results that have filesystem
paths. E.g., (assuming different values of <N>):
<cut>
Running /home/user/gcc-N/gcc/testsuite/gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp ...
ERROR: tcl error sourcing /home/user/gcc-N/gcc/testsuite/gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp.
</cut>
We add the "--srcpath <regex>" option, and set it by default to
"[^ ]+/testsuite/", which works well for all components of the GNU
Toolchain. We then remove substrings matching <regex> from paths of
.exp files and from occasional "ERROR:" results.
This option sets "today" date to compare expiration entries against.
Setting expiration date into the future allows re-detection of flaky
tests and creating fresh entries for them before the current flaky
entries expire.
contrib/ChangeLog:
* testsuite-management/validate_failures.py (TestResult): Update.
(Main): Handle new option "--expiry_date YYYYMMDD".
- Print message in case of broken sum file error.
- Print error messages to stderr. The script's stdout is, usually,
redirected to a file, and error messages shouldn't go there.
Christophe Lyon [Thu, 1 Jun 2023 12:00:48 +0000 (12:00 +0000)]
[contrib] validate_failures.py: Support "$tool:" prefix in exp names
This makes it easier to extract the $tool:$exp pair when iterating
over failures/flaky tests, which, in turn, simplifies re-running
testsuite parts that have unexpected failures or passes.
contrib/ChangeLog:
* testsuite-management/validate_failures.py (_EXP_LINE_FORMAT,)
(_EXP_LINE_REX, ResultSet): Support "$tool:" prefix in exp names.
Maxim Kuvyrkov [Thu, 1 Jun 2023 12:05:08 +0000 (12:05 +0000)]
[contrib] validate_failures.py: Be more stringent in parsing result lines
Before this patch we would identify the malformed line
"UNRESOLVEDTest run by tcwg-buildslave on Mon Aug 23 10:17:50 2021"
as an interesting result, only to fail in TestResult:__init__ due
to missing ":" after UNRESOLVED.
This patch makes all places that parse result lines use a single
compiled regex.
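The script itself is Python, but the idea of routing every check through one compiled pattern can be sketched in C with POSIX regex. The status list and the names below are assumptions made for this sketch, not the contents of _VALID_TEST_RESULTS_REX.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* The single shared pattern, compiled lazily on first use.  */
static regex_t result_re;
static int result_re_state;  /* 0 = not compiled, 1 = ready, -1 = failed.  */

/* Return 1 if LINE starts with a known status followed by ':',
   0 if it does not, and -1 if the pattern could not be compiled.  */
static int
is_result_line (const char *line)
{
  assert (line != NULL);
  if (result_re_state == 0)
    result_re_state
      = regcomp (&result_re, "^(FAIL|UNRESOLVED|XPASS|ERROR):",
                 REG_EXTENDED | REG_NOSUB) == 0 ? 1 : -1;
  if (result_re_state < 0)
    return -1;
  return regexec (&result_re, line, 0, NULL, 0) == 0;
}
```

With the ':' required by the pattern, a line such as the malformed "UNRESOLVEDTest run by ..." example is rejected up front instead of failing later.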
contrib/ChangeLog:
* testsuite-management/validate_failures.py (_VALID_TEST_RESULTS_REX):
Update.
(TestResult): Use _VALID_TEST_RESULTS_REX.