Viljar Indus [Tue, 5 Nov 2024 08:42:55 +0000 (10:42 +0200)]
ada: Remove Current_Node from Errout
This variable was used for Opt.Include_Subprogram_In_Messages
activated by -gnatdJ. This switch has been removed so this variable
is no longer used.
gcc/ada/ChangeLog:
* errout.ads: Remove Current_Node.
* errout.adb: Remove uses of Current_Node.
* par-ch6.adb: Same as above.
* par-ch7.adb: Same as above.
* par-ch9.adb: Same as above.
Viljar Indus [Mon, 4 Nov 2024 12:16:02 +0000 (14:16 +0200)]
ada: Remove Raise_Exception_On_Error
Raise_Exception_On_Error is never modified so it can be removed.
gcc/ada/ChangeLog:
* err_vars.ads: Remove Raise_Exception_On_Error and
Error_Msg_Exception.
* errout.ads: Same as above.
* errout.adb: Remove uses of Raise_Exception_On_Error and
Error_Msg_Exception.
* errutil.adb: Same as above.
Viljar Indus [Thu, 31 Oct 2024 13:50:46 +0000 (15:50 +0200)]
ada: Store error message kind as an enum
Simplify the storage for the kind of error message into a single
enumeration type. This replaces the existing attributes with the following
enumeration values.
* Is_Warning_Msg => Warning
* Is_Style_Msg => Style
* Is_Info_Msg => Info
* Is_Check_Msg => Low_Check, Medium_Check, High_Check
* Is_Serious_Error => Error, if the attribute was false then
Non_Serious_Error.
gcc/ada/ChangeLog:
* diagnostics-converter.adb: Use new enum values instead
of the old attributes.
* diagnostics-switch_repository.adb: Same as above.
* diagnostics-utils.adb: Same as above.
* diagnostics.adb: Same as above.
* diagnostics.ads: Same as above.
* errout.adb: Same as above.
* erroutc.adb: Same as above.
* erroutc.ads: Remove old attributes and replace them
with Error_Msg_Kind.
* errutil.adb: Same as others.
Viljar Indus [Fri, 1 Nov 2024 11:15:21 +0000 (13:15 +0200)]
ada: Refactor code for printing the error location
gcc/ada/ChangeLog:
* errout.adb: Use Output_Msg_Location.
* erroutc.adb: Add common implementation for printing the
error message line.
* erroutc.ads: Add new method Output_Msg_Location.
* errutil.adb: Use Output_Msg_Location.
The old specifications were ambiguous as to whether they expected
actuals to have %s/%b suffixes. The new specifications also increase
modularity across the board.
gcc/ada/ChangeLog:
* uname.ads (Is_Internal_Unit_Name, Is_Predefined_Unit_Name): Change
specifications to take a Unit_Name_Type as input.
(Encoded_Library_Unit_Name): New subprogram.
(Is_Predefined_Unit_Name): New overloaded subprogram.
(Get_External_Unit_Name_String): Make use of new
Encoded_Library_Unit_Name subprogram.
* uname.adb (Is_Internal_Unit_Name, Is_Predefined_Unit_Name): Adapt
bodies to specification changes.
* fname-uf.adb (Get_File_Name): Adapt to Uname interface changes.
Before this patch, the body of Fname.UF.Get_File_Name did a lot of
juggling with the global name buffer, which made it hard to understand.
This patch makes the body use local buffers instead.
gcc/ada/ChangeLog:
* fname-uf.adb (Get_File_Name): Use local name buffers.
Eric Botcazou [Mon, 11 Nov 2024 23:18:00 +0000 (00:18 +0100)]
ada: Fix latent issue exposed by recent change in aggregate expansion
The tag is not assigned when a compile-time known aggregate initializes an
object declared with an address clause/aspect.
gcc/ada/ChangeLog:
* freeze.adb: Remove clauses for Exp_Ch3.
(Check_Address_Clause): Always reassign the tag for an object of a
tagged type if there is an initialization expression.
Paul Thomas [Tue, 26 Nov 2024 08:58:21 +0000 (08:58 +0000)]
Fortran: Partial reversion of r15-5083 [PR117763]
2024-11-26 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/117763
* trans-array.cc (gfc_get_array_span): Guard against dereferences
of 'expr'. Clean up some typos. Use 'gfc_get_vptr_from_expr'
for clarity and apply a functional reversion of last section
that deals with class dummies.
gcc/testsuite/
PR fortran/117763
* gfortran.dg/pr117763.f90: New test.
we wrongly "propagate" VL=2 from vslidedown into the load.
Although we check whether the "target" instruction has a merge operand,
the check only handles cases where the merge operand itself is
loaded, like (2) in the snippet above. For (1) we load the non-merged
operand, assume propagation is valid and continue despite (2).
This patch just re-uses avl_can_be_propagated_p in order to disable
slides altogether in such situations.
gcc/ChangeLog:
* config/riscv/riscv-avlprop.cc (pass_avlprop::get_vlmax_ta_preferred_avl):
Check whether the use insn is valid for propagation.
Jakub Jelinek [Tue, 26 Nov 2024 08:46:51 +0000 (09:46 +0100)]
builtins: Fix up DFP ICEs on __builtin_fpclassify [PR102674]
This patch is similar to the one I've just posted: __builtin_fpclassify also
needs to print the decimal float minimum differently and use real_from_string3.
Plus I've done some formatting fixes.
2024-11-26 Jakub Jelinek <jakub@redhat.com>
PR middle-end/102674
* builtins.cc (fold_builtin_fpclassify): Use real_from_string3 rather
than real_from_string. Use "1E%d" format string rather than "0x1p%d"
for decimal float minimum. Formatting fixes.
Jakub Jelinek [Tue, 26 Nov 2024 08:45:21 +0000 (09:45 +0100)]
builtins: Fix up DFP ICEs on __builtin_is{inf,finite,normal} [PR43374]
__builtin_is{inf,finite,normal} builtins ICE on _Decimal{32,64,128,64x}
operands unless those operands are constant.
The problem is that we fold the builtins to comparisons with the largest
finite number, but
a) get_max_float was only handling binary floats
b) real_from_string again assumes binary float
and so we were ICEing in the build_real called after the two calls.
This patch adds decimal handling into get_max_float (well, moves it
from c-cppbuiltin.cc which was printing those for __DEC{32,64,128}_MAX__
macros) and uses real_from_string3 (perhaps it is time to rename it
to just real_from_string now that we can use function overloading)
so that it handles both binary and decimal floats.
2024-11-26 Jakub Jelinek <jakub@redhat.com>
PR middle-end/43374
gcc/
* real.cc (get_max_float): Handle decimal float.
* builtins.cc (fold_builtin_interclass_mathfn): Use
real_from_string3 rather than real_from_string. Use
"1E%d" format string rather than "0x1p%d" for decimal
float minimum.
gcc/c-family/
* c-cppbuiltin.cc (builtin_define_decimal_float_constants): Use
get_max_float.
gcc/testsuite/
* gcc.dg/dfp/pr43374.c: New test.
Andrew Pinski [Tue, 26 Nov 2024 08:37:33 +0000 (00:37 -0800)]
affine: Remove unused variable rem from wide_int_constant_multiple_p
This might fix the current bootstrap failure on aarch64; I only tested it
on x86_64. The rem variable is unused, and for poly_widest_int there
could be a loop if NUM_POLY_INT_COEFFS is 2 or more. In the case of
aarch64, NUM_POLY_INT_COEFFS is 2.
Note the reason why there is a warning for the unused variable is the
destructor.
Pushed as obvious after a build for x86_64-linux-gnu.
gcc/ChangeLog:
* tree-affine.cc (wide_int_constant_multiple_p): Remove unused rem variable.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Jonathan Wakely [Sun, 17 Nov 2024 20:46:07 +0000 (20:46 +0000)]
libstdc++: Move std::error_category symbol to separate file [PR117630]
As described in PR 117630 the cow-stdexcept.cc file pulls in symbols
from system_error.cc, which are not actually needed there. Moving the
definition of error_category::_M_message to a separate file should solve
it.
libstdc++-v3/ChangeLog:
PR libstdc++/117630
* src/c++11/Makefile.am: Add new file.
* src/c++11/Makefile.in: Regenerate.
* src/c++11/cow-stdexcept.cc (error_category::_M_message): Move
member function definition to ...
* src/c++11/cow-system_error.cc: New file.
Cui, Lili [Tue, 26 Nov 2024 07:10:23 +0000 (15:10 +0800)]
Optimize 128-bit vector permutation with pand, pandn and por.
This patch introduces a new subroutine in ix86_expand_vec_perm_const_1.
On x86, use mixed constant permutation for V8HImode and V16QImode when
SSE2 is supported. This patch handles certain vector shuffle operations
more efficiently using pand, pandn, and por. This change is intended to
improve assembly code generation for configurations that support SSE2.
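For illustration, the pand/pandn/por triple implements a constant-mask
select, which in SSE2 intrinsics (a sketch, not the backend code itself)
looks like:

  #include <emmintrin.h>

  /* Select bytes from a where mask bits are set, else from b:
     (a & mask) | (b & ~mask), i.e. the pand/pandn/por idiom.  */
  static __m128i
  select_bytes (__m128i a, __m128i b, __m128i mask)
  {
    return _mm_or_si128 (_mm_and_si128 (mask, a),
                         _mm_andnot_si128 (mask, b));
  }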
gcc/ChangeLog:
PR target/116675
* config/i386/i386-expand.cc (expand_vec_perm_pand_pandn_por):
New subroutine.
(ix86_expand_vec_perm_const_1): Call expand_vec_perm_pand_pandn_por.
gcc/testsuite/ChangeLog:
PR target/116675
* gcc.target/i386/pr116675.c: New test.
Haochen Jiang [Fri, 22 Nov 2024 07:57:47 +0000 (15:57 +0800)]
i386/testsuite: Correct AVX10.2 FP8 test mask usage
Under FP8, we should not use AVX512F_LEN_HALF to get the mask size since
it will get 16 instead of 8 and drop into the wrong if condition. Correct
the usage for the vcvtneph2[b,h]f8[,s] runtime tests.
Joseph Myers [Tue, 26 Nov 2024 03:25:44 +0000 (03:25 +0000)]
c: Fix ICEs from invalid atomic compound assignment [PR98195, PR117755]
As reported in bug 98195, there are ICEs from an _Atomic compound
assignment with an incomplete type on the RHS, arising from an invalid
temporary being created with such a type. As reported in bug 117755,
there are also (different) ICEs in cases with complete types where the
binary operation itself is invalid, when inside a nested function,
arising from a temporary being created for the RHS, but then not used
because the binary operation returns error_mark_node, resulting in the
temporary not appearing in a TARGET_EXPR, never getting its
DECL_CONTEXT set by the gimplifier and eventually resulting in an ICE
in nested function processing (trying to find a function context for
the temporary) as a result.
Fix the first ICE with an earlier check for a complete type for the
RHS of an assignment so the problematic temporary is never created for
an incomplete type (which changes the error message three existing
tests get for that case; the new message seems as good as the old
one). Fix the second ICE by ensuring that once a temporary has been
created, it always gets a corresponding TARGET_EXPR even on error.
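As an illustration, invalid code of roughly this shape (a hypothetical
reduction, not the testcase from the PRs) used to trigger the first ICE:

  struct S;                   /* incomplete type */
  extern struct S s;
  _Atomic int a;

  void
  f (void)
  {
    a += s;   /* RHS has incomplete type: now an error, not an ICE */
  }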
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
PR c/98195
PR c/117755
gcc/c/
* c-typeck.cc (build_atomic_assign): Always create a TARGET_EXPR
for newval even in case of error from binary operation.
(build_modify_expr): Check early for incomplete type of rhs.
Gaius Mulley [Mon, 25 Nov 2024 22:46:16 +0000 (22:46 +0000)]
PR modula2/117777: m2 does not allow single const string in asm volatile
gm2 does not allow a single const string in ASM VOLATILE. The bugfix is to
modify AsmOperands in all passes except P3Build.bnf (which is correct).
The remaining passes need to make the term following the ConstExpression
optional.
gcc/m2/ChangeLog:
PR modula2/117777
* gm2-compiler/P0SyntaxCheck.bnf (AsmOperands): Allow term after
ConstExpression to be optional.
* gm2-compiler/P1Build.bnf (AsmOperands): Ditto.
* gm2-compiler/P2Build.bnf (AsmOperands): Ditto.
* gm2-compiler/PCBuild.bnf (AsmOperands): Ditto.
* gm2-compiler/PHBuild.bnf (AsmOperands): Ditto.
gcc/testsuite/ChangeLog:
PR modula2/117777
* gm2/extensions/asm/pass/conststr.mod: New test.
Andrew Pinski [Mon, 25 Nov 2024 22:03:27 +0000 (14:03 -0800)]
build: Move sstream include above safe-ctype.h [PR117771]
sstream in some versions of libstdc++ includes locale, which might not have
been included yet. safe-ctype.h defines toupper, tolower, etc. as macros, so
the C++ header files need to be included beforehand, as the comment in system.h says:
/* Include C++ standard headers before "safe-ctype.h" to avoid GCC
poisoning the ctype macros through safe-ctype.h */
I don't understand how it was working before when memory was included after
safe-ctype.h rather than before. But this makes sstream consistent with the
other C++ headers.
Pushed as obvious after a build for riscv64-elf.
gcc/ChangeLog:
PR target/117771
* system.h: Move the include of sstream above safe-ctype.h.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
H.J. Lu [Sat, 12 Oct 2024 20:53:14 +0000 (04:53 +0800)]
sibcall: Check partial != 0 for BLKmode argument
The outgoing stack slot size may be different from the BLKmode argument
size due to parameter alignment. Check partial != 0 for BLKmode argument
passed on stack.
gcc/
PR middle-end/117098
* calls.cc (store_one_arg): Check partial != 0 for BLKmode argument
passed on stack.
gcc/testsuite/
PR middle-end/117098
* gcc.dg/sibcall-12.c: New test.
hppa: Revise TImode arithmetic patterns to support arith11_operands
2024-11-25 John David Anglin <danglin@gcc.gnu.org>
gcc/ChangeLog:
PR target/117645
* config/pa/pa.md (addti3): Revise pattern to support
arith11_operands. Use "R" operand prefix to print least
significant register of TImode register pair.
(addvti3, subti3, subvti3): Likewise.
(negti2, negvti2): Use "R" operand prefix.
[PR117105][LRA]: Use unique value reload pseudo for early clobber operand
LRA did not generate an insn satisfying the insn constraints on the PR
test. The reason for this is that LRA assigned the same hard reg to
two conflicting reload pseudos. The two insn reload pseudos originate
from the same pseudo, and LRA tried to optimize by assigning the same
value to the reload pseudos. It is an LRA optimization to minimize
reload insns. The two reload pseudos conflict as one of them is an
early clobber insn operand. The patch solves this problem by assigning
a unique value if the operand is an early clobber one.
gcc/ChangeLog:
PR target/117105
* lra-constraints.cc (get_reload_reg): Create unique value reload
pseudos for early clobbered operands.
gcc/testsuite/ChangeLog:
PR target/117105
* gcc.target/i386/pr117105.c: New test.
Sandra Loosemore [Sat, 23 Nov 2024 23:59:13 +0000 (23:59 +0000)]
nios2: Remove all support for Nios II target.
nios2 target support in GCC was deprecated in GCC 14 as the
architecture has been EOL'ed by the vendor. This patch removes the
entire port for GCC 15.
There are still references to "nios2" in libffi and libgo. Since those
libraries are imported into the gcc sources from master copies maintained
by other projects, those will need to be addressed elsewhere.
Steve Kargl [Mon, 25 Nov 2024 02:26:03 +0000 (18:26 -0800)]
Fortran: Check IMPURE in BLOCK inside DO CONCURRENT.
PR fortran/117765
gcc/fortran/ChangeLog:
* resolve.cc (check_pure_function): Check the stack to
see if the function is in a nested BLOCK and, if that
block is inside a DO_CONCURRENT, issue an error.
gcc/testsuite/ChangeLog:
* gfortran.dg/impure_fcn_do_concurrent.f90: New test.
Robin Dapp [Thu, 21 Nov 2024 13:49:53 +0000 (14:49 +0100)]
RISC-V: Ensure vtype for full-register moves [PR117544].
As discussed in PR117544 the VTYPE register is not preserved across
function calls. Even though vmv1r-like instructions operate
independently of the actual vtype they still require a valid vtype. As
we cannot guarantee that the vtype is valid we must make sure to emit a
vsetvl between a function call and a vmv1r.v.
This patch makes the necessary changes by splitting the full-reg-move
insns into patterns that use the vtype register and adding vmov to the
types of instructions requiring a vset.
Robin Dapp [Thu, 21 Nov 2024 14:34:37 +0000 (15:34 +0100)]
genemit: Distribute evenly to files [PR111600].
Currently we distribute insn patterns in genemit, partitioning them
by the number of patterns per file. The first 100 into file 1, the
next 100 into file 2, and so on. Depending on the patterns this
can lead to files of very uneven sizes.
Similar to the genmatch split, this patch introduces a dynamic
choose_output () which considers the size of the output files
and selects the shortest one for the next pattern.
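A minimal sketch of the idea, assuming a plain array of open streams (the
actual gensupport.cc interface differs):

  #include <stdio.h>

  /* Return the stream with the fewest bytes written so far, so the
     next pattern goes into the currently shortest output file.  */
  static FILE *
  choose_output (FILE *files[], int nfiles)
  {
    int best = 0;
    for (int i = 1; i < nfiles; i++)
      if (ftell (files[i]) < ftell (files[best]))
        best = i;
    return files[best];
  }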
gcc/ChangeLog:
PR target/111600
* genemit.cc (handle_arg): Use files instead of filenames.
(main): Ditto.
* gensupport.cc (SIZED_BASED_CHUNKS): Define.
(choose_output): New function.
* gensupport.h (choose_output): Declare.
Richard Biener [Mon, 25 Nov 2024 12:32:15 +0000 (13:32 +0100)]
target/116760 - 416.gamess slowdown with SLP
For the TWOTFF loop vectorization the backend scales constructor
and vector extract cost to make higher VFs less profitable. This
heuristic currently fails to consider VMAT_STRIDED_SLP which we
now get with single-lane SLP, causing a huge regression in SPEC 2k6
416.gamess for the respective loop nest.
The following fixes this, matching behavior to that of GCC 14 by
treating single-lane VMAT_STRIDED_SLP the same as VMAT_ELEMENTWISE.
PR target/116760
* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
Scale vec_construct for single-lane VMAT_STRIDED_SLP the
same as VMAT_ELEMENTWISE.
* tree-vect-stmts.cc (vectorizable_store): Pass SLP node
down to costing for vec_to_scalar for VMAT_STRIDED_SLP.
Richard Biener [Fri, 22 Nov 2024 12:58:08 +0000 (13:58 +0100)]
Add extra 64bit SSE vector epilogue in some cases
Similar to the X86_TUNE_AVX512_TWO_EPILOGUES tuning, which enables
an extra 128bit SSE vector epilogue when doing 512bit AVX512
vectorization in the main loop, the following allows a 64bit SSE
vector epilogue to be generated when the previous vector epilogue
still had a vectorization factor of 16 or larger (which usually
means we are operating on char data).
This effectively applies to 256bit and 512bit AVX2/AVX512 main loops,
a 128bit SSE main loop would already get a 64bit SSE vector epilogue.
Together with X86_TUNE_AVX512_TWO_EPILOGUES this means three
vector epilogues for 512bit and two vector epilogues when enabling
256bit vectorization. I have not added another tunable for this
RFC - suggestions on how to avoid inflation there welcome.
This speeds up 525.x264_r to within 5% of the -mprefer-vector-size=128
speed with -mprefer-vector-size=256 or -mprefer-vector-size=512
(the latter only when -mtune-ctrl=avx512_two_epilogues is in effect).
I have not done any further benchmarking, this merely shows the
possibility and looks for guidance on how to expose this to the
uarch tunings or to the user (at all?) if not gating on any uarch
specific tuning.
Note 64bit SSE isn't a native vector size so we rely on emulation
being "complete" (if not epilogue vectorization will only fail, so
it's "safe" in this regard). With AVX512 ISA available an alternative
is a predicated epilog, but due to possible STLF issues user control
would be required here.
* config/i386/i386.cc (ix86_vector_costs::finish_cost): For an
128bit SSE epilogue request a 64bit SSE epilogue if the 128bit
SSE epilogue VF was 16 or higher.
Richard Biener [Mon, 25 Nov 2024 08:46:28 +0000 (09:46 +0100)]
tree-optimization/117767 - VMAT_STRIDED_SLP and alignment
This plugs another hole in alignment checking with VMAT_STRIDED_SLP.
When using an alternate load or store type we have to check whether
that's supported with respect to required vector alignment.
PR tree-optimization/117767
* tree-vect-stmts.cc (vectorizable_store): Check for supported
alignment before using an alternate store vector type.
(vectorizable_load): Likewise for loads.
Jakub Jelinek [Mon, 25 Nov 2024 08:36:41 +0000 (09:36 +0100)]
libsanitizer: Remove -pedantic from AM_CXXFLAGS [PR117732]
We aren't the master repository for the sanitizers and clearly upstream
introduces various extensions in the code.
All we care about is whether it builds and works fine with GCC, so
-pedantic flag is of no use to us, only maybe to upstream if they
cared about it (which they clearly don't).
The following patch removes those and fixes some whitespace nits at the same
time.
Richard Biener [Wed, 10 Jul 2024 10:45:02 +0000 (12:45 +0200)]
tree-optimization/115825 - improve unroll estimates for volatile accesses
The loop unrolling code assumes that one third of all volatile accesses
can be possibly optimized away which is of course not true. This leads
to excessive unrolling in some cases. The following tracks the number
of stmts with side-effects as those are not eliminatable later and
only assumes one third of the other stmts can be further optimized.
This causes some fallout in the testsuite where we rely on unrolling
even when calls are involved. I have XFAILed g++.dg/warn/Warray-bounds-20.C
but adjusted the others with a #pragma GCC unroll to mimic previous
behavior and retain what the testcase was testing. I've also filed
PR117671 for the case where the size estimation fails to honor the
stmts we then remove by inserting __builtin_unreachable ().
For gcc.dg/tree-ssa/cunroll-2.c the estimate that the code doesn't
grow is clearly bogus and we have explicit code to reject unrolling
for bodies containing calls so I've adjusted the testcase accordingly.
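A minimal example of the kind of loop affected (illustrative, not taken
from the PR):

  volatile int v[4];

  void
  f (void)
  {
    /* None of the volatile stores can be eliminated later, so the old
       one-third-of-stmts optimism overestimated the shrinkage and
       unrolled too aggressively.  */
    for (int i = 0; i < 4; i++)
      v[i] = 0;
  }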
PR tree-optimization/115825
* tree-ssa-loop-ivcanon.cc (loop_size::not_eliminatable_after_peeling):
New.
(loop_size::last_iteration_not_eliminatable_after_peeling): Likewise.
(tree_estimate_loop_size): Count stmts with side-effects as
not optimistically eliminatable.
(estimated_unrolled_size): Compute the number of stmts that can
be optimistically eliminated by followup transforms.
(try_unroll_loop_completely): Adjust.
* gcc.dg/tree-ssa/cunroll-17.c: New testcase.
* gcc.dg/tree-ssa/cunroll-2.c: Adjust to not expect unrolling.
* gcc.dg/pr94600-1.c: Force unrolling.
* c-c++-common/ubsan/unreachable-3.c: Likewise.
* g++.dg/warn/Warray-bounds-20.C: XFAIL cases we rely on
unrolling loops created by new expressions and not inlined
CTOR invocations.
Kito Cheng [Fri, 15 Nov 2024 04:14:54 +0000 (12:14 +0800)]
asan: Support dynamic shadow offset
AddressSanitizer has supported dynamic shadow offsets since 2016[1], but
GCC hasn't implemented this yet because targets using dynamic shadow
offsets, such as Fuchsia and iOS, are mostly unsupported in GCC.
However, RISC-V 64 switched to dynamic shadow offsets this year[2] because
virtual memory space support varies across different RISC-V cores, such as
Sv39, Sv48, and Sv57. We realized that the best way to handle this
situation is by using a dynamic shadow offset to obtain the offset at
runtime.
We introduce a new target hook, TARGET_ASAN_DYNAMIC_SHADOW_OFFSET_P, to
determine if the target is using a dynamic shadow offset, so this change
won't affect the static offset path. Additionally, TARGET_ASAN_SHADOW_OFFSET
continues to work even if TARGET_ASAN_DYNAMIC_SHADOW_OFFSET_P is non-zero,
ensuring that KASAN functions as expected.
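Conceptually, instead of folding a compile-time constant offset into the
shadow address computation, the instrumented code loads the offset at run
time; a sketch in C (__asan_shadow_memory_dynamic_address is the symbol the
runtime provides for this, and the shift is the usual 8-to-1 shadow
scaling):

  extern unsigned long __asan_shadow_memory_dynamic_address;

  /* shadow = (addr >> 3) + offset, with the offset read at run time
     rather than baked in as a constant.  */
  static unsigned char *
  shadow_for (unsigned long addr)
  {
    return (unsigned char *)
           ((addr >> 3) + __asan_shadow_memory_dynamic_address);
  }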
This patch set has been verified on the Banana Pi F3, currently one of the
most popular RISC-V development boards. All AddressSanitizer-related tests
passed without introducing new regressions.
It was also verified on AArch64 and x86_64 with no regressions in
AddressSanitizer.
Haochen Jiang [Fri, 22 Nov 2024 06:32:16 +0000 (14:32 +0800)]
i386/testsuite: Do not append AVX10.2 option for check_effective_target
When -avx10.2 meets -march with AVX512 enabled, it will report a warning
for a vector size conflict. The warning will prevent the test from running
on GCC with an arch native build on those platforms during
check_effective_target.
Remove the AVX10.2 options since we are using inline asm and it actually
does not need options. This will eliminate the warning.
Add target-independent store forwarding avoidance pass
This pass detects cases of expensive store forwarding and tries to
avoid them by reordering the stores and using suitable bit insertion
sequences. For example it can transform this:
strb w2, [x1, 1]
ldr x0, [x1] # Expensive store forwarding to larger load.
To:
ldr x0, [x1]
strb w2, [x1, 1]
bfi x0, x2, 8, 8
Assembly like this can appear with bitfields or type punning / unions.
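For instance, C code of roughly this shape (an illustrative sketch) tends
to produce the pattern above:

  #include <stdint.h>

  union u { uint64_t whole; uint8_t bytes[8]; };

  uint64_t
  f (union u *p, uint8_t v)
  {
    p->bytes[1] = v;   /* narrow store ...  */
    return p->whole;   /* ... forwarded into a wider load: expensive.  */
  }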
On stress-ng when running the cpu-union microbenchmark the following
speedups have been observed.
The transformation is rejected on cases that cause store_bit_field to
generate subreg expressions on different register classes. Files
avoid-store-forwarding-4.c and avoid-store-forwarding-5.c contain such
cases and have been marked as XFAIL.
Due to biasing of its operands in store_bit_field, there is a special
handling for machines with BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN. The
need for this was exosed by an issue exposed on the H8 architecture,
which uses big-endian ordering, but BITS_BIG_ENDIAN is false. In that
case, the START parameter of store_bit_field needs to be calculated
from the end of the destination register.
gcc/ChangeLog:
* Makefile.in (OBJS): Add avoid-store-forwarding.o.
* common.opt (favoid-store-forwarding): New option.
* common.opt.urls: Regenerate.
* doc/invoke.texi: New param store-forwarding-max-distance.
* doc/passes.texi: Document new pass.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Document new pass.
* params.opt (store-forwarding-max-distance): New param.
* passes.def: Add pass_rtl_avoid_store_forwarding before
pass_early_remat.
* target.def (avoid_store_forwarding_p): New DEFHOOK.
* target.h (struct store_fwd_info): Declare.
* targhooks.cc (default_avoid_store_forwarding_p): New function.
* targhooks.h (default_avoid_store_forwarding_p): Declare.
* tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare.
* avoid-store-forwarding.cc: New file.
* avoid-store-forwarding.h: New file.
* timevar.def (TV_AVOID_STORE_FORWARDING): New timevar.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/avoid-store-forwarding-1.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-2.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-3.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-4.c: New test.
* gcc.target/aarch64/avoid-store-forwarding-5.c: New test.
* gcc.target/x86_64/abi/callabi/avoid-store-forwarding-1.c: New test.
* gcc.target/x86_64/abi/callabi/avoid-store-forwarding-2.c: New test.
Co-authored-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Signed-off-by: Konstantinos Eleftheriou <konstantinos.eleftheriou@vrull.eu>
Martin Jambor [Sun, 24 Nov 2024 22:03:43 +0000 (23:03 +0100)]
ipa: Move individual jump function copying to a separate function
When reviewing various IPA bits and pieces I have falsely assumed
that jump function duplication misses copying important bits because
it relies on vec_safe_copy-ing all data in the vector of jump
functions and then just fixes up the few fields it needs to.
Perhaps more importantly, we do want a function to copy one individual
jump function to form jump functions for planned call-graph edges that
model transfer of control to OpenMP outlined regions through calls to
gomp functions.
Therefore, this patch introduces such function and makes
ipa_edge_args_sum_t::duplicate just allocate the new vectors and then
uses the new function to copy the data.
gcc/ChangeLog:
2024-11-01 Martin Jambor <mjambor@suse.cz>
* ipa-prop.cc (ipa_duplicate_jump_function): New function.
(ipa_edge_args_sum_t::duplicate): Move individual jump function
copying to ipa_duplicate_jump_function.
Uros Bizjak [Sun, 24 Nov 2024 21:18:31 +0000 (22:18 +0100)]
testsuite/x86: Add -mfpmath=sse to add_options_for_float16
Add -mfpmath=sse to add_options_for_float16 to avoid error:
'-fexcess-precision=16' is not compatible with '-mfpmath=387'
when compiling gcc.dg/tree-ssa/pow_fold_1.c.
Uros Bizjak [Sun, 24 Nov 2024 21:00:18 +0000 (22:00 +0100)]
i386: x86 can use x >> -y for x >> 32-y [PR36503]
x86 targets mask 32-bit shifts with a 5-bit mask (and 64-bit with 6-bit mask),
so they can use x >> -y instead of x >> 32-y. This form is very common in
bitstream readers, where it's used to read the top N bits from a word.
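A typical instance of the idiom (illustrative):

  #include <stdint.h>

  /* Read the top n bits of w, 1 <= n <= 32.  Because x86 masks the
     shift count to 5 bits, 32 - n and -n yield the same shift, so the
     subtraction can be dropped.  */
  static inline uint32_t
  top_bits (uint32_t w, unsigned n)
  {
    return w >> (32 - n);
  }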
Andrew Pinski [Sat, 23 Nov 2024 21:42:47 +0000 (13:42 -0800)]
gimplefe: Fix handling of ')'/'}' after a parse error [PR117741]
The problem here is c_parser_skip_until_found stops at a closing nesting
delimiter without consuming it. So if we don't consume it in
c_parser_gimple_compound_statement, we would go into an infinite loop. The C
parser has similar code in c_parser_statement_after_labels to handle this
specific case too.
PR c/117741
gcc/c/ChangeLog:
* gimple-parser.cc (c_parser_gimple_compound_statement): Handle
CPP_CLOSE_PAREN/CPP_CLOSE_SQUARE with an error and skipping the token.
gcc/testsuite/ChangeLog:
* gcc.dg/gimplefe-54.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Eric Botcazou [Sun, 24 Nov 2024 19:23:34 +0000 (20:23 +0100)]
Fix vectorization regressions on the SPARC
This fixes the vectorization regressions present on the SPARC by switching
from vcond[u] patterns to vec_cmp[u] + vcond_mask_ patterns. While I was
at it, I merged the patterns for V4HI/V2SI and V8QI enabled with VIS 3/VIS 4
to follow the model of those enabled with VIS 4B, and standardized all the
mnemonics to the version documented in the Oracle SPARC architecture 2015.
Eric Botcazou [Sun, 24 Nov 2024 14:15:54 +0000 (15:15 +0100)]
Adjust error message for initialized variable in .bss
The current message does not make sense with -fno-zero-initialized-in-bss.
gcc/
* doc/invoke.texi (-fno-zero-initialized-in-bss): Adjust for Ada.
* varasm.cc (get_variable_section): Adjust the error message for an
initialized variable in .bss to -fno-zero-initialized-in-bss.
gcc/testsuite/
* gnat.dg/specs/bss1.ads: New test.
PR fortran/117730
* class.cc (add_proc_comp): Only reject a non_overridable if it
has no overridden procedure and the component is already
present in the vtype.
PR fortran/84674
* resolve.cc (resolve_fl_derived): Do not build a vtable for a
derived type extension that is completely empty.
gcc/testsuite/ChangeLog
PR fortran/117730
* gfortran.dg/pr117730_a.f90: New test.
* gfortran.dg/pr117730_b.f90: New test.
PR fortran/84674
* gfortran.dg/pr84674.f90: New test.
Pan Li [Thu, 21 Nov 2024 06:30:49 +0000 (14:30 +0800)]
RISC-V: Refine the vector stride load/store testcases
The rtl expand dump for the IFN check of the stride load/store testcase is
different between O2 and O3. It is reasonable to leverage target
no-opts/any-opts to filter them out, instead of the xfail.
The below test suites are passed for this patch.
* The rv64gcv full regression test.
It is a test-only patch and obvious up to a point; will commit it
directly if no comments in the next 48H.
Pan Li [Thu, 21 Nov 2024 06:30:45 +0000 (14:30 +0800)]
RISC-V: Rearrange the test files for vector SAT_TRUNC [NFC]
The test files of vector SAT_TRUNC only has numbers as the suffix.
Rearrange the file name to -{form number}-{target-type}. For example,
test form 3 for uint32_t SAT_TRUNC will have -3-u32.c for asm check
and -run-3-u32.c for the run test.
Meanwhile, moved all related test files to riscv/rvv/autovec/sat/.
It is a test-only patch and obvious up to a point; will commit it
directly if no comments in the next 48H.
Pan Li [Thu, 21 Nov 2024 06:30:44 +0000 (14:30 +0800)]
RISC-V: Refactor the testcases for vector SAT_SUB
This patch would like to refactor the testcases of vector SAT_SUB
after move to rvv/autovec/sat folder. Includes:
* Refine the include header files.
* Remove unnecessary optimization options.
* Adjust dg-final by any-opts and/or no-opts if the rtl dump changes
on different optimization options (like O2, O3).
The below test suites are passed for this patch.
* The rv64gcv full regression test.
It is a test-only patch and obvious up to a point; will commit it
directly if no comments in the next 48H.
Pan Li [Thu, 21 Nov 2024 06:30:43 +0000 (14:30 +0800)]
RISC-V: Rearrange the test files for vector SAT_SUB [NFC]
The test files of vector SAT_SUB only has numbers as the suffix.
Rearrange the file name to -{form number}-{target-type}. For example,
test form 3 for uint32_t SAT_SUB will have -3-u32.c for asm check and
-run-3-u32.c for the run test.
Meanwhile, moved all related test files to riscv/rvv/autovec/sat/.
It is a test-only patch and obvious up to a point; will commit it
directly if no comments in the next 48H.
Lewis Hyatt [Mon, 14 Oct 2024 21:59:46 +0000 (17:59 -0400)]
libcpp: Fix ICE lexing invalid raw string in a deferred pragma [PR117118]
The PR shows that we ICE after lexing an invalid unterminated raw string,
because lex_raw_string() pops the main buffer unexpectedly. Resolve by
handling this case the same way as for other directives.
libcpp/ChangeLog:
PR preprocessor/117118
* lex.cc (lex_raw_string): Treat an unterminated raw string the same
way for a deferred pragma as is done for other directives.
gcc/testsuite/ChangeLog:
PR preprocessor/117118
* c-c++-common/raw-string-directive-3.c: New test.
* c-c++-common/raw-string-directive-4.c: New test.
Lewis Hyatt [Tue, 29 Oct 2024 20:57:12 +0000 (16:57 -0400)]
gimple: Handle tail padding when computing gimple_ops_offset
The array gimple_ops_offset_[], which is used to find the trailing op[]
array for a given gimple struct, is computed assuming that op[] will be
found at sizeof(tree) bytes away from the end of the struct. This is only
correct if the alignment requirement of a pointer is the same as the
alignment requirement of the struct, otherwise there will be padding bytes
that invalidate the calculation. On 64-bit platforms, this generally works
fine because a pointer has 8-byte alignment and none of the structs make use
of more than that. On 32-bit platforms, it also currently works fine because
there are no 64-bit integers in the gimple structs. There are 32-bit
platforms (e.g. sparc) on which a pointer has 4-byte alignment and a
uint64_t has 8-byte alignment. On such platforms, adding a uint64_t to the
gimple structs (as will take place when location_t is changed to be 64-bit)
causes gimple_ops_offset_ to be 4 bytes too large.
It would be nice to use offsetof() to compute the offset exactly, but
offsetof() is not guaranteed to work for these types, because they use
inheritance and so are not standard layout types. This patch attempts to
detect the presence of tail padding by detecting when such padding is reused
by inheritance; the padding should generally be reused for the same reason
that offsetof() is not available, namely that all the relevant types use
inheritance. One could envision systems on which this fix does not go far
enough (e.g., if the ABI forbids reuse of tail padding), but it makes things
better without affecting anything that currently works.
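The effect can be illustrated with a plain C struct (the real gimple types
use inheritance, which is exactly what rules out offsetof):

  #include <stdint.h>

  /* On a target where pointers are 4-byte aligned but uint64_t is
     8-byte aligned (e.g. 32-bit sparc):  */
  struct g
  {
    uint64_t location;   /* offset 0; forces 8-byte struct alignment */
    void *op[1];         /* offset 8 */
  };                     /* sizeof == 16: 4 bytes of tail padding */

  /* "sizeof minus one pointer" gives 12, but op[] is at offset 8.  */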
gcc/ChangeLog:
* gimple.cc (get_tail_padding_adjustment): New function.
(DEFGSSTRUCT): Adjust the computation of gimple_ops_offset_ to be
correct in the presence of tail padding.
Lewis Hyatt [Fri, 25 Oct 2024 14:18:12 +0000 (10:18 -0400)]
Support for 64-bit location_t: C++ modules parts
The modules implementation is necessarily sensitive to the internal workings
of class line_map, and so it needed changes in order to handle a 64-bit
location_t. The changes mostly boil down to supporting that in the debug
dumping routines (which is accomplished by using a new custom code %K for
that purpose), and supporting that when streaming in and out from the
module (which is accomplished by using a new loc() function to go along with
existing abstractions like u() or z() for streaming in and out different
data types).
gcc/cp/ChangeLog:
* module.cc (bytes_out::loc): New function.
(bytes_in::loc): New function.
(struct span): Change int fields to location_diff_t.
(range_t): Change from "unsigned int" to "line_map_uint_t".
(struct ord_loc_info): Likewise.
(struct macro_loc_info): Likewise.
(class module_state): Likewise.
(dumper::operator()): Add new code 'K' for dumping a location_t.
(loc_spans::init): Use %K instead of %u for location_t dumps.
(loc_spans::open): Likewise.
(loc_spans::close): Likewise. Adjust bitwise expressions to support
64-bit location_t as well.
(struct module_state_config): Change ordinary_locs and macro_locs
from "unsigned int" to "line_map_uint_t". Reorder fields to improve
packing. Rather than changing the constructor initializer list to
match the new order, switch to NSDMI instead.
(module_state::note_location): Adjust to support 64-bit location_t.
(module_state::write_location): Use %K instead of %u for location_t
dumps. Use loc() instead of u() for streaming location_t.
(module_state::read_location): Likewise.
(module_state::write_ordinary_maps): Likewise.
(module_state::write_macro_maps): Likewise.
(module_state::write_config): Likewise.
(module_state::read_config): Likewise.
(module_state::write_prepare_maps): Use %K instead of %u for
location_t dumps. Adjust variable types and bitwise expressions to
support 64-bit location_t.
(module_state::read_ordinary_maps): Likewise.
(module_state::read_macro_maps): Likewise.
(preprocess_module): Adjust data types to support 64-bit number of
line maps.
Lewis Hyatt [Mon, 28 Oct 2024 16:55:24 +0000 (12:55 -0400)]
Support for 64-bit location_t: Analyzer parts
The analyzer occasionally prints internal location_t values for debugging;
adjust those parts so they will work if location_t is 64-bit. For
simplicity, to avoid hassling with the printf format string, just convert to
(unsigned long long) in either case.
gcc/analyzer/ChangeLog:
* checker-event.cc (checker_event::dump): Support printing either
32- or 64-bit location_t values.
* checker-path.cc (checker_path::inject_any_inlined_call_events):
Likewise.
Lewis Hyatt [Mon, 28 Oct 2024 16:52:23 +0000 (12:52 -0400)]
Support for 64-bit location_t: Frontend parts
The C/C++ frontend code contains a couple instances where a callback
receiving a "location_t" argument is prototyped to take "unsigned int"
instead. This will make a difference once location_t can be configured to a
different type, so adjust that now.
Also remove a comment about -flarge-source-files, which will be removed
shortly.
gcc/c-family/ChangeLog:
* c-indentation.cc (should_warn_for_misleading_indentation): Remove
comment about -flarge-source-files.
* c-lex.cc (cb_ident): Change "unsigned int" argument to type
"location_t".
(cb_def_pragma): Likewise.
(cb_define): Likewise.
(cb_undef): Likewise.
Lewis Hyatt [Mon, 28 Oct 2024 21:57:41 +0000 (17:57 -0400)]
libcpp: Fix potential unaligned access in cpp_buffer
libcpp makes use of the cpp_buffer pfile->a_buff to store things while it is
handling macros. It uses it to store pointers (cpp_hashnode*, for macro
arguments) and cpp_macro objects. This works fine because a cpp_hashnode*
and a cpp_macro have the same alignment requirement on either 32-bit or
64-bit systems (namely, the same alignment as a pointer.)
When 64-bit location_t is enabled on a 32-bit system, the alignment
requirement may cease to be the same, because the alignment requirement of a
cpp_macro object changes to that of a uint64_t, which may be larger than that of
a pointer. It's not the case for x86 32-bit, but for example, on sparc, a
pointer has 4-byte alignment while a uint64_t has 8. In that case,
intermixing the two within the same cpp_buffer leads to a misaligned
access. The code path that triggers this is the one in _cpp_commit_buff in
which a hash table with its own allocator (i.e. ggc) is not being used, so
it doesn't happen within the compiler itself, but it happens in the other
libcpp clients, such as genmatch.
Fix that up by ensuring _cpp_commit_buff commits a fully aligned chunk of the
buffer, so it's ready for anything it may be used for next.
Also modify CPP_ALIGN so that it guarantees to return an alignment at least
the size of location_t. Currently it returns the max of a pointer and a
double. I am not aware of any platform where a double may have smaller
alignment than a uint64_t, but it does not hurt to add location_t here to be
sure.
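The usual align-up idiom relied on here looks like this (a sketch, not the
exact CPP_ALIGN definition):

  #include <stddef.h>

  /* Round n up to the next multiple of the power-of-two a, so that the
     next object allocated from the buffer starts fully aligned.  */
  #define ALIGN_UP(n, a) (((n) + (a) - 1) & ~((size_t) (a) - 1))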
libcpp/ChangeLog:
* lex.cc (_cpp_commit_buff): Make sure that the buffer is properly
aligned for the next allocation.
* internal.h (struct dummy): Make sure alignment is large enough for
a location_t, just in case.
Lewis Hyatt [Mon, 28 Oct 2024 17:19:40 +0000 (13:19 -0400)]
Support for 64-bit location_t: libcpp preliminaries
Prepare libcpp to support 64-bit location_t, without yet making
any functional changes, by adding new typedefs that enable code to be
written such that it works with any size location_t. Update the usage of
line maps within libcpp accordingly.
Subsequent patches will prepare the rest of the codebase similarly, and then
afterwards, location_t will be changed to uint64_t.
libcpp/ChangeLog:
* include/line-map.h (line_map_uint_t): New typedef, the same type
as location_t.
(location_diff_t): New typedef.
(line_map_suggested_range_bits): New constant.
(struct maps_info_ordinary): Change member types from "unsigned int"
to "line_map_uint_t".
(struct maps_info_macro): Likewise.
(struct location_adhoc_data_map): Likewise.
(LINEMAPS_ALLOCATED): Change return type from "unsigned int" to
"line_map_uint_t".
(LINEMAPS_ORDINARY_ALLOCATED): Likewise.
(LINEMAPS_MACRO_ALLOCATED): Likewise.
(LINEMAPS_USED): Likewise.
(LINEMAPS_ORDINARY_USED): Likewise.
(LINEMAPS_MACRO_USED): Likewise.
(linemap_lookup_macro_index): Likewise.
(LINEMAPS_MAP_AT): Change argument type from "unsigned int" to
"line_map_uint_t".
(LINEMAPS_ORDINARY_MAP_AT): Likewise.
(LINEMAPS_MACRO_MAP_AT): Likewise.
(line_map_new_raw): Likewise.
(linemap_module_restore): Likewise.
(linemap_dump): Likewise.
(line_table_dump): Likewise.
(LINEMAPS_LAST_MAP): Add a linemap_assert() for safety.
(SOURCE_COLUMN): Use a cast to ensure correctness if location_t
becomes a 64-bit type.
* line-map.cc (location_adhoc_data_hash): Don't truncate to 32-bit
prematurely when hashing.
(line_maps::get_or_create_combined_loc): Adapt types to support
potentially 64-bit location_t. Use MAX_LOCATION_T rather than a
hard-coded constant.
(line_maps::get_range_from_loc): Adapt types and constants to
support potentially 64-bit location_t.
(line_maps::pure_location_p): Likewise.
(line_maps::get_pure_location): Likewise.
(line_map_new_raw): Likewise.
(LAST_SOURCE_LINE_LOCATION): Likewise.
(linemap_add): Likewise.
(linemap_module_restore): Likewise.
(linemap_line_start): Likewise.
(linemap_position_for_column): Likewise.
(linemap_position_for_line_and_column): Likewise.
(linemap_position_for_loc_and_offset): Likewise.
(linemap_ordinary_map_lookup): Likewise.
(linemap_lookup_macro_index): Likewise.
(linemap_dump): Likewise.
(linemap_dump_location): Likewise.
(linemap_get_file_highest_location): Likewise.
(line_table_dump): Likewise.
(linemap_compare_locations): Avoid signed int overflow in the result.
* macro.cc (num_expanded_macros_counter): Change type of global
variable from "unsigned int" to "line_map_uint_t".
(num_macro_tokens_counter): Likewise.
Jerry DeLisle [Sat, 23 Nov 2024 03:29:42 +0000 (19:29 -0800)]
Fortran: Reject missing comma in format.
Standards require rejecting formats where descriptors
are not separated by commas. This change allows the
missing comma to be accepted only with
-std=legacy.
PR fortran/88052
libgfortran/ChangeLog:
* io/format.c (parse_format_list): Reject missing comma in
format strings by default or if -std=f95 or higher. This is
a runtime error.
Expand coverage for `__builtin_memcpy', primarily for the "cpymemM" block
copy pattern, although with smaller sizes open-coded sequences may be
produced instead.
This verifies block sizes in bytes from 1 to 64, across byte alignments
of 1, 2, 4, 8 and byte misalignments within from 0 up to 7 (there's some
redundancy there for the sake of simplicity of the test cases) both for
the source and the destination, making sure all data is copied and no
data is changed outside the area meant to be written.
The choice of the ranges for the parameters has come from the Alpha
backend, whose "cpymemM" pattern covers copies being made of up to 64
bytes and has various corner cases related to base alignment and the
misalignment within.
The test cases have turned invaluable in verifying changes to the Alpha
backend, but functionality covered is generic, so I have concluded these
tests qualify for generic verification and do not have to be limited to
the Alpha-specific subset of the testsuite.
On the implementation side the tests turned out to be quite stressful to
GCC and the original simpler version that just expanded all code inline
took a lot of time to complete compilation. Depending on the target and
compilation options elapsed times up to 40 minutes (!) have been seen,
especially with GCC built at `-O0' for debugging purposes.
At the cost of increased complexity, where a pair of macros is required
per variant rather than just one, I have split the code into individual
functions forced not to be inlined, and it improved compilation times
considerably without losing coverage.
Example compilation times with reasonably fast POWER9@2.166GHz at `-O2'
optimization and GCC built at `-O2' for various targets:
I have therefore set the timeout factor accordingly so as to take slower
test hosts into account.
gcc/testsuite/
* gcc.c-torture/execute/memcpy-a1.c: New file.
* gcc.c-torture/execute/memcpy-a2.c: New file.
* gcc.c-torture/execute/memcpy-a4.c: New file.
* gcc.c-torture/execute/memcpy-a8.c: New file.
* gcc.c-torture/execute/memcpy-ax.h: New file.
build: Discard obsolete references to $(GCC_PARTS)
The $(GCC_PARTS) variable was deleted with the Makefile rework in commit fa9585134f6f ("libgcc move to the top level")[1] back in 2007, and yet
the Ada and Modula 2 frontends added references to this variable later
on, with commit e972fd5281b7 ("[Ada] clean ups in Makefiles")[2] back in
2011 and commit 1eee94d35177 ("Merge modula-2 front end onto gcc.") back
in 2022 respectively.
I guess it's because the frontends lived too long externally. Discard
the references then, they serve no purpose nowadays.
Georg-Johann Lay [Sat, 23 Nov 2024 11:51:32 +0000 (12:51 +0100)]
AVR: target/117744 - Fix asm for partial clobber of address reg,
gcc/
PR target/117744
* config/avr/avr.cc (out_movqi_r_mr): Fix code when a load
only partially clobbers an address register due to
changing the address register temporarily to accommodate
faked addressing modes.
Andrew Pinski [Thu, 31 Oct 2024 23:00:18 +0000 (16:00 -0700)]
md-files: Add a note about escaped quotes in braced strings in md files
While looking into PR 33532, it was noted that \" would still be treated
as " for braced strings in the md file. I think that is still
the correct thing to do. So let's just add a note to the documentation
on this behavior and NOT change read-md.cc (read_braced_string).
This behavior has been there for the last 23 years, and only
one person ran into it while helping with the conversion
from using quoted strings to braced strings; the fix there is just
to remove the quotes around the braces rather than change all of the
code.
Build the documentation to make sure it looks correct.
gcc/ChangeLog:
* doc/rtl.texi: Add a note about quotes in braced strings.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
There's not much notable here, just gprofng (which is in binutils) being
disabled for musl and a new target which got added on that side too.
The only part which may look interesting is the baseargs->bbaseargs
change which goes back to Arsen's gettext work and a fixup which
landed for that on the binutils side in 9c0aa4c53104b1c4333d55aeaf11b41053307929.
* configure: Regenerate.
* configure.ac: Sync with Binutils.
Andrew Pinski [Fri, 22 Nov 2024 17:31:44 +0000 (09:31 -0800)]
build: Remove INCLUDE_MEMORY [PR117737]
Since diagnostic.h is included in over half of the sources, requiring `#define INCLUDE_MEMORY`
does not make sense. Instead let's unconditionally include memory in system.h.
The majority of this patch is just removing `#define INCLUDE_MEMORY` from the sources which currently
have it.
This should also fix the mingw build issue but I have not tried it.
This is a small improvement to the constant synthesis code to capture a case
appended to PR 109279.
The case in question has the property that the high 32 bits have the value one
less than the low 32 bits and the highest bit in the low 32 bits is on. The
example used in BZ is 0xcccccccccccccccd which comes up computing N/10.
When we construct a constant with bit 31 on, it gets implicitly sign extended.
So something like 0xcccccccd when constructed would generate
0xffffffffcccccccd. The low bits are precisely what we want and the high bits
are a "-1". Both properties are useful.
We left shift that value by 32 positions into a temporary and add that
temporary to the original value. Concretely:
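In C terms the synthesis amounts to (illustrative):

  #include <stdint.h>

  uint64_t
  synth (void)
  {
    /* Materializing the 32-bit constant sign-extends it.  */
    uint64_t t = 0xffffffffcccccccdULL;   /* constructing 0xcccccccd */
    return t + (t << 32);                 /* 0xcccccccccccccccd */
  }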
Tested in my tester on rv32 and rv64, waiting on the pre-commit tester to do its thing.
PR target/109279
gcc/
* config/riscv/riscv.cc (riscv_build_integer): Handle another 64-bit
synthesis where high half is one less than the low half and the 32-bit
sign bit is on.
Georg-Johann Lay [Thu, 21 Nov 2024 21:59:14 +0000 (22:59 +0100)]
AVR: target/117726 - Tweak ashiftrt:SI and lshiftrt:SI insns.
This patch is similar to r15-5569 (tweak ashift:SI) but for
ashiftrt and lshiftrt codes. It splits constant shift offsets > 16
into a 3-operand byte shift and a 2-operand residual bit shift.
Moreover, some of the constraint alternatives have been promoted
to 3-operand alternatives regardless of options. For example,
ashift:HI and lshiftrt:HI can support 3 operands for offsets 9...12
without any overhead.
Apart from that, it's a bit of code clean up for 2-byte and 4-byte
shift insns: Use one RTL peephole with any_shift code iterator
instead of 3 individual peepholes. It also removes some useless
split insns; presumably introduced during the cc0 -> CCmode work.
PR target/117726
gcc/
* config/avr/avr-passes.cc (avr_split_shift): Also handle
ASHIFTRT and LSHIFTRT codes for 4-byte shifts.
(constr_split_shift4): New code_attr.
(avr_emit_shift): Adjust to new shift capabilities.
* config/avr/predicates.md (scratch_or_d_register_operand):
Rename to scratch_or_dreg_operand.
* config/avr/avr.md: Same.
(define_peephole2): Write the RTL scratch peephole for 2-byte and
4-byte shifts that generates *sh*<mode>3_const insns using code
iterator any_shift.
(*ashlhi3_const_split, *ashrhi3_const_split, *ashrhi3_const_split)
(*lshrsi3_const_split, *lshrhi3_const_split): Remove useless
split insns.
(define_split) [avropt_split_bit_shift]: Add splitters
for 4-byte ASHIFTRT and LSHIFTRT insns using avr_split_shift().
(ashrsi3, *ashrsi3, *ashrsi3_const): Add "r,0,C4a" and "r,r,C4a"
constraint alternatives depending on 2op, 3op.
(lshrsi3, *lshrsi3, *lshrsi3_const): Add "r,0,C4r" and "r,r,C4r"
constraint alternatives depending on 2op, 3op. Add "r,r,C15".
(lshrhi3, *lshrhi3, *lshrhi3_const, ashlhi3, *ashlhi3)
(*ashlhi3_const): Add "r,r,C7c" alternative.
(ashrpsi, *ashrpsi3): Add "r,r,C22" alternative.
(ashlqi, *ashlqi): Turn C06 alternative into "r,r,C06".
* config/avr/constraints.md (C14, C22, C30, C7c): New constraints.
* config/avr/avr.cc (ashlhi3_out, lshrhi3_out)
[case 7, 9, 10, 11, 12]: Support as 3-operand insn.
(lshrsi3_out) [case 15]: Same.
(ashrsi3_out) [case 30]: Same.
(ashrhi3_out) [case 14]: Same.
(ashrqi3_out) [case 6]: Same.
(avr_out_ashrpsi3) [case 22]: Same.
* config/avr/avr.h: Fix comment typo.
* doc/invoke.texi (AVR Options) <-msplit-bit-shift>: Document.
Joseph Myers [Fri, 22 Nov 2024 20:33:10 +0000 (20:33 +0000)]
c: Fix typeof_unqual handling of qualified array types [PR112841]
As reported in bug 112841, typeof_unqual fails to remove qualifiers
from qualified array types. In C23 (unlike in previous standard
versions), array types are considered to have the qualifiers of the
element type, so typeof_unqual should remove such qualifiers (and an
example in the standard shows that is as intended). Fix this by
calling strip_array_types when checking for the presence of
qualifiers. (The reason we check for qualifiers rather than just
using TYPE_MAIN_VARIANT unconditionally is to avoid, as a quality of
implementation matter, unnecessarily losing typedef information in the
case where the type is already unqualified.)
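For instance, in the spirit of the standard's example:

  const int a[3] = { 1, 2, 3 };
  typeof_unqual (a) b;   /* C23: b has type int[3]; the const was
                            previously (wrongly) retained */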
Bootstrapped with no regressions for x86_64-pc-linux-gnu.
PR c/112841
gcc/c/
* c-parser.cc (c_parser_typeof_specifier): Call strip_array_types
when checking for type qualifiers for typeof_unqual.
tree-optimization/117355: object size for PHI nodes with negative offsets
When the object size estimate is returned for a PHI node, it is the
maximum possible value, which is fine in isolation. When combined with
negative offsets, however, it may sometimes end up as zero size because
the resultant size was larger than the wholesize, leading
size_for_offset to conclude that there's a potential underflow. Fix
this by allowing a non-strict mode to size_for_offset, which
conservatively returns the size (or wholesize) in case of a negative
offset.
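An illustrative shape of the problem (a hypothetical reduction, not the new
testcase):

  char buf[16];

  unsigned long
  f (int cond)
  {
    char *p = cond ? buf + 8 : buf + 12;   /* PHI: maximal size estimate */
    /* A negative offset from p could previously make the estimated size
       exceed the whole-object size and collapse to zero.  */
    return __builtin_object_size (p - 4, 0);
  }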
gcc/ChangeLog:
PR tree-optimization/117355
* tree-object-size.cc (size_for_offset): New argument STRICT,
return SZ if it is set to false.
(plus_stmt_object_size): Adjust call to SIZE_FOR_OFFSET.
gcc/testsuite/ChangeLog:
PR tree-optimization/117355
* g++.dg/ext/builtin-object-size2.C (test9): New test.
(main): Call it.
* gcc.dg/builtin-object-size-3.c (test10): Adjust expected size.
Georg-Johann Lay [Thu, 21 Nov 2024 16:41:17 +0000 (17:41 +0100)]
AVR: Use Var(avropt_xxx) for option variables in avr.opt.
This is a no-op refactoring that uses a prefix of avropt_
(formerly: avr_) for variables defined qua Var() directives
in avr.opt. This makes it easier to spot values that come directly
from avr.opt in the rest of the backend.
The following patch adds a new option for optimizations related to
replaceable global operators new/delete.
The option isn't called -fassume-sane-operator-new (which clang++
implements), because
1) clang++ option means something different; initially it was an
option to add malloc attribute to those declarations (but we have
malloc attribute on all <new> calls already unconditionally);
later it was changed to add noalias attribute rather than malloc,
whatever it means, but it is certainly about the return value
from the operator new (whether it can alias with other pointers);
we already assume malloc-ish behavior that it doesn't alias any
other pointers
2) the option only affects operator new, we want it affect also
operator delete
The option basically allows to choose between pre-PR101480 behavior
(now the default, more optimistic) and post-PR101480 behavior (safer
but penalizing most of the code in the wild for rare needs).
I've tried to explain stuff in the documentation too.
2024-11-22 Jakub Jelinek <jakub@redhat.com>
PR c++/110137
PR middle-end/101480
gcc/
* doc/invoke.texi (-fassume-sane-operators-new-delete,
-fno-assume-sane-operators-new-delete): Document.
* gimple.cc (gimple_call_fnspec): Handle
-f{,no-}assume-sane-operators-new-delete.
* ipa-inline-transform.cc (inline_call): Also clear
flag_assume_sane_operators_new_delete on caller when inlining
-fno-assume-sane-operators-new-delete callee into
-fassume-sane-operators-new-delete caller.
gcc/c-family/
* c.opt (fassume-sane-operators-new-delete): New option.
gcc/testsuite/
* g++.dg/tree-ssa/pr110137-1.C: New test.
* g++.dg/tree-ssa/pr110137-2.C: New test.
* g++.dg/tree-ssa/pr110137-3.C: New test.
* g++.dg/tree-ssa/pr110137-4.C: New test.
* g++.dg/torture/pr10148.C: Add -fno-assume-sane-operators-new-delete
as dg-additional-options.
* g++.dg/warn/Warray-bounds-16.C: Revert 2021-11-10 changes.
Jakub Jelinek [Fri, 22 Nov 2024 18:50:22 +0000 (19:50 +0100)]
match.pd: Fix up the new simplifiers using with_possible_nonzero_bits2 [PR117420]
The following testcase shows wrong-code caused by incorrect use
of with_possible_nonzero_bits2.
That matcher is defined as
/* Slightly extended version, do not make it recursive to keep it cheap. */
(match (with_possible_nonzero_bits2 @0)
with_possible_nonzero_bits@0)
(match (with_possible_nonzero_bits2 @0)
(bit_and:c with_possible_nonzero_bits@0 @2))
and because with_possible_nonzero_bits includes the SSA_NAME case with
integral/pointer argument, both forms can actually match when a SSA_NAME
with integral/pointer type has a def stmt which is BIT_AND_EXPR
assignment with say SSA_NAME with integral/pointer type as one of its
operands (or INTEGER_CST, another with_possible_nonzero_bits case).
And in match.pd the latter actually wins if both match and so when using
(with_possible_nonzero_bits2 @0) the @0 will actually be one of the
BIT_AND_EXPR operands if that form is matched.
Now, with_possible_nonzero_bits2 and with_certain_nonzero_bits2 were added
for the
/* X == C (or X & Z == Y | C) is impossible if ~nonzero(X) & C != 0. */
(for cmp (eq ne)
(simplify
(cmp:c (with_possible_nonzero_bits2 @0) (with_certain_nonzero_bits2 @1))
(if (wi::bit_and_not (wi::to_wide (@1), get_nonzero_bits (@0)) != 0)
{ constant_boolean_node (cmp == NE_EXPR, type); })))
simplifier, but even for that one I think they do not do a good job, they
might actually pessimize stuff rather than optimize, but at least does not
result in wrong-code, because the operands are solely tested with
wi::to_wide or get_nonzero_bits, but not actually used in the
simplification. The reason why it can pessimize stuff is say if we have
# RANGE [irange] int ... MASK 0xb VALUE 0x0
x_1 = ...;
# RANGE [irange] int ... MASK 0x8 VALUE 0x0
_2 = x_1 & 0xc;
_3 = _2 == 2;
then if it used just with_possible_nonzero_bits@0, @0 would have
get_nonzero_bits (@0) 0x8 and (2 & ~8) != 0, so we can fold it into
_3 = 0;
But as it uses (with_possible_nonzero_bits2 @0), @0 is x_1 rather
than _2 and get_nonzero_bits (@0) is unnecessarily conservative,
0xb rather than 0x8 and (2 & ~0xb) == 0, so we don't optimize.
Now, with_possible_nonzero_bits2 can actually improve stuff as well in that
pattern: if say value ranges aren't fully computed yet, or the BIT_AND_EXPR
assignment has been added later and the lhs doesn't have a range computed
yet, get_nonzero_bits on the BIT_AND_EXPR lhs will be all bits set, while
on the BIT_AND_EXPR operand it might actually succeed.
I believe it would be better either to modify get_nonzero_bits so that it
special-cases an SSA_NAME with a BIT_AND_EXPR def_stmt (but one level
deep only like with_possible_nonzero_bits2, no recursion); in that case it
would return the bitwise AND of get_nonzero_bits (non-recursive) for the
lhs and both operands, and possibly handle BIT_AND_EXPR itself, e.g. for
GENERIC matching, by returning the bitwise AND of both operands.
Then with_possible_nonzero_bits2 would only be needed for the GENERIC case,
perhaps with the second match under #if GENERIC, but changed so that the @N
operand is always the whole thing rather than its operand, which is
error-prone.  Or add a get_nonzero_bits wrapper with a different name
which would do that.
with_certain_nonzero_bits2 could be changed similarly; these days
we can test known nonzero bits rather than possible nonzero bits on
SSA_NAMEs too.  We record both mask and value, so the possible nonzero bits
(aka get_nonzero_bits) are mask () | value (), while the known nonzero bits
are value () & ~mask (); a new function (get_known_nonzero_bits
or get_certain_nonzero_bits etc.) could handle that.
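As a hedged aside, a tiny self-contained illustration of that mask/value
arithmetic (plain C++, not GCC internals; get_known_nonzero_bits is the
suggested name above, not an existing function):
```
#include <cstdio>

int main ()
{
  // MASK = bits that may vary, VALUE = bits known to be set,
  // mirroring the irange MASK/VALUE pairs in the dump above.
  unsigned mask = 0xb, value = 0x4;
  unsigned possible_nonzero = mask | value;  // aka get_nonzero_bits
  unsigned known_nonzero = value & ~mask;    // proposed get_known_nonzero_bits
  std::printf ("possible=%#x known=%#x\n", possible_nonzero, known_nonzero);
}
```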
Anyway, the following patch doesn't do what I wrote above just yet; for
that single pattern it is just a missed optimization.
But the with_possible_nonzero_bits2 uses in the 3 new simplifiers are
just completely incorrect, because they don't just use the @0 operand
in get_nonzero_bits (pessimizing stuff if value ranges are fully computed),
but also use it in the replacement, where they act as if the BIT_AND_EXPR
wasn't there at all.
While we could use (with_possible_nonzero_bits2@3 @0) and use
get_nonzero_bits (@0) and use @3 in the replacement, that would still
often be a pessimization, so I've just used with_possible_nonzero_bits@0.
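To make the wrong-code mode concrete, here is a hedged sketch (my own
illustration, not the PR117420 testcase) of how dropping the BIT_AND_EXPR
breaks the (X >> C1) << (C1 + C2) fold:
```
unsigned
f (unsigned x)
{
  unsigned t = x & ~0xfu;   // low 4 bits of t are known zero
  // Folding (t >> 4) << 6 to t << 2 is fine, but with the buggy
  // pattern @0 could bind to x instead of t, yielding x << 2 and
  // silently discarding the masking.
  return (t >> 4) << 6;
}
```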
2024-11-22 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/117420
* match.pd ((X >> C1) << (C1 + C2) -> X << C2,
(X >> C1) * (C2 << C1) -> X * C2, X / (1 << C) -> X /[ex] (1 << C)):
Use with_possible_nonzero_bits@0 rather than
(with_possible_nonzero_bits2 @0).
Jakub Jelinek [Fri, 22 Nov 2024 18:47:52 +0000 (19:47 +0100)]
c-family: Yet another fix for _BitInt & __sync_* builtins [PR117641]
Sorry, the last patch only partially fixed the __sync_* ICEs with
_BitInt(128) on ia32.
Even for !fetch we need to error out and return 0.  I was afraid of
APIs like __atomic_exchange/__atomic_compare_exchange; those obviously
need to be supported even for _BitInt(128) on ia32, but they actually never
reach sync_resolve_size, as they are handled by adding the size argument
and using the library version much earlier.
For fetch && !orig_format (i.e. __atomic_fetch_* etc.) we need to return -1
so that we handle it with a manual __atomic_load +
__atomic_compare_exchange loop in the caller; all other cases should
be rejected.
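For reference, a hedged sketch of the kind of load + compare-exchange loop
meant here (illustrative C++ using the generic builtins, which route
through libatomic for oversized types; not GCC's actual expansion):
```
unsigned long long
fetch_add_via_cas (unsigned long long *ptr, unsigned long long val)
{
  unsigned long long old, desired;
  __atomic_load (ptr, &old, __ATOMIC_RELAXED);
  do
    desired = old + val;                    // compute the new value
  while (!__atomic_compare_exchange (ptr, &old, &desired, /*weak=*/true,
                                     __ATOMIC_SEQ_CST, __ATOMIC_RELAXED));
  return old;                               // the fetched (previous) value
}
```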
2024-11-22 Jakub Jelinek <jakub@redhat.com>
PR c/117641
* c-common.cc (sync_resolve_size): For size 16 with _BitInt
on targets where TImode isn't supported, use goto incompatible if
!fetch.
Andrew Pinski [Fri, 22 Nov 2024 00:55:01 +0000 (16:55 -0800)]
libsanitizer: Move language level from gnu++14 to gnu++17
While compiling libsanitizer for aarch64-linux-gnu, I noticed the new warning:
```
../../../../libsanitizer/asan/asan_interceptors.cpp: In function ‘char* ___interceptor_strcpy(char*, const char*)’:
../../../../libsanitizer/asan/asan_interceptors.cpp:554:6: warning: ‘if constexpr’ only available with ‘-std=c++17’ or ‘-std=gnu++17’ [-Wc++17-extensions]
554 | if constexpr (SANITIZER_APPLE) {
| ^~~~~~~~~
```
compiler-rt upstream compiles this as gnu++17 (the current default for
clang), so let's update our build to match.
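For illustration, a minimal hedged example of the C++17 feature in
question ('if constexpr' discards the untaken branch at template
instantiation time, so this only compiles with -std=gnu++17 or later):
```
constexpr bool kApple = false;   // stand-in for SANITIZER_APPLE

template <typename T>
int use (T t)
{
  if constexpr (kApple)
    return t.apple_only ();      // discarded, never instantiated for int
  else
    return 0;
}

int test () { return use (42); }
```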
The DejaGnu routine "riscv_get_arch" fails to infer the correct
architecture string when GCC is built for RV32EC.  This causes an invalid
architecture string to be produced by "add_options_for_riscv_v":
xgcc: error: '-march=rv32cv': first ISA subset must be 'e', 'i' or 'g'
Fix by adding the E base ISA variant to the list of possible architecture
modifiers.
Also, the V extension is added to the machine string without checking
whether the extensions it depends on are available.  This results in
errors when GCC is built for RV32EC:
Executing on host: .../xgcc ... -march=rv32ecv ...
cc1: error: ILP32E ABI does not support the 'D' extension
cc1: sorry, unimplemented: Currently the 'V' implementation requires the 'M' extension
Fix by disabling vector tests for RISC-V if the V extension cannot be added
to the current architecture.
Tested riscv32-none-elf with -march=rv32ec using the GNU simulator.  Most
of the remaining failures are due to tests that explicitly add vector
options yet lack "dg-require-effective-target riscv_v_ok":
=== gcc Summary ===
# of expected passes 211958
# of unexpected failures 1826
# of expected failures 1059
# of unresolved testcases 5209
# of unsupported tests 15513
Ensured riscv64-unknown-linux-gnu tested with qemu has no new passing or
failing tests, before and after applying this patch:
# of expected passes 237209
# of unexpected failures 335
# of expected failures 1670
# of unresolved testcases 43
# of unsupported tests 16767
PR target/117603
gcc/testsuite/ChangeLog:
* lib/target-supports.exp (riscv_get_arch): Add comment about
function purpose. Add E ISA to list of possible
modifiers.
(check_vect_support_and_set_flags): Do not advertise vector
support if V extension cannot be enabled.
Add middle end support for the 'interop' directive and the 'init', 'use',
and 'destroy' clauses - but fail with a sorry, unimplemented in gimplify.cc.
For Fortran, generate the tree code, update the internal representation,
add some more diagnostic checks, and update for newer specification changes
('fr' now only takes a single value, but integer expressions are permitted
again [like with the old syntax], not only constant identifiers).
For C and C++, this patch adds the full parser support for 'interop'.
Still missing is actually handling the directive in the middle end and
in libgomp.
The GOMP_INTEROP_IFR_* internal values have been changed to leave space
for vendor-specific values that are adjacent to the existing values
but negative, if needed.
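For orientation, a hedged sketch of the surface syntax the new parser
support accepts (per OpenMP 5.1; with this patch the middle end still
stops with a sorry, so this is illustrative only):
```
#include <omp.h>

void f ()
{
  omp_interop_t obj = omp_interop_none;
  #pragma omp interop init (prefer_type ("cuda"), targetsync : obj)
  // ... hand obj to a foreign runtime here ...
  #pragma omp interop use (obj)
  #pragma omp interop destroy (obj)
}
```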
gcc/c-family/ChangeLog:
* c-common.h (enum c_omp_region_type): Add C_ORT_INTEROP
and C_ORT_OMP_INTEROP.
(c_omp_interop_t_p): New prototype.
* c-omp.cc (c_omp_interop_t_p): Check whether the type is
omp_interop_t.
(c_omp_directives): Uncomment 'interop'.
* c-pragma.cc (omp_pragmas): Add 'interop'.
* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_INTEROP.
(enum pragma_omp_clause): Add init, use, and destroy clauses.
gcc/c/ChangeLog:
* c-parser.cc (INCLUDE_STRING): Define.
(c_parser_pragma): Handle 'interop' directive.
(c_parser_omp_clause_name): Handle init, use, and destroy clauses.
(c_parser_omp_all_clauses): Likewise; use C_ORT_OMP_INTEROP, if
'use' is permitted, for c_finish_omp_clauses.
(c_parser_omp_clause_destroy, c_parser_omp_modifier_prefer_type,
c_parser_omp_clause_init, c_parser_omp_clause_use,
OMP_INTEROP_CLAUSE_MASK, c_parser_omp_interop): New.
* c-typeck.cc (c_finish_omp_clauses): Add missing OPT_Wopenmp to
a warning; handle new clauses.
gcc/cp/ChangeLog:
* parser.cc (INCLUDE_STRING): Define.
(cp_parser_omp_clause_name): Handle init, use, and destroy clauses.
(cp_parser_omp_all_clauses): Likewise; use C_ORT_OMP_INTEROP, if
'use' is permitted, for c_finish_omp_clauses.
(cp_parser_omp_modifier_prefer_type, cp_parser_omp_clause_init,
OMP_INTEROP_CLAUSE_MASK, cp_parser_omp_interop): New.
(cp_parser_pragma): Handle 'interop' directive.
* pt.cc (tsubst_omp_clauses): Handle init, use, and destroy clauses.
(tsubst_stmt): Handle OMP_INTEROP.
* semantics.cc (cp_omp_init_prefer_type_update): New.
(finish_omp_clauses): Handle init, use, and destroy clauses
and add clause check for 'depend' on 'interop'.
gcc/fortran/ChangeLog:
* gfortran.h (gfc_omp_namelist): Cleanup interop internal
representation.
* dump-parse-tree.cc (show_omp_namelist): Update for changed
internal representation.
* match.cc (gfc_free_omp_namelist): Likewise.
* openmp.cc (gfc_match_omp_prefer_type, gfc_match_omp_init):
Likewise; also handle some corner cases better and update for
newer 6.0 changes related to 'fr'.
(resolve_omp_clauses): Add type-check for interop variables.
* trans-openmp.cc (gfc_trans_omp_clauses): Handle init, use
and destroy clauses.
(gfc_trans_openmp_interop): New.
(gfc_trans_omp_directive): Call it.
gcc/ChangeLog:
* gimplify.cc (gimplify_expr): Handle OMP_INTEROP by printing
"sorry, unimplemented".
* omp-api.h (omp_get_fr_id_from_name): Change return type to
'char'.
* omp-general.cc (omp_get_fr_id_from_name): Likewise; return
GOMP_INTEROP_IFR_UNKNOWN not 0 if not found.
(omp_get_name_from_fr_id): Return "<unknown>" not NULL
if not found (used for dumps).
* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_DESTROY,
OMP_CLAUSE_USE, and OMP_CLAUSE_INIT.
* tree-pretty-print.cc (dump_omp_init_prefer_type): New.
(dump_omp_clause): Handle init, use and destroy clauses.
(dump_generic_node): Handle interop directive.
* tree.cc (omp_clause_num_ops, omp_clause_code_name): Add new
init/use/destroy clauses.
* tree.def (OACC_LOOP): Fix comment.
(OMP_INTEROP): Add.
* tree.h (OMP_INTEROP_CLAUSES, OMP_CLAUSE_INIT_TARGET,
OMP_CLAUSE_INIT_TARGETSYNC, OMP_CLAUSE_INIT_PREFER_TYPE): New.
include/ChangeLog:
* gomp-constants.h (GOMP_INTEROP_IFR_NONE): Rename ...
(GOMP_INTEROP_IFR_UNKNOWN): ... to this. And change value.
(GOMP_INTEROP_IFR_SEPARATOR): Likewise.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/interop-1.f90: Update for parser changes,
spec changes and add new tests.
* gfortran.dg/gomp/interop-2.f90: Likewise.
* gfortran.dg/gomp/interop-3.f90: Likewise.
* c-c++-common/gomp/interop-1.c: New test.
* c-c++-common/gomp/interop-2.c: New test.
* c-c++-common/gomp/interop-3.c: New test.
* c-c++-common/gomp/interop-4.c: New test.
* g++.dg/gomp/interop-5.C: New test.
* gfortran.dg/gomp/interop-4.f90: New test.
Jakub Jelinek [Fri, 22 Nov 2024 10:33:34 +0000 (11:33 +0100)]
i386: Make __builtin_ia32_f{nstenv,ldenv,nstsw,nclex} builtins internal [PR117165]
As the comment says, these builtins are meant to be internal for the atomic
support and cause various ICEs when used directly under various
conditions.
So the following patch makes them internal.
We do also have internal-fn.*, but those target-specific builtins would
need to live there in generic code, so I've just added a space to their
names, which is the old way to hide builtins/attributes etc.
2024-11-22 Jakub Jelinek <jakub@redhat.com>
PR target/117165
* config/i386/i386-builtin.def (IX86_BUILTIN_FNSTENV,
IX86_BUILTIN_FLDENV, IX86_BUILTIN_FNSTSW, IX86_BUILTIN_FNCLEX): Add
space to the end of the builtin name to make it really internal.
Jakub Jelinek [Fri, 22 Nov 2024 09:02:59 +0000 (10:02 +0100)]
testsuite: Fix up vector-{8,9,10}.c tests
On Thu, Nov 21, 2024 at 01:30:39PM +0100, Christoph Müllner wrote:
> > > * gcc.dg/tree-ssa/satd-hadamard.c: New test.
> > > * gcc.dg/tree-ssa/vector-10.c: New test.
> > > * gcc.dg/tree-ssa/vector-8.c: New test.
> > > * gcc.dg/tree-ssa/vector-9.c: New test.
I see FAILs on i686-linux and on x86_64-linux (the latter
with -m32 testing).
One problem is that vector-10.c doesn't use the -Wno-psabi option
and uses a function which returns a vector and takes a vector
as its first parameter.  The other problems are that the 3 other
tests don't arrange for at least basic vector ISA support,
plus non-standardly test only on x86_64-*-*, while normally
one would allow both i?86-*-* and x86_64-*-*, and if a test is e.g.
specific to 64-bit, also check for lp64 or int128 or whatever
else is needed.  E.g. Solaris, I think, has the i?86-*-* triplet even
for 64-bit code.
The following patch fixes these.
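A hedged sketch of the resulting dg header shape (illustrative, not the
exact test content):
```
/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-optimized" } */
/* Enable a baseline vector ISA and silence vector-ABI warnings on x86.  */
/* { dg-additional-options "-msse2 -Wno-psabi" { target { i?86-*-* x86_64-*-* } } } */
```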
2024-11-22 Jakub Jelinek <jakub@redhat.com>
* gcc.dg/tree-ssa/satd-hadamard.c: Add -msse2 as dg-additional-options
on x86. Also scan-tree-dump on i?86-*-*.
* gcc.dg/tree-ssa/vector-8.c: Likewise.
* gcc.dg/tree-ssa/vector-9.c: Likewise.
* gcc.dg/tree-ssa/vector-10.c: Add -Wno-psabi to dg-additional-options.
Tamar Christina [Thu, 21 Nov 2024 15:10:24 +0000 (15:10 +0000)]
middle-end:For multiplication try swapping operands when matching complex multiply [PR116463]
This commit fixes the failures of complex.exp=fast-math-complex-mls-*.c on
the GCC 14 branch and some of the ones on master.
The current matching looks for only one operand order for the
multiplication and relied on canonicalization to always give the right
order because of TWO_OPERANDS.  However, when it comes to the
multiplication, trying only one order is a bit fragile as the operands can
be flipped.
The failing tests on the branch are:
void fms180snd(_Complex TYPE a[restrict N], _Complex TYPE b[restrict N],
_Complex TYPE c[restrict N]) {
for (int i = 0; i < N; i++)
c[i] -= a[i] * (b[i] * I * I);
}
void fms180fst(_Complex TYPE a[restrict N], _Complex TYPE b[restrict N],
_Complex TYPE c[restrict N]) {
for (int i = 0; i < N; i++)
c[i] -= (a[i] * I * I) * b[i];
}
The issue is just a small difference in commutative operations:
we look for {R,R} * {R,I} but find {R,I} * {R,R}.
Since the DF analysis is cached, we should be able to swap the operands
and retry the multiply match cheaply.
vect_validate_multiplication checks a constraint on the data flow of the
operands feeding the multiplications: e.g. we require the lanes to come
from the same source.  As such it doesn't make sense to flip the operands
individually, because that would invalidate the earlier linear_loads_p
checks which have validated that the arguments all come from the same
datarefs.  This patch thus flips the operands in unison, which still
maintains this invariant but also honors the commutative nature of
multiplication, as in the sketch below.
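A minimal hedged sketch of the retry idea (hypothetical helper, not the
actual vectorizer code):
```
#include <utility>

// Try the match with the operands as given; on failure, swap both
// operands in unison and retry once, so the lanes of each operand
// still come from the same dataref.
template <typename Node, typename Matcher>
bool
match_mul_commutative (Node &op0, Node &op1, Matcher match)
{
  if (match (op0, op1))
    return true;
  std::swap (op0, op1);
  return match (op0, op1);
}
```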