Define PATH_SEPARATOR and HOST_EXECUTABLE_SUFFIX in standalone MinGW
builds; the definitions normally come from GCC, and the defaults don't
work for native Windows.
Viljar Indus [Thu, 22 Feb 2024 12:27:14 +0000 (14:27 +0200)]
ada: Avoid checking parameters of protected procedures
The compiler triggers warnings on generated protected procedures
if the procedure does not have an explicit spec. Instead check
if the body was created for a protected procedure if the spec
is not present.
gcc/ada/
* sem_ch6.adb (Analyze_Subprogram_Body_Helper):
If the spec is not present for a subprogram body then
check if the body definition was created for a protected
procedure.
Piotr Trojanek [Mon, 26 Feb 2024 08:32:20 +0000 (09:32 +0100)]
ada: Ignore ghost nodes in call graph information for dispatching calls
When emitting call graph information, we already skipped calls to
ignored ghost entities, but this code was causing crashes (in production
builds) and assertion failures (in development builds), because the
ignored ghost entities are not fully decorated, e.g. when they come from
instances of generic units with default subprograms.
With this patch we skip call graph information for ignored ghost
entities when they are registered, both as explicit calls and as
tagged types that will come with internally generated dispatching
subprograms.
gcc/ada/
* exp_cg.adb (Generate_CG_Output): Remove code for ignored ghost
entities that applied to subprogram calls.
(Register_CG_Node): Skip ignored ghost entities, both calls
and tagged types, when they are registered.
This patch fixes the reason code used by Apply_Selected_Length_Checks,
which was wrong in some cases when the check could be determined to
always fail at compile time.
Eric Botcazou [Fri, 23 Feb 2024 20:55:08 +0000 (21:55 +0100)]
ada: Propagate Program_Error from failed finalization of collection
This aligns finalization collections with finalization masters when it comes
to propagating an exception raised by the finalization of a specific object,
by always propagating Program_Error instead of the aforementioned exception.
gcc/ada/
* libgnat/s-finpri.adb (Raise_From_Controlled_Operation): New
declaration of imported procedure moved from...
(Finalize_Master): ...there.
(Finalize): Call Raise_From_Controlled_Operation instead of
Reraise_Occurrence to propagate the exception, if any.
Piotr Trojanek [Thu, 22 Feb 2024 21:26:01 +0000 (22:26 +0100)]
ada: Improve recovery from illegal occurrence of 'Old in if_expression
Fix assertion failure in developer builds which happened when the THEN
expression contains an illegal occurrence of 'Old and the type of the
THEN expression is left as Any_Type, but there is no ELSE expression.
gcc/ada/
* sem_ch4.adb (Analyze_If_Expression): Add guard for
if_expression without an ELSE part.
Piotr Trojanek [Fri, 23 Feb 2024 12:57:27 +0000 (13:57 +0100)]
ada: No need to follow New_Occurrence_Of with Set_Etype
Routine New_Occurrence_Of itself sets the Etype of its result; there is
no need to set it explicitly afterwards.
Code cleanup related to fix for attribute 'Old; semantics is unaffected.
gcc/ada/
* exp_ch13.adb (Expand_N_Free_Statement): After analysis, the
new temporary has the type of its Object_Definition and the new
occurrence of this temporary has this type as well; simplify.
* sem_util.adb
(Indirect_Temp_Value): Remove redundant call to Set_Etype;
simplify.
(Is_Access_Type_For_Indirect_Temp): Add missing body header.
Piotr Trojanek [Thu, 22 Feb 2024 21:25:16 +0000 (22:25 +0100)]
ada: Fix detection of if_expressions that are known on entry
Fix a small glitch in routine Is_Known_On_Entry, which returned False
for all if_expressions, regardless of whether their conditions or dependent
expressions are known on entry.
gcc/ada/
* sem_util.adb (Is_Known_On_Entry): Check whether condition and
dependent expressions of an if_expression are known on entry.
Checks.Get_Ranged_Checks was once named Range_Check, and a few
comments referred to it by that name before this commit. To avoid
confusion with Types.Range_Check, this commit fixes those comments.
Eric Botcazou [Thu, 22 Feb 2024 07:47:42 +0000 (08:47 +0100)]
ada: Minor performance improvement for dynamically-allocated controlled objects
The values returned by Header_Alignment and Header_Size are known at compile
time and powers of two on almost all platforms, so inlining them by means of
an expression function improves the object code generated for alignment and
size calculations involving them.
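To illustrate why compile-time powers of two help here, the following is a minimal C++ sketch, not the GNAT runtime code. The names header_size, header_alignment and round_up are hypothetical, and the value 16 is an assumption chosen for illustration. When the alignment is a constant power of two, rounding a size up reduces to an add and a mask.

```cpp
#include <cstddef>

// Hypothetical stand-ins for Header_Size and Header_Alignment. The value 16
// is an assumption for illustration, not the value used by the GNAT runtime.
constexpr std::size_t header_size() { return 16; }
constexpr std::size_t header_alignment() { return 16; }

constexpr bool is_power_of_two(std::size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Round size up to a multiple of align. This is only valid when align is a
// power of two; in that case no division or modulo is needed.
constexpr std::size_t round_up(std::size_t size, std::size_t align) {
    return (size + align - 1) & ~(align - 1);
}

static_assert(is_power_of_two(header_alignment()), "sketch assumes a power of two");
static_assert(round_up(header_size() + 1, header_alignment()) == 32,
              "17 bytes round up to 32");
```

Because both helper functions are constexpr, a compiler can fold the whole computation when the size is also known, which is the kind of effect the expression functions are meant to enable.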
gcc/ada/
* libgnat/s-finpri.ads: Add use type clause for Storage_Offset.
(Header_Alignment): Turn into an expression function.
(Header_Size): Likewise.
* libgnat/s-finpri.adb: Remove use type clause for Storage_Offset.
(Header_Alignment): Delete.
(Header_Size): Likewise.
Marc Poulhiès [Tue, 13 Feb 2024 11:20:19 +0000 (12:20 +0100)]
ada: Fixup one more pattern of broken scope information
When an array's initialization contains an `others =>` clause with an
expression that involves finalization, the resulting scope information
is incorrect and can cause crashes with backends (e.g. gnat-llvm) that
also use unnesting. The observable symptom is a nested object
declaration (created by the compiler) within a loop wrapped in a
procedure created by the unnester that has incoherent scope information:
its Scope field points to the scope of the procedure (1 level too high)
and is contained in the entity chain of some entity nested in the
procedure (correct).
The correct solution would be to fix the scope information when it is
created, but this turned out to be too large a task, with many
interactions with existing code.
This change adds another pattern to the Fixup_Inner_Scopes procedure to
detect the problematic case and fix the scope after the fact.
gcc/ada/
* exp_ch7.adb (Unnest_Loop::Fixup_Inner_Scopes): Detect a new
problematic pattern and fix up the scope accordingly.
Eric Botcazou [Wed, 21 Feb 2024 20:48:13 +0000 (21:48 +0100)]
ada: Fix latent alignment issue for dynamically-allocated controlled objects
Dynamically-allocated controlled objects are attached to a finalization
collection by means of a hidden header placed right before the object,
which means that the size effectively allocated must naturally account
for the size of this header. But the allocation must also account for
the alignment of this header in order to have it properly aligned.
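The size and alignment reasoning above can be sketched in C++ as follows. This is an illustration under stated assumptions, not what s-stposu.adb does: Layout and layout_with_header are hypothetical names, both alignments are assumed to be powers of two, and the header size is assumed to be a multiple of the header alignment (as is normally the case for a type's size).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical layout description; none of these names come from the GNAT runtime.
struct Layout {
    std::size_t alignment;      // alignment to request from the allocator
    std::size_t total_size;     // number of bytes to request from the allocator
    std::size_t object_offset;  // offset of the object from the start of the block
};

// Place a hidden header immediately before an object. The block is aligned to
// the stricter of the two alignments, and the space reserved for the header is
// padded to a multiple of that alignment so the object stays aligned too.
inline Layout layout_with_header(std::size_t object_size, std::size_t object_align,
                                 std::size_t header_size, std::size_t header_align) {
    assert(object_align != 0 && (object_align & (object_align - 1)) == 0);
    assert(header_align != 0 && (header_align & (header_align - 1)) == 0);
    const std::size_t align = std::max(object_align, header_align);
    const std::size_t padded_header = (header_size + align - 1) & ~(align - 1);
    return Layout{align, padded_header + object_size, padded_header};
}
```

With an 8-byte-aligned object and a 16-byte header aligned on 16, the sketch requests a 16-byte-aligned block; requesting only the object's alignment would leave the header potentially misaligned, which is the latent issue the commit message describes.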
gcc/ada/
* libgnat/s-finpri.ads (Header_Alignment): New function.
(Header_Size): Adjust description.
(Master_Node): Put Finalize_Address as first component.
(Collection_Node): Likewise.
* libgnat/s-finpri.adb (Header_Alignment): New function.
(Header_Size): Return the object size in storage units.
* libgnat/s-stposu.ads (Adjust_Controlled_Dereference): Replace
collection node with header in description.
* libgnat/s-stposu.adb (Adjust_Controlled_Dereference): Likewise.
(Allocate_Any_Controlled): Likewise. Pass the maximum of the
specified alignment and that of the header to the allocator.
(Deallocate_Any_Controlled): Likewise to the deallocator.
Viljar Indus [Fri, 9 Feb 2024 10:29:41 +0000 (12:29 +0200)]
ada: Fix resolving tagged operations in array aggregates
In the Two_Pass_Aggregate_Expansion we were removing
all of the entity links in the Iterator_Specification
to avoid reusing the same Iterator_Definition in both
loops.
However this approach was also breaking the links to
calls with dot notation that had been transformed to
the regular call notation.
In order to circumvent this, explicitly create new
identifier definitions when copying the
Iterator_Specifications for both of the loops.
gcc/ada/
* exp_aggr.adb (Two_Pass_Aggregate_Expansion):
Explicitly create new Defining_Iterators for both
of the loops.
Eric Botcazou [Tue, 20 Feb 2024 21:40:47 +0000 (22:40 +0100)]
ada: Fix bogus error on function returning noncontrolling result in private part
This occurs in the additional case of RM 3.9.3(10) in Ada 2012, that is to
say the access controlling result, because the implementation does not use
the same (correct) conditions as in the original case.
This factors out these conditions and uses them in both cases, as well as
adjusts the wording of the message in the first case.
gcc/ada/
* sem_ch6.adb (Check_Private_Overriding): Implement the second part
of RM 3.9.3(10) consistently in both cases.
Piotr Trojanek [Wed, 21 Feb 2024 11:14:48 +0000 (12:14 +0100)]
ada: Fix casing of CUDA in error messages
Error messages now capitalize CUDA.
gcc/ada/
* erroutc.adb (Set_Msg_Insertion_Reserved_Word): Fix casing for
CUDA appearing in error message strings.
(Set_Msg_Str): Likewise for CUDA being a part of a Name_Id.
This commit makes the emission of -gnatw_q warnings pass node information
so as to handle the enclosing subprogram display of -gnatdJ instead of
crashing.
gcc/ada/
* exp_ch4.adb (Expand_Composite_Equality): Call Error_Msg_N
instead of Error_Msg.
Steve Baird [Sat, 17 Feb 2024 01:05:09 +0000 (17:05 -0800)]
ada: Follow up fixes for Put_Image/streaming regressions
A recent change to reduce duplication of compiler-generated Put_Image and
streaming subprograms introduced some regressions. The fix for one of them
was incomplete.
gcc/ada/
* exp_attr.adb (Build_And_Insert_Type_Attr_Subp): Further tweaking
of the point where a compiler-generated Put_Image or streaming
subprogram is to be inserted in the tree. If one such subprogram
calls another (as is often the case with, for example, Put_Image
procedures for composite type and for a component type thereof),
then we want to avoid use-before-definition problems that can
result from inserting the caller ahead of the callee.
This changes the implementation of finalization collections from using the
global task lock to using per-collection spinlocks. Spinlocks are a good
fit in this context because they are very cheap and therefore can be taken
with a fine granularity only around the portions of code implementing the
shuffling of pointers required by attachment and detachment actions.
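As a rough illustration of per-collection spinlocks taken only around the pointer shuffling, here is a minimal C++ sketch. It is not the System.Finalization_Primitives implementation: Collection, Node, attach and detach are hypothetical names, and the lock is modelled with std::atomic<bool> as an assumption.

```cpp
#include <atomic>
#include <cassert>

struct Collection;

// Hypothetical list node. It records its enclosing collection so that
// detachment can find the lock to take.
struct Node {
    Node* prev = nullptr;
    Node* next = nullptr;
    Collection* enclosing = nullptr;
};

struct Collection {
    std::atomic<bool> locked{false};
    Node head;  // dummy head of a circular doubly linked list

    Collection() { head.prev = head.next = &head; }

    void acquire() {
        // Spin until the previous value was false, i.e. we took the lock.
        while (locked.exchange(true, std::memory_order_acquire)) { }
    }
    void release() { locked.store(false, std::memory_order_release); }
};

// Insert n right after the dummy head; only the pointer updates are locked.
inline void attach(Collection& c, Node& n) {
    c.acquire();
    n.enclosing = &c;
    n.next = c.head.next;
    n.prev = &c.head;
    c.head.next->prev = &n;
    c.head.next = &n;
    c.release();
}

// Unlink n from its collection, if any. This sketch assumes that no two
// threads detach the same node at the same time.
inline void detach(Node& n) {
    Collection* c = n.enclosing;
    if (c == nullptr) return;
    c->acquire();
    n.prev->next = n.next;
    n.next->prev = n.prev;
    n.prev = n.next = nullptr;
    n.enclosing = nullptr;
    c->release();
}
```

Each critical section is a handful of pointer stores, which is why a spinlock is a plausible fit: contention on one collection does not block operations on another, unlike a single global lock.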
gcc/ada/
* libgnat/s-finpri.ads (Lock_Type): New modular type.
(Collection_Node): Add Enclosing_Collection component.
(Finalization_Collection): Add Lock component.
* libgnat/s-finpri.adb: Add clauses for System.Atomic_Primitives.
(Attach_Object_To_Collection): Lock and unlock the collection.
Save a pointer to the enclosing collection in the node.
(Detach_Object_From_Collection): Lock and unlock the collection.
(Finalize): Likewise.
(Initialize): Initialize the lock.
(Lock_Collection): New procedure.
(Unlock_Collection): Likewise.
Steve Baird [Thu, 15 Feb 2024 22:49:18 +0000 (14:49 -0800)]
ada: Formal_Derived_Type'Size is not static
In deciding whether a Size attribute reference is static, the compiler could
get confused about whether an implicitly-declared subtype of a generic formal
type is itself a generic formal type, possibly resulting in an assertion
failure and then a bugbox.
gcc/ada/
* sem_attr.adb (Eval_Attribute): Expand existing checks for
generic formal types for which Is_Generic_Type returns False. In
that case, mark the attribute reference as nonstatic.
Steve Baird [Thu, 15 Feb 2024 23:13:12 +0000 (15:13 -0800)]
ada: Fix bug in maintaining dimension info
Copying a node does not automatically propagate its associated dimension
information (if any). This must be done explicitly.
gcc/ada/
* sem_util.adb (Copy_Node_With_Replacement): Add call to
Copy_Dimensions so that any dimension information associated with
the copied node is also associated with the resulting copy.
Piotr Trojanek [Fri, 16 Feb 2024 15:57:10 +0000 (16:57 +0100)]
ada: Fix casing in error messages
Error messages should not start with a capital letter.
gcc/ada/
* gnat_cuda.adb (Remove_CUDA_Device_Entities): Fix casing
(this primarily fixes a style, because the capitalization will
not be preserved by the error-reporting machinery anyway).
* sem_ch13.adb (Analyze_User_Aspect_Aspect_Specification): Fix
casing in error message.
David Malcolm [Thu, 16 May 2024 01:22:52 +0000 (21:22 -0400)]
diagnostics: use unicode art for interprocedural depth
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/out-of-bounds-diagram-1-emoji.c: Update expected
output to use unicode for depth indication.
* gcc.dg/analyzer/out-of-bounds-diagram-1-unicode.c: Likewise.
gcc/ChangeLog:
* text-art/theme.cc (ascii_theme::get_cppchar): Add
cell_kind::INTERPROCEDURAL_*.
(unicode_theme::get_cppchar): Likewise.
* text-art/theme.h (theme::cell_kind): Likewise.
* tree-diagnostic-path.cc:
(thread_event_printer::print_swimlane_for_event_range): Use the
above to get characters for indicating interprocedural stack
depth activity, falling back to ascii.
(selftest::test_interprocedural_path_1): Test with both ascii
and unicode themes.
(selftest::test_interprocedural_path_2): Likewise.
(selftest::test_recursion): Likewise.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Thu, 16 May 2024 01:22:51 +0000 (21:22 -0400)]
diagnostics: add warning emoji to events with VERB_danger
Tweak the printing of -fdiagnostics-path-format=inline-events so that
any event with diagnostic_event::VERB_danger gains a warning emoji,
provided that the text art theme enables emoji support.
VERB_danger is set by the analyzer on the last event in a path, and so
this emoji appears at the end of all analyzer execution paths
highlighting the location of the problem.
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/out-of-bounds-diagram-1-emoji.c: Update expected
output to include warning emoji.
* gcc.dg/analyzer/warning-emoji.c: New test.
gcc/ChangeLog:
* tree-diagnostic-path.cc: Include "text-art/theme.h".
(path_label::get_text): If the event has
diagnostic_event::VERB_danger, and the theme enables emojis, then
add a warning emoji between the event number and the event text.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Thu, 16 May 2024 01:22:51 +0000 (21:22 -0400)]
diagnostics: simplify output of purely intraprocedural execution paths
Diagnostic path printing was added in r10-5901-g4bc1899b2e883f. As of
that commit, with -fdiagnostics-path-format=inline-events (the default),
we print a vertical line to the left of the source line numbering,
visualizing the stack depth and interprocedural calls and returns as
indentation changes.
For cases where the events on a thread are purely intraprocedural, this
line does nothing except take up space and complicate the output.
This patch adds logic to omit it for such cases, simplifying the output,
and, I believe, improving readability.
gcc/ChangeLog:
* diagnostic-path.h: Update leading comment to reflect
intraprocedural cases. Fix typo in comment.
* doc/invoke.texi: Update intraprocedural example.
gcc/testsuite/ChangeLog:
* c-c++-common/analyzer/allocation-size-multiline-1.c: Update
expected results for purely intraprocedural path.
* c-c++-common/analyzer/allocation-size-multiline-2.c: Likewise.
* c-c++-common/analyzer/allocation-size-multiline-3.c: Likewise.
* c-c++-common/analyzer/analyzer-verbosity-0.c: Likewise.
* c-c++-common/analyzer/analyzer-verbosity-1.c: Likewise.
* c-c++-common/analyzer/analyzer-verbosity-2.c: Likewise.
* c-c++-common/analyzer/analyzer-verbosity-3.c: Likewise.
* c-c++-common/analyzer/malloc-macro-inline-events.c: Likewise.
Doing so for this file requires a rewrite since the paths
prefixing the "in expansion of macro" lines become the only thing
on their line and so are no longer pruned by multiline.exp logic
for pruning extra content on non-blank lines.
* c-c++-common/analyzer/malloc-paths-9-noexcept.c: Likewise.
* c-c++-common/analyzer/setjmp-2.c: Likewise.
* gcc.dg/analyzer/malloc-paths-9.c: Likewise.
* gcc.dg/analyzer/out-of-bounds-multiline-2.c: Likewise.
* gcc.dg/plugin/diagnostic-test-paths-2.c: Likewise.
gcc/ChangeLog:
* tree-diagnostic-path.cc (per_thread_summary::interprocedural_p):
New.
(thread_event_printer::print_swimlane_for_event_range): Don't
indent and print the stack depth line if this thread's events are
purely intraprocedural.
(selftest::test_intraprocedural_path): Update expected output.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Thu, 16 May 2024 01:22:51 +0000 (21:22 -0400)]
diagnostics: handle SGR codes in line_label::m_display_width
gcc/ChangeLog:
* diagnostic-show-locus.cc: Define INCLUDE_VECTOR and include
"text-art/types.h".
(line_label::line_label): Drop "policy" argument. Use
styled_string::calc_canvas_width when computing m_display_width,
as this skips SGR codes.
(layout::print_any_labels): Update for line_label ctor change.
(selftest::test_one_liner_labels_utf8): Update expected text to
reflect that the labels can fit on one line if we don't get
confused by SGR colorization codes.
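The idea of skipping SGR codes when measuring a label can be sketched as below. This is not styled_string::calc_canvas_width: width_skipping_sgr is a hypothetical helper, and it makes the simplifying assumption that every byte outside an escape sequence occupies one column, so it is only accurate for ASCII text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count display columns in s while skipping SGR escape sequences of the form
// ESC '[' <digits and semicolons> 'm'. Anything else counts as one column.
inline std::size_t width_skipping_sgr(const std::string& s) {
    std::size_t width = 0;
    std::size_t i = 0;
    while (i < s.size()) {
        if (s[i] == '\x1b' && i + 1 < s.size() && s[i + 1] == '[') {
            std::size_t j = i + 2;
            while (j < s.size() &&
                   ((s[j] >= '0' && s[j] <= '9') || s[j] == ';')) {
                ++j;
            }
            if (j < s.size() && s[j] == 'm') {
                i = j + 1;  // skip the whole sequence, including the final 'm'
                continue;
            }
        }
        ++width;
        ++i;
    }
    return width;
}
```

A colorized label such as ESC[01;35m followed by "label" and ESC[m then measures 5 columns rather than 16, so labels that really fit on one line are no longer pushed onto separate lines.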
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Xiao Zeng [Wed, 15 May 2024 02:03:40 +0000 (10:03 +0800)]
RISC-V: Add Zvfbfwma extension to the -march= option
This patch adds a new sub-extension (Zvfbfwma) to the -march= option.
It introduces a new data type BF16.
1 In spec: "Zvfbfwma requires the Zvfbfmin extension and the Zfbfmin extension."
1.1 In Embedded Processor: Zvfbfwma -> Zvfbfmin -> Zve32f
1.2 In Application Processor: Zvfbfwma -> Zvfbfmin -> V
1.3 In both scenarios, there are: Zvfbfwma -> Zfbfmin
4 Depending on different usage scenarios, the Zvfbfwma extension may
depend on 'V' or 'Zve32f'. This patch only implements dependencies in
scenario of Embedded Processor. This is consistent with the processing
strategy in Zvfbfmin. In scenario of Application Processor, it is
necessary to explicitly indicate the dependent 'V' extension.
5 You can locate more information about Zvfbfwma from below spec doc:
<https://github.com/riscv/riscv-bfloat16/releases/download/v59042fc71c31a9bcb2f1957621c960ed36fac401/riscv-bfloat16.pdf>
* gcc.target/riscv/arch-37.c: New test.
* gcc.target/riscv/arch-38.c: New test.
* gcc.target/riscv/predef-36.c: New test.
* gcc.target/riscv/predef-37.c: New test.
Although this adds a memory load from the constant pool, it should be
better when the code is inside a loop, because the load from the constant
pool can be hoisted out: 1 instruction vs. 4 instructions.
liuhongt [Tue, 14 May 2024 10:39:54 +0000 (18:39 +0800)]
Optimize ashift >> 7 to vpcmpgtb for vector int8.
Since there is no corresponding instruction, the shift operation for
vector int8 is implemented using the instructions for vector int16,
but for some special shift counts, it can be transformed into vpcmpgtb.
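The equivalence behind this transformation can be checked exhaustively on scalars. The sketch below assumes the shift in question is an arithmetic right shift by 7 on signed 8-bit lanes, and that right-shifting a negative int is arithmetic (guaranteed in C++20, and the behavior of the usual two's-complement targets before that). The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Arithmetic right shift of an 8-bit lane by 7, done on an int after promotion.
inline std::int8_t shift_right_7(std::int8_t x) {
    return static_cast<std::int8_t>(x >> 7);
}

// What a signed per-lane compare "0 > x" produces: all ones (-1) or all zeros.
inline std::int8_t zero_greater_than(std::int8_t x) {
    return static_cast<std::int8_t>(0 > x ? -1 : 0);
}

// Check that the two agree for every 8-bit value.
inline bool equivalent_for_all_bytes() {
    for (int v = -128; v <= 127; ++v) {
        const std::int8_t x = static_cast<std::int8_t>(v);
        if (shift_right_7(x) != zero_greater_than(x)) return false;
    }
    return true;
}
```

Both forms broadcast the sign bit across the lane, which is why a single byte compare against zero can stand in for the widened 16-bit shift sequence.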
GCC has a generic cmpmemsi expansion via the by-pieces framework,
which shows some room for target-specific optimizations.
E.g. for comparing two aligned memory blocks of 15 bytes
we get the following sequence:
my_mem_cmp_aligned_15:
li a4,0
j .L2
.L8:
bgeu a4,a7,.L7
.L2:
add a2,a0,a4
add a3,a1,a4
lbu a5,0(a2)
lbu a6,0(a3)
addi a4,a4,1
li a7,15 // missed hoisting
subw a5,a5,a6
andi a5,a5,0xff // useless
beq a5,zero,.L8
lbu a0,0(a2) // loading again!
lbu a5,0(a3) // loading again!
subw a0,a0,a5
ret
.L7:
li a0,0
ret
Diff first byte: 15 insns
Diff second byte: 25 insns
No diff: 25 insns
Possible improvements:
* unroll the loop and use load-with-displacement to avoid offset increments
* load and compare multiple (aligned) bytes at once
* Use the bitmanip/strcmp result calculation (reverse words and
synthesize (a2 >= a3) ? 1 : -1 in a branchless sequence)
When applying these improvements we get the following sequence:
Diff first byte: 11 insns
Diff second byte: 16 insns
No diff: 11 insns
This patch implements these improvements.
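The listed improvements can be illustrated with a portable C++ sketch. This is not the RISC-V expander: block_compare and load_big_endian_64 are hypothetical names, and the byte-reversed load is written as a plain loop rather than with Zbb instructions. It shows loading several bytes at once and deriving the result from one comparison of byte-reversed words.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Load 8 bytes so that the first byte in memory becomes the most significant
// byte of the result. Comparing two such words as unsigned integers orders
// them the same way a byte-by-byte comparison would.
inline std::uint64_t load_big_endian_64(const unsigned char* p) {
    std::uint64_t v = 0;
    for (int i = 0; i < 8; ++i) v = (v << 8) | p[i];
    return v;
}

// Compare n bytes; returns -1, 0 or 1.
inline int block_compare(const void* a, const void* b, std::size_t n) {
    const unsigned char* pa = static_cast<const unsigned char*>(a);
    const unsigned char* pb = static_cast<const unsigned char*>(b);
    while (n >= 8) {
        const std::uint64_t wa = load_big_endian_64(pa);
        const std::uint64_t wb = load_big_endian_64(pb);
        if (wa != wb) return wa > wb ? 1 : -1;  // one compare decides the result
        pa += 8;
        pb += 8;
        n -= 8;
    }
    // Remaining tail of fewer than 8 bytes, compared one byte at a time.
    for (; n > 0; --n, ++pa, ++pb) {
        if (*pa != *pb) return *pa > *pb ? 1 : -1;
    }
    return 0;
}
```

For a 15-byte block this performs one word comparison plus a 7-byte tail, instead of up to 15 byte iterations with a reload of the differing bytes.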
The tests consist of an execution test (similar to
gcc/testsuite/gcc.dg/torture/inline-mem-cmp-1.c) and a few tests
that test the expansion conditions (known length and alignment).
Similar to the cpymemsi expansion this patch does not introduce any
gating for the cmpmemsi expansion (on top of requiring the known length,
alignment and Zbb).
Bootstrapped and SPEC CPU 2017 tested.
gcc/ChangeLog:
* config/riscv/riscv-protos.h (riscv_expand_block_compare): New
prototype.
* config/riscv/riscv-string.cc (GEN_EMIT_HELPER2): New helper
for zero_extendhi.
(do_load_from_addr): Add support for HI and SI/64 modes.
(do_load): Add helper for zero-extended loads.
(emit_memcmp_scalar_load_and_compare): New helper to emit memcmp.
(emit_memcmp_scalar_result_calculation): Likewise.
(riscv_expand_block_compare_scalar): Likewise.
(riscv_expand_block_compare): New RISC-V expander for memory compare.
* config/riscv/riscv.md (cmpmemsi): New cmpmem expansion.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/cmpmemsi-1.c: New test.
* gcc.target/riscv/cmpmemsi-2.c: New test.
* gcc.target/riscv/cmpmemsi-3.c: New test.
* gcc.target/riscv/cmpmemsi.c: New test.
Marek Polacek [Wed, 8 May 2024 19:43:58 +0000 (15:43 -0400)]
c++: ICE with reference NSDMI [PR114854]
Here we crash on a cp_gimplify_expr/TARGET_EXPR assert:
/* A TARGET_EXPR that expresses direct-initialization should have been
elided by cp_gimplify_init_expr. */
gcc_checking_assert (!TARGET_EXPR_DIRECT_INIT_P (*expr_p));
the TARGET_EXPR in question is created for the NSDMI in:
class Vector { int m_size; };
struct S {
const Vector &vec{};
};
where we first need to create a Vector{} temporary, and then bind the
vec reference to it. The temporary is represented by a TARGET_EXPR
and it cannot be elided. When we create an object of type S, we get
Marek Polacek [Tue, 13 Feb 2024 00:36:16 +0000 (19:36 -0500)]
c++: DR 569, DR 1693: fun with semicolons [PR113760]
Prompted by c++/113760, I started looking into a bogus "extra ;"
warning in C++11. It quickly turned out that if I want to fix
this for good, the fix will not be so small.
This patch touches on DR 569: an extra ; at namespace scope should
be allowed since C++11:
struct S {
};
; // pedwarn in C++98
It also touches on DR 1693, which allows superfluous semicolons in
class definitions since C++11:
struct S {
int a;
; // pedwarn in C++98
};
Note that a single semicolon is valid after a member function definition:
struct S {
void foo () {}; // only warns with -Wextra-semi
};
There's a new function maybe_warn_extra_semi to handle all of the above
in a single place. So now they all get a fix-it hint.
-Wextra-semi turns on all "extra ;" diagnostics. Currently, options
like -Wc++11-compat or -Wc++11-extensions are not considered.
* g++.dg/diagnostic/semicolon1.C: New test.
* g++.dg/diagnostic/semicolon10.C: New test.
* g++.dg/diagnostic/semicolon11.C: New test.
* g++.dg/diagnostic/semicolon12.C: New test.
* g++.dg/diagnostic/semicolon13.C: New test.
* g++.dg/diagnostic/semicolon14.C: New test.
* g++.dg/diagnostic/semicolon15.C: New test.
* g++.dg/diagnostic/semicolon16.C: New test.
* g++.dg/diagnostic/semicolon17.C: New test.
* g++.dg/diagnostic/semicolon2.C: New test.
* g++.dg/diagnostic/semicolon3.C: New test.
* g++.dg/diagnostic/semicolon4.C: New test.
* g++.dg/diagnostic/semicolon5.C: New test.
* g++.dg/diagnostic/semicolon6.C: New test.
* g++.dg/diagnostic/semicolon7.C: New test.
* g++.dg/diagnostic/semicolon8.C: New test.
* g++.dg/diagnostic/semicolon9.C: New test.
Jakub Jelinek [Wed, 15 May 2024 16:50:11 +0000 (18:50 +0200)]
c++: Optimize in maybe_clone_body aliases even when not at_eof [PR113208]
This patch reworks the cdtor alias optimization, such that we can create
aliases even when maybe_clone_body is called not at at_eof time, without trying
to repeat it in maybe_optimize_cdtor.
2024-05-15 Jakub Jelinek <jakub@redhat.com>
Jason Merrill <jason@redhat.com>
PR lto/113208
* cp-tree.h (maybe_optimize_cdtor): Remove.
* decl2.cc (tentative_decl_linkage): Call maybe_make_one_only
for implicit instantiations of maybe in charge ctors/dtors
declared inline.
(import_export_decl): Don't call maybe_optimize_cdtor.
(c_parse_final_cleanups): Formatting fixes.
* optimize.cc (can_alias_cdtor): Adjust condition, for
HAVE_COMDAT_GROUP && DECL_ONE_ONLY && DECL_WEAK return true even
if not DECL_INTERFACE_KNOWN.
(maybe_clone_body): Don't clear DECL_SAVED_TREE, instead set it
to void_node.
(maybe_optimize_cdtor): Remove.
* decl.cc (cxx_comdat_group): For DECL_CLONED_FUNCTION_P
functions if SUPPORTS_ONE_ONLY return DECL_COMDAT_GROUP if already
set.
* g++.dg/abi/comdat3.C: New test.
* g++.dg/abi/comdat4.C: New test.
Jakub Jelinek [Wed, 15 May 2024 16:37:17 +0000 (18:37 +0200)]
combine: Fix up simplify_compare_const [PR115092]
The following testcases are miscompiled (with tons of GIMPLE
optimization disabled) because combine sees GE comparison of
1-bit sign_extract (i.e. something with [-1, 0] value range)
with (const_int -1) (which is always true) and optimizes it into
NE comparison of 1-bit zero_extract ([0, 1] value range) against
(const_int 0).
The reason is that simplify_compare_const first (correctly)
simplifies the comparison to
GE (ashift:SI something (const_int 31)) (const_int -2147483648)
and then an optimization for when the second operand is power of 2
triggers. That optimization is fine for power of 2s which aren't
the signed minimum of the mode, or if it is NE, EQ, GEU or LTU
against the signed minimum of the mode, but for GE or LT optimizing
it into NE (or EQ) against const0_rtx is wrong, those cases
are always true or always false (but the function doesn't have
a standardized way to tell callers the comparison is now unconditional).
The following patch just disables the optimization in that case.
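The difference between the signed and unsigned cases can be shown on plain integers. The sketch below is an illustration, not the combine.cc code. It assumes a 32-bit int and models the operand from the example, which is a value shifted left by 31 and can therefore only be 0 or INT_MIN. The function names are hypothetical.

```cpp
#include <climits>

// Signed comparison against the signed minimum: true for every int.
constexpr bool signed_ge_min(int x) { return x >= INT_MIN; }

// Unsigned comparison against the same bit pattern: true only when the top
// bit is set, i.e. only for INT_MIN when x is restricted to 0 or INT_MIN.
constexpr bool unsigned_ge_min(int x) {
    return static_cast<unsigned>(x) >= static_cast<unsigned>(INT_MIN);
}

constexpr bool ne_zero(int x) { return x != 0; }

// For x in {0, INT_MIN}, the unsigned form agrees with "x != 0" ...
static_assert(unsigned_ge_min(0) == ne_zero(0), "GEU matches NE 0 for 0");
static_assert(unsigned_ge_min(INT_MIN) == ne_zero(INT_MIN),
              "GEU matches NE 0 for INT_MIN");

// ... but the signed form does not: it is true for 0, where "x != 0" is false.
static_assert(signed_ge_min(0) && !ne_zero(0), "GE SIGNED_MIN is not NE 0");
```

This is why rewriting GE against the signed minimum into NE against zero changes the result for x == 0, while the same rewrite for GEU is sound.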
2024-05-15 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/114902
PR rtl-optimization/115092
* combine.cc (simplify_compare_const): Don't optimize
GE op0 SIGNED_MIN or LT op0 SIGNED_MIN into NE op0 const0_rtx or
EQ op0 const0_rtx.
* gcc.dg/pr114902.c: New test.
* gcc.dg/pr115092.c: New test.
Jakub Jelinek [Wed, 15 May 2024 16:34:44 +0000 (18:34 +0200)]
openmp: Diagnose using grainsize+num_tasks clauses together [PR115103]
I've noticed that while we diagnose many other OpenMP exclusive clauses,
we don't diagnose grainsize together with num_tasks on taskloop construct
in all of C, C++ and Fortran (the implementation simply ignored grainsize
in that case) and for Fortran also don't diagnose mixing nogroup clause
with reduction clause(s).
Fixed thusly.
2024-05-15 Jakub Jelinek <jakub@redhat.com>
PR c/115103
gcc/c/
* c-typeck.cc (c_finish_omp_clauses): Diagnose grainsize
used together with num_tasks.
gcc/cp/
* semantics.cc (finish_omp_clauses): Diagnose grainsize
used together with num_tasks.
gcc/fortran/
* openmp.cc (resolve_omp_clauses): Diagnose grainsize
used together with num_tasks or nogroup used together with
reduction.
gcc/testsuite/
* c-c++-common/gomp/clause-dups-1.c: Add 2 further expected errors.
* gfortran.dg/gomp/pr115103.f90: New test.
Richard Biener [Fri, 3 May 2024 12:04:41 +0000 (14:04 +0200)]
tree-optimization/114589 - remove profile based sink heuristics
The following removes the profile based heuristic limiting sinking
and instead uses post-dominators to avoid sinking to places that
are executed under the same conditions as the earlier location which
the profile based heuristic should have guaranteed as well.
To avoid regressing this moves the empty-latch check to cover all
sink cases.
It also streamlines the resulting select_best_block a bit but avoids
adjusting heuristics more with this change. gfortran.dg/streamio_9.f90
starts failing at execution with this on x86_64 with -m32 because the
(float)i * 9.9999...e-7 compute is sunk across a STOP causing it
to be no longer spilled and thus the compare failing due to excess
precision. The patch adds -ffloat-store to avoid this, following
other similar testcases.
This change fixes the testcase in the PR only when using -fno-ivopts
as otherwise VRP is confused.
PR tree-optimization/114589
* tree-ssa-sink.cc (select_best_block): Remove profile-based
heuristics. Instead reject sink locations that sink
to post-dominators. Move empty latch check here from
statement_sink_location. Also consider early_bb for the
loop depth check.
(statement_sink_location): Remove superfluous check. Remove
empty latch check.
(pass_sink_code::execute): Compute/release post-dominators.
* gfortran.dg/streamio_9.f90: Use -ffloat-store to avoid
excess precision when not spilling.
* g++.dg/tree-ssa/pr114589.C: New testcase.
where we fail to see the conflict between n and g after the first
clobber of g. Before the sinking improvement there was a conflict
recorded on a path where _65/_44 are unused, so the real conflict
was missed but the fake one avoided the miscompile.
The following handles PHI defs in add_scope_conflicts_2 which
fixes the issue.
PR middle-end/111422
* cfgexpand.cc (add_scope_conflicts_2): Handle PHIs
by recursing to their arguments.
Gaius Mulley [Wed, 15 May 2024 15:58:21 +0000 (16:58 +0100)]
PR modula2/115057 TextIO.ReadRestLine raises an exception when buffer is exceeded
TextIO.ReadRestLine will raise an "attempting to read beyond end of file"
exception if the buffer is exceeded. This bug is caused by the
TextIO.ReadRestLine calling IOChan.Skip without a preceding IOChan.Look.
The Look procedure will update the status result whereas
Skip always sets read result to allRight.
gcc/m2/ChangeLog:
PR modula2/115057
* gm2-libs-iso/TextIO.mod (ReadRestLine): Use ReadChar to
skip unwanted characters as this calls IOChan.Look and updates
the cid result status. A Skip without a Look does not update
the status. Skip always sets read result to allRight.
* gm2-libs-iso/TextUtil.def (SkipSpaces): Improve comments.
(CharAvailable): Improve comments.
* gm2-libs-iso/TextUtil.mod (SkipSpaces): Improve comments.
(CharAvailable): Improve comments.
gcc/testsuite/ChangeLog:
PR modula2/115057
* gm2/isolib/run/pass/testrestline.mod: New test.
* gm2/isolib/run/pass/testrestline2.mod: New test.
* gm2/isolib/run/pass/testrestline3.mod: New test.
We had an issue when expanding via cmo-zero for RV32.
This was fixed upstream, but we don't have an RV32 test.
Therefore, this patch introduces such a test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/cmo-zicboz-zic64-1.c: Fix for rv32.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Wilco Dijkstra [Wed, 15 May 2024 12:07:27 +0000 (13:07 +0100)]
AArch64: Use UZP1 instead of INS
Use UZP1 instead of INS when combining low and high halves of vectors.
UZP1 has 3 operands which improves register allocation, and is faster on
some microarchitectures.
gcc:
* config/aarch64/aarch64-simd.md (aarch64_combine_internal<mode>):
Use UZP1 instead of INS.
(aarch64_combine_internal_be<mode>): Likewise.
gcc/testsuite:
* gcc.target/aarch64/ldp_stp_16.c: Update to check for UZP1.
* gcc.target/aarch64/pr109072_1.c: Likewise.
* gcc.target/aarch64/vec-init-14.c: Likewise.
* gcc.target/aarch64/vec-init-9.c: Likewise.
Jan Hubicka [Wed, 15 May 2024 12:14:27 +0000 (14:14 +0200)]
Avoid pointer compares on TYPE_MAIN_VARIANT in TBAA
While building more testcases for ipa-icf I noticed that there are two places
in aliasing code where we still compare TYPE_MAIN_VARIANT for pointer equality.
This is not a good idea for LTO since type merging may not happen, for example
when in one unit the pointed-to type is forward declared while in another it is
fully defined. We have same_type_for_tbaa for that.
Bootstrapped/regtested x86_64-linux, OK?
gcc/ChangeLog:
* alias.cc (reference_alias_ptr_type_1): Use view_converted_memref_p.
* alias.h (view_converted_memref_p): Declare.
* tree-ssa-alias.cc (view_converted_memref_p): Export.
(ao_compare::compare_ao_refs): Use same_type_for_tbaa.
As it turns out, this only happens when the Solaris linker is used; with
GNU ld the test PASSes just fine. In fact, that happens because gld
supports the lto-plugin while ld does not: in a Solaris build with gld,
the test FAILs the same way as with ld when -fno-use-linker-plugin is
passed, so this patch requires linker_plugin.
Tested on i386-pc-solaris2.11 (ld and gld) and x86_64-pc-linux-gnu.
Jonathan Wakely [Fri, 3 May 2024 19:00:08 +0000 (20:00 +0100)]
libstdc++: Fix data race in std::basic_ios::fill() [PR77704]
The lazy caching in std::basic_ios::fill() updates a mutable member
without synchronization, which can cause a data race if two threads both
call fill() on the same stream object when _M_fill_init is false.
To avoid this we can just cache the _M_fill member and set _M_fill_init
early in std::basic_ios::init, instead of doing it lazily. As explained
by the comment in init, there's a good reason for doing it lazily. When
char_type is neither char nor wchar_t, the locale might not have a
std::ctype<char_type> facet, so getting the fill character would throw
an exception. The current lazy init allows using unformatted I/O with
such a stream, because the fill character is never needed and so it
doesn't matter if the locale doesn't have a ctype<char_type> facet. We
can maintain this property by only setting the fill character in
std::basic_ios::init if the ctype facet is present at that time. If
fill() is called later and the fill character wasn't set by init, we can
get it from the stream's current locale at the point when fill() is
called (and not try to cache it without synchronization). If the stream
hasn't been imbued with a locale that includes the facet when we need
the fill() character, then throw bad_cast at that point.
This causes a change in behaviour for the following program:
std::ostringstream out;
out.imbue(loc);
auto fill = out.fill();
Previously the fill character would have been set when fill() is called,
and so would have used the new locale. This commit changes it so that
the fill character is set on construction and isn't affected by the new
locale being imbued later. This new behaviour seems to be what the
standard requires, and matches MSVC.
The new 27_io/basic_ios/fill/char/fill.cc test verifies that it's still
possible to use a std::basic_ios without the ctype<char_type> facet
being present at construction.
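The caching policy described above can be sketched with a toy model. This is
not libstdc++ code: ToyLocale and ToyIos are hypothetical stand-ins, and the
sketch only assumes what the message states (cache in init when the facet is
present, otherwise look the character up on demand without caching, and throw
bad_cast if the facet is still missing).

```cpp
#include <typeinfo>

// Hypothetical stand-in for a locale that may or may not carry the ctype facet.
struct ToyLocale {
    bool has_ctype;  // whether the facet is available
    char space;      // what widen(' ') would return for this locale
};

// Hypothetical stand-in for the stream base class.
class ToyIos {
public:
    // Plays the role of init(): cache eagerly, but only when the facet exists.
    explicit ToyIos(const ToyLocale& loc)
        : loc_(loc), fill_(0), fill_init_(false) {
        if (loc_.has_ctype) {
            fill_ = loc_.space;
            fill_init_ = true;
        }
    }

    void imbue(const ToyLocale& loc) { loc_ = loc; }

    // Const and free of writes, so concurrent callers do not race.
    char fill() const {
        if (fill_init_)
            return fill_;
        if (!loc_.has_ctype)
            throw std::bad_cast();
        return loc_.space;  // looked up on demand, deliberately not cached
    }

private:
    ToyLocale loc_;
    char fill_;
    bool fill_init_;
};
```

In this model a stream constructed with a usable locale keeps its original
fill character after imbue(), while a stream constructed without the facet
picks the character up from whatever locale is current when fill() is called.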
libstdc++-v3/ChangeLog:
PR libstdc++/77704
* include/bits/basic_ios.h (basic_ios::fill()): Do not modify
_M_fill and _M_fill_init in a const member function.
(basic_ios::fill(char_type)): Use _M_fill directly instead of
calling fill(). Set _M_fill_init to true.
* include/bits/basic_ios.tcc (basic_ios::init): Set _M_fill and
_M_fill_init here instead.
* testsuite/27_io/basic_ios/fill/char/1.cc: New test.
* testsuite/27_io/basic_ios/fill/wchar_t/1.cc: New test.
Rainer Orth [Wed, 15 May 2024 11:13:48 +0000 (13:13 +0200)]
testsuite: i386: Fix g++.target/i386/pr97054.C on Solaris
g++.target/i386/pr97054.C currently FAILs on 64-bit Solaris/x86:
FAIL: g++.target/i386/pr97054.C -std=gnu++14 (test for excess errors)
UNRESOLVED: g++.target/i386/pr97054.C -std=gnu++14 compilation failed to produce executable
FAIL: g++.target/i386/pr97054.C -std=gnu++17 (test for excess errors)
UNRESOLVED: g++.target/i386/pr97054.C -std=gnu++17 compilation failed to produce executable
FAIL: g++.target/i386/pr97054.C -std=gnu++2a (test for excess errors)
UNRESOLVED: g++.target/i386/pr97054.C -std=gnu++2a compilation failed to produce executable
FAIL: g++.target/i386/pr97054.C -std=gnu++98 (test for excess errors)
UNRESOLVED: g++.target/i386/pr97054.C -std=gnu++98 compilation failed to produce executable
Excess errors:
/vol/gcc/src/hg/master/local/gcc/testsuite/g++.target/i386/pr97054.C:49:20: error: frame pointer required, but reserved
Since Solaris/x86 defaults to -fno-omit-frame-pointer, this patch
explicitly builds with -fomit-frame-pointer as is the default on other
x86 targets.
Tested on i386-pc-solaris2.11 (32 and 64-bit) and x86_64-pc-linux-gnu.
RISC-V: Allow by-pieces to do overlapping accesses in block_move_straight
The current implementation of riscv_block_move_straight() emits a couple
of loads/stores with maximum width (e.g. 8-byte for RV64).
The remainder is handed over to move_by_pieces().
The by-pieces framework utilizes target hooks to decide about the emitted
instructions (e.g. unaligned accesses or overlapping accesses).
Since the current implementation will always request less than XLEN bytes
to be handled by the by-pieces infrastructure, it is impossible that
overlapping memory accesses can ever be emitted (the by-pieces code does
not know of any previous instructions that were emitted by the backend).
This patch changes the implementation of riscv_block_move_straight()
such that it utilizes the by-pieces framework if the remaining data
is less than 2*XLEN bytes, which is sufficient to enable overlapping
memory accesses (if the requirements for them are given).
The changes in the expansion can be seen in the adjustments of the
cpymem-NN-ooo test cases. The changes in the cpymem-NN tests are
caused by the different instruction ordering of the code emitted
by the by-pieces infrastructure, which emits alternating load/store
sequences.
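To illustrate why handing over up to 2*XLEN bytes matters, here is a minimal
sketch of an overlapping copy in portable C++. It is not the code GCC emits;
copy_with_overlap and kWord are hypothetical names, and kWord = 8 assumes an
RV64-like word size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kWord = 8;  // assumed word size in bytes (XLEN/8 on RV64)

// Copy n bytes, kWord <= n <= 2*kWord, using exactly two word-sized loads and
// two word-sized stores. When n is not a multiple of kWord the two accesses
// overlap in the middle instead of falling back to narrower accesses.
inline void copy_with_overlap(unsigned char* dst, const unsigned char* src,
                              std::size_t n) {
    assert(n >= kWord && n <= 2 * kWord);
    std::uint64_t head, tail;
    std::memcpy(&head, src, kWord);                // bytes [0, 8)
    std::memcpy(&tail, src + (n - kWord), kWord);  // bytes [n-8, n)
    std::memcpy(dst, &head, kWord);
    std::memcpy(dst + (n - kWord), &tail, kWord);
}
```

For n = 13 the second access covers bytes 5..12, so bytes 5..7 are written
twice with the same values. A remainder smaller than one word has no room for
such a pair of accesses, which is the situation the message describes.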
gcc/ChangeLog:
* config/riscv/riscv-string.cc (riscv_block_move_straight):
Hand over up to 2xXLEN bytes to move_by_pieces().
gcc/testsuite/ChangeLog:
* gcc.target/riscv/cpymem-32-ooo.c: Adjustments for overlapping
access.
* gcc.target/riscv/cpymem-32.c: Adjustments for code emitted by
by-pieces.
* gcc.target/riscv/cpymem-64-ooo.c: Adjustments for overlapping
access.
* gcc.target/riscv/cpymem-64.c: Adjustments for code emitted by
by-pieces.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
A recent patch added the field overlap_op_by_pieces to the struct
riscv_tune_param, which is used by the TARGET_OVERLAP_OP_BY_PIECES_P()
hook. This hook is used by the by-pieces infrastructure to decide
if overlapping memory accesses should be emitted.
The changes in the expansion can be seen in the adjustments of the
cpymem test cases. These tests also reveal a limitation in the
RISC-V cpymem expansion that prevents this optimization as only
by-pieces cpymem expansions emit overlapping memory accesses.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/cpymem-32-ooo.c: Adjust for overlapping
access.
* gcc.target/riscv/cpymem-64-ooo.c: Likewise.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
RISC-V: Allow unaligned accesses in cpymemsi expansion
The RISC-V cpymemsi expansion is called whenever the by-pieces
infrastructure will not take care of the builtin expansion.
The code emitted by the by-pieces infrastructure may include
unaligned accesses if riscv_slow_unaligned_access_p is false.
The RISC-V cpymemsi expansion is handled via riscv_expand_block_move().
The current implementation of this function does not check
riscv_slow_unaligned_access_p and never emits unaligned accesses.
Since by-pieces emits unaligned accesses, it is reasonable to implement
the same behaviour in the cpymemsi expansion. And that's what this patch
is doing.
The patch checks riscv_slow_unaligned_access_p at the entry and sets
the allowed alignment accordingly. This alignment is then propagated
down to the routines that emit the actual instructions.
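The alignment decision can be sketched as a small helper. This is only an
illustration of the idea, not the GCC implementation; access_width and its
parameters are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: widest access (in bytes) a block move may use.
// known_align    - alignment the compiler can prove for the operands
// word_bytes     - native word size (XLEN/8)
// slow_unaligned - plays the role of riscv_slow_unaligned_access_p
inline std::size_t access_width(std::size_t known_align,
                                std::size_t word_bytes,
                                bool slow_unaligned) {
    // With fast unaligned access the proven alignment stops being a limit.
    return slow_unaligned ? std::min(known_align, word_bytes) : word_bytes;
}
```

With byte-aligned operands on a 64-bit target this yields 1 when unaligned
accesses are slow and 8 when they are fast.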
The changes introduced by this patch can be seen in the adjustments
of the cpymem tests.
gcc/ChangeLog:
* config/riscv/riscv-string.cc (riscv_block_move_straight): Add
parameter align.
(riscv_adjust_block_mem): Replace parameter length by align.
(riscv_block_move_loop): Add parameter align.
(riscv_expand_block_move_scalar): Set alignment properly if the
target has fast unaligned access.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/cpymem-32-ooo.c: Adjust for unaligned access.
* gcc.target/riscv/cpymem-64-ooo.c: Likewise.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
We have two mechanisms in the RISC-V backend that expand the
cpymem pattern: a) by-pieces, b) riscv_expand_block_move()
in riscv-string.cc. The by-pieces framework has higher priority
and emits a sequence of up to 15 instructions
(see use_by_pieces_infrastructure_p() for more details).
As a rule of thumb, by-pieces emits alternating load/store sequences
and the cpymem expansion in the backend emits a sequence of loads
followed by a sequence of stores.
Let's add some test cases to document the current behaviour
and to have tests to identify regressions.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/testsuite/ChangeLog:
* gcc.target/riscv/cpymem-32-ooo.c: New test.
* gcc.target/riscv/cpymem-32.c: New test.
* gcc.target/riscv/cpymem-64-ooo.c: New test.
* gcc.target/riscv/cpymem-64.c: New test.
Aldy Hernandez [Tue, 14 May 2024 14:21:50 +0000 (16:21 +0200)]
[prange] Default pointers_handled_p() to true.
The pointers_handled_p() method is an internal range-op helper to help
catch dispatch type mismatches for pointer operands. This is what
caught the IPA mismatch in PR114985.
This method is only a temporary measure to catch any incompatibilities
in the current pointer range-op entries. This patch returns true for
any *new* entries in the range-op table, as the current ones are
already fleshed out. This keeps us from having to implement this
boilerplate function for any new range-op entries.
PR tree-optimization/114995
* range-op-ptr.cc (range_operator::pointers_handled_p): Default to true.
Jonathan Wakely [Thu, 11 Apr 2024 18:12:48 +0000 (19:12 +0100)]
libstdc++: Give std::memory_order a fixed underlying type [PR89624]
Prior to C++20 this enum type doesn't have a fixed underlying type,
which means it can be modified by -fshort-enums, which then means the
HLE bits are outside the range of valid values for the type.
As it has a fixed type of int in C++20 and later, do the same for
earlier standards too. This is technically a change for C++17 down,
because the implicit underlying type (without -fshort-enums) was
unsigned before. I doubt it matters in practice. That incompatibility
already exists between C++17 and C++20 and nobody has noticed or
complained. Now at least the underlying type will be int for all -std
modes.
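The effect of a fixed underlying type can be checked with a small sketch. The
enum and constant names below are hypothetical, and kHleBit = 0x10000 is only
an illustrative high flag bit, not a statement about the library's values.

```cpp
#include <type_traits>

// Fixed underlying type: the representation no longer depends on
// -fshort-enums, and every int value is within the enum's range.
enum order_fixed : int { relaxed_f, acquire_f, release_f };

constexpr int kHleBit = 0x10000;  // illustrative flag outside the enumerator range

static_assert(std::is_same<std::underlying_type<order_fixed>::type, int>::value,
              "underlying type is int in every -std mode");
static_assert(sizeof(order_fixed) == sizeof(int),
              "size does not shrink to fit the enumerators");

// OR-ing a high bit into the value stays within the valid range of the type.
inline order_fixed with_flag(order_fixed m) {
    return static_cast<order_fixed>(m | kHleBit);
}
```

Without the `: int`, an enum whose enumerators all fit in one byte could be
given a one-byte representation under -fshort-enums, and the flagged value
would fall outside its range.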
libstdc++-v3/ChangeLog:
PR libstdc++/89624
* include/bits/atomic_base.h (memory_order): Use int as
underlying type.
* testsuite/29_atomics/atomic/89624.cc: New test.
Andrew Pinski [Tue, 14 May 2024 13:29:18 +0000 (06:29 -0700)]
tree-cfg: Move the returns_twice check to be last statement only [PR114301]
When I was checking to make sure that all of the bugs dealing
with the case where gimple_can_duplicate_bb_p would return false were fixed,
I noticed that the code which was checking if a call statement was
returns_twice was checking all call statements rather than just the
last statement. Since calling gimple_call_flags has a small non-zero
overhead due to a few string comparisons, removing the uses of it
can give a small performance improvement. A returns_twice function
call will always end the basic block due to the check in
stmt_can_terminate_bb_p (and others). So checking only the last statement
is a small optimization and will be safe.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR tree-optimization/114301
gcc/ChangeLog:
* tree-cfg.cc (gimple_can_duplicate_bb_p): Check returns_twice
only on the last call statement rather than all.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Jeff Law [Wed, 15 May 2024 04:50:15 +0000 (22:50 -0600)]
[committed] Fix rv32 issues with recent zicboz work
I should have double-checked the CI system before pushing Christoph's patches
for memset-zero. While I thought I'd checked CI state, I must have been
looking at the wrong patch from Christoph.
Anyway, this fixes the rv32 ICEs and disables one of the tests for rv32.
The test would need a revamp for rv32 as the expected output is all rv64 code
using "sd" instructions. I'm just not vested deeply enough into rv32 to adjust
the test to work in that environment though it should be fairly trivial to copy
the test and provide new expected output if someone cares enough.
Verified this fixes the rv32 failures in my tester:
> New tests that FAIL (6 tests):
>
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c -O1 (internal compiler error: in extract_insn, at recog.cc:2812)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c -O1 (test for excess errors)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c -O2 (internal compiler error: in extract_insn, at recog.cc:2812)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c -O2 (test for excess errors)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c -O3 -g (internal compiler error: in extract_insn, at recog.cc:2812)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c -O3 -g (test for excess errors)
And after the ICE is fixed, these are eliminated by only running the test for
rv64:
Levy Hsu [Thu, 9 May 2024 08:50:56 +0000 (16:50 +0800)]
x86: Add 3-instruction subroutine vector shift for V16QI in ix86_expand_vec_perm_const_1 [PR107563]
Hi All
We've introduced a new subroutine in ix86_expand_vec_perm_const_1
to optimize vector shifting for the V16QI type on x86.
This patch uses a three-instruction sequence psrlw, psllw, and por
to handle specific vector shuffle operations more efficiently.
The change aims to improve assembly code generation for configurations
supporting SSE2.
Bootstrapped and tested on x86_64-linux-gnu, OK for trunk?
Best
Levy
gcc/ChangeLog:
PR target/107563
* config/i386/i386-expand.cc (expand_vec_perm_psrlw_psllw_por): New
subroutine.
(ix86_expand_vec_perm_const_1): Call expand_vec_perm_psrlw_psllw_por.
gcc/testsuite/ChangeLog:
PR target/107563
* g++.target/i386/pr107563-a.C: New test.
* g++.target/i386/pr107563-b.C: New test.
Patrick Palka [Wed, 15 May 2024 02:55:16 +0000 (22:55 -0400)]
c++: lvalueness of non-dependent assignment expr [PR114994]
r14-4111-g6e92a6a2a72d3b made us check non-dependent simple assignment
expressions ahead of time and give them a type, as was already done for
compound assignments. Unlike for compound assignments however, if a
simple assignment resolves to an operator overload we represent it as a
(typed) MODOP_EXPR instead of a CALL_EXPR to the selected overload.
(I reckoned this was at worst a pessimization -- we'll just have to repeat
overload resolution at instantiation time.)
But this turns out to break the below testcase ultimately because
MODOP_EXPR (of non-reference type) is always treated as an lvalue
according to lvalue_kind, which is incorrect for the MODOP_EXPR
representing x=42.
We can fix this by representing such class assignment expressions as
CALL_EXPRs as well, but this turns out to require some tweaking of our
-Wparentheses warning logic and may introduce other fallout making it
unsuitable for backporting.
So this patch instead fixes lvalue_kind to consider the type of a
MODOP_EXPR representing a class assignment.
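The language rule at stake can be illustrated outside the compiler. The sketch
below is not the PR's testcase; ByValue, ByRef and assignment_is_lvalue are
hypothetical names. It only shows that the value category of a class
assignment follows the return type of the selected operator=.

```cpp
#include <type_traits>
#include <utility>

struct ByValue {
    ByValue operator=(int) { return *this; }  // result is a prvalue
};

struct ByRef {
    ByRef& operator=(int) { return *this; }   // result is an lvalue
};

// decltype of a non-id expression is T& exactly when the expression is an lvalue.
template <class T>
constexpr bool assignment_is_lvalue() {
    return std::is_lvalue_reference<
        decltype(std::declval<T&>() = 42)>::value;
}
```

For ByValue the expression x = 42 is a prvalue, which is why treating every
assignment node of non-reference type as an lvalue gives the wrong answer.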
PR c++/114994
gcc/cp/ChangeLog:
* tree.cc (lvalue_kind) <case MODOP_EXPR>: For a class
assignment, consider the result type.
Jeff Law [Wed, 15 May 2024 00:17:59 +0000 (18:17 -0600)]
[to-be-committed,RISC-V] Remove redundant AND in shift-add sequence
So this patch allows us to eliminate a redundant AND in some shift-add
style sequences. I think the testcase was reduced from xz by the RAU
team, but I'm not highly confident of that.
Specifically the AND is masking off the upper 32 bits of the un-shifted
value and there's an outer SIGN_EXTEND from SI to DI. However in the
RTL it's working on the post-shifted value, so the constant is left
shifted, so we have to account for that in the pattern's condition.
We can just drop the AND in this case. So instead we do a 64bit shift,
then a sign extending ADD utilizing the low part of that 64bit shift result.
This has run through Ventana's CI as well as my own. I'll wait for it
to run through the larger CI system before pushing.
Jeff
gcc/
* config/riscv/riscv.md: Add pattern for sign extended shift-add
sequence with a masked input.
Simon Martin [Mon, 6 May 2024 13:20:10 +0000 (15:20 +0200)]
c++: ICE in build_deduction_guide for invalid template [PR105760]
We currently ICE upon the following invalid snippet because we fail to
properly handle tsubst_arg_types returning error_mark_node in
build_deduction_guide.
== cut ==
template<class... Ts, class>
struct A { A(Ts...); };
A a;
== cut ==
This patch fixes this, and has been successfully tested on x86_64-pc-linux-gnu.
PR c++/105760
gcc/cp/ChangeLog:
* pt.cc (build_deduction_guide): Check for error_mark_node
result from tsubst_arg_types.
Dimitar Dimitrov [Mon, 13 May 2024 16:24:14 +0000 (19:24 +0300)]
pru: Implement TARGET_CLASS_LIKELY_SPILLED_P to fix PR115013
Commit r15-436-g44e7855e did not fix PR115013 for PRU because
SMALL_REGISTER_CLASS_P is not returning an accurate value for the PRU
backend.
Word mode for PRU backend is defined as 8-bit, yet all ALU operations
are preferred in 32-bit mode. Thus checking whether a register class
contains a single word_mode register would not classify the actually
single SImode register classes as small. This affected the
multiplication source and destination register classes.
Fix by implementing TARGET_CLASS_LIKELY_SPILLED_P to treat all register
classes with SImode or smaller size as likely spilled. This in turn
corrects the behaviour of SMALL_REGISTER_CLASS_P for PRU.
PR rtl-optimization/115013
gcc/ChangeLog:
* config/pru/pru.cc (pru_class_likely_spilled_p): Implement
to mark classes containing one SImode register as likely
spilled.
(TARGET_CLASS_LIKELY_SPILLED_P): Define.
Vineet Gupta [Mon, 13 May 2024 18:45:55 +0000 (11:45 -0700)]
RISC-V: avoid LUI based const materialization ... [part of PR/106265]
... if the constant can be represented as sum of two S12 values.
The two S12 values could instead be fused with subsequent ADD insn.
This helps
- avoid an additional LUI insn
- side benefits of not clobbering a reg
e.g.
                          w/o patch             w/ patch
long                    |                     |
plus(unsigned long i)   | li   a5,4096        |
{                       | addi a5,a5,-2032    | addi a0, a0, 2047
   return i + 2064;     | add  a0,a0,a5       | addi a0, a0, 17
}                       | ret                 | ret
NOTE: In theory not having const in a standalone reg might seem less
CSE friendly, but for workloads in consideration these mat are
from very late LRA reloads and follow on GCSE is not doing much
currently.
The real benefit however is seen in base+offset computation for array
accesses and especially for stack accesses which are finalized late in
optim pipeline, during LRA register allocation. Often the finalized
offsets trigger LRA reloads resulting in mind boggling repetition of
exact same insn sequence including LUI based constant materialization.
This shaves off 290 billion dynamic instructions (QEMU icounts) in the
SPEC 2017 Cactu benchmark, which is over 10% of the workload. In the rest
of the suite, an additional 10 billion are shaved, with both gains and losses
in individual workloads as is usual with compiler changes.
This should still be considered damage control as the real/deeper fix
would be to reduce the number of LRA reloads or CSE/anchor those during
LRA constraint sub-pass (re)runs (that's a different PR/114729).
Implementation Details (for posterity)
--------------------------------------
- basic idea is to have a splitter selected via a new predicate for constant
being possible sum of two S12 and provide the transform.
This is however a 2 -> 2 transform which combine can't handle.
So we specify it using a define_insn_and_split.
- the initial loose "i" constraint caused LRA to accept invalid insns thus
needing a tighter new constraint as well.
- An additional fallback alternate with catch-all "r" register
constraint also needed to allow any "reloads" that LRA might
require for ADDI with const larger than S12.
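The range test behind such a predicate can be sketched in a few lines. This is
an illustration under the assumption that S12 means the signed 12-bit range
-2048..2047 given later in the message; the function names are hypothetical
and are not the macros added by the patch.

```cpp
#include <cstdint>
#include <utility>

constexpr std::int64_t kS12Min = -2048;
constexpr std::int64_t kS12Max = 2047;

constexpr bool fits_s12(std::int64_t v) {
    return v >= kS12Min && v <= kS12Max;
}

// True when v does not fit a single ADDI immediate but can be written as the
// sum of two such immediates.
constexpr bool is_sum_of_two_s12(std::int64_t v) {
    return !fits_s12(v) && v >= 2 * kS12Min && v <= 2 * kS12Max;
}

// Split v into two ADDI immediates: saturate the first one and put the
// remainder into the second. Only meaningful if is_sum_of_two_s12(v) holds.
inline std::pair<std::int64_t, std::int64_t>
split_sum_of_two_s12(std::int64_t v) {
    const std::int64_t first = v > 0 ? kS12Max : kS12Min;
    return std::make_pair(first, v - first);
}
```

For the constant 2064 from the example above this yields 2047 and 17, matching
the two ADDI instructions in the "w/ patch" column.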
Testing
--------
This is testsuite clean (rv64 only).
I'll rely on post-commit CI multilib run for any possible fallout for
other setups such as rv32.
I also threw this into a buildroot run, it obviously boots Linux to
userspace. bloat-o-meter on glibc and kernel show overall decrease in
static instruction counts with some minor spot increases.
These are generally in the category of
- LUI + ADDI are 2 byte each vs. two ADD being 4 byte each.
- Knock on effects due to inlining changes.
- Sometimes the slightly shorter 2-insn seq in a multi-exit function
can cause in-place epilogue duplication (vs. a jump back).
This is slightly larger but more efficient in execution.
In summary nothing to fret about.
Caveat:
------
Jeff noted during v2 review that the operand0 constraint !riscv_reg_frame_related
could potentially cause issues with hard reg cprop in future. If that
trips things up we will have to loosen the constraint while dialing down
the const range to (-2048 to 2032) as opposed to full S12 range of
(-2048 to 2047) to keep stack regs aligned.
gcc/ChangeLog:
* config/riscv/riscv.h: New macros to check for sum of two S12
range.
* config/riscv/constraints.md: New constraint.
* config/riscv/predicates.md: New Predicate.
* config/riscv/riscv.md: New splitter.
* config/riscv/riscv.cc (riscv_reg_frame_related): New helper.
* config/riscv/riscv-protos.h: New helper prototype.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sum-of-two-s12-const-1.c: New test: checks
for new patterns output.
* gcc.target/riscv/sum-of-two-s12-const-2.c: Ditto.
* gcc.target/riscv/sum-of-two-s12-const-3.c: New test: should not
ICE.
Richard Biener [Tue, 14 May 2024 09:13:51 +0000 (11:13 +0200)]
tree-optimization/99954 - redo loop distribution memcpy recognition fix
The following revisits the fix for PR99954 which was observed as
causing missed memcpy recognition and instead using memmove for
non-aliasing copies. While the original fix mitigated bogus
recognition of memcpy the root cause was not properly identified.
The root cause is dr_analyze_indices "failing" to handle union
references and leaving the DRs indices in a state that's not correctly
handled by dr_may_alias. The following mitigates this there
appropriately, restoring memcpy recognition for non-aliasing copies.
This makes us run into a latent issue in ptr_deref_may_alias_decl_p
when the pointer is something like &MEM[0].a in which case we fail
to handle non-SSA name pointers. Add code similar to what we have
in ptr_derefs_may_alias_p.
PR tree-optimization/99954
* tree-data-ref.cc (dr_may_alias_p): For bases that are
not completely analyzed fall back to TBAA and points-to.
* tree-loop-distribution.cc
(loop_distribution::classify_builtin_ldst): When there
is no dependence again classify as memcpy.
* tree-ssa-alias.cc (ptr_deref_may_alias_decl_p): Verify
the pointer is an SSA name.
[PATCH 3/3] RISC-V: Add memset-zero expansion to cbo.zero
The Zicboz extension offers the cbo.zero instruction, which can be used
to clear a memory region corresponding to a cache block.
The Zic64b extension defines the cache block size to be 64 bytes.
If both extensions are available, it is possible to use cbo.zero
to clear memory, if the alignment and size constraints are met.
This patch implements this.
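A rough sketch of such an eligibility check follows. The exact conditions used
by the patch are not spelled out in the message, so the rule below (both
extensions present, block-aligned destination, length a whole number of
blocks) is an assumption, and all names are hypothetical.

```cpp
#include <cstddef>

constexpr std::size_t kCacheBlockBytes = 64;  // block size stated for Zic64b

// Assumed rule: cbo.zero is usable only when whole, aligned cache blocks
// are being cleared and both extensions are available.
constexpr bool can_clear_with_cbo_zero(std::size_t length,
                                       std::size_t align_bytes,
                                       bool has_zicboz,
                                       bool has_zic64b) {
    return has_zicboz && has_zic64b && length > 0 &&
           align_bytes >= kCacheBlockBytes &&
           length % kCacheBlockBytes == 0;
}

// Number of cbo.zero instructions needed for an eligible block clear.
constexpr std::size_t cbo_zero_count(std::size_t length) {
    return length / kCacheBlockBytes;
}
```

Under these assumptions a 128-byte clear of a 64-byte-aligned buffer takes two
cbo.zero instructions, while a 100-byte clear or an 8-byte-aligned buffer
falls back to the ordinary expansion.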
gcc/ChangeLog:
* config/riscv/riscv-protos.h (riscv_expand_block_clear): New prototype.
* config/riscv/riscv-string.cc (riscv_expand_block_clear_zicboz_zic64b):
New function to expand a block-clear with cbo.zero.
(riscv_expand_block_clear): New RISC-V block-clear expansion function.
* config/riscv/riscv.md (setmem<mode>): New setmem expansion.
Rainer Orth [Tue, 14 May 2024 14:23:14 +0000 (16:23 +0200)]
testsuite: analyzer: Fix fd-glibc-byte-stream-connection-server.c on Solaris [PR107750]
gcc.dg/analyzer/fd-glibc-byte-stream-connection-server.c currently FAILs
on Solaris:
FAIL: gcc.dg/analyzer/fd-glibc-byte-stream-connection-server.c (test for
excess errors)
Excess errors:
/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.dg/analyzer/fd-glibc-byte-stream-connection-server.c:91:3:
error: implicit declaration of function 'memset'
[-Wimplicit-function-declaration]
Solaris <sys/select.h> has
but no declaration of memset. While one can argue that this should be
fixed, it's easy enough to just include <string.h> instead, which is
what this patch does.
Tested on i386-pc-solaris2.11 and i686-pc-linux-gnu.
Tom de Vries [Mon, 13 May 2024 16:10:15 +0000 (18:10 +0200)]
[debug] Fix dwarf v4 .debug_macro.dwo
Consider a hello world, compiled with -gsplit-dwarf and dwarf version 4, and
-g3:
...
$ gcc -gdwarf-4 -gsplit-dwarf /data/vries/hello.c -g3 -save-temps -dA
...
In section .debug_macro.dwo, we have:
...
.Ldebug_macro0:
.value 0x4 # DWARF macro version number
.byte 0x2 # Flags: 32-bit, lineptr present
.long .Lskeleton_debug_line0
.byte 0x3 # Start new file
.uleb128 0 # Included from line number 0
.uleb128 0x1 # file /data/vries/hello.c
.byte 0x5 # Define macro strp
.uleb128 0 # At line number 0
.uleb128 0x1d0 # The macro: "__STDC__ 1"
...
Given that we use a DW_MACRO_define_strp, we'd expect 0x1d0 to be an
offset into a .debug_str.dwo section.
But in fact, 0x1d0 is an index into the string offset table in
section .debug_str_offsets.dwo:
...
.long 0x34f0 # indexed string 0x1d0: __STDC__ 1
...
Add asserts that catch this inconsistency, and fix this by using
DW_MACRO_define_strx instead.
Tested on x86_64.
gcc/ChangeLog:
2024-05-14 Tom de Vries <tdevries@suse.de>
PR debug/115066
* dwarf2out.cc (output_macinfo_op): Fix DW_MACRO_define_strx/strp
choice for v4 .debug_macro.dwo. Add asserts to check that choice.
Jan Hubicka [Tue, 14 May 2024 10:58:56 +0000 (12:58 +0200)]
Reduce recursive inlining of always_inline functions
This patch tames down the inliner on (multiply) self-recursive always_inline functions.
While we already have caps on recursive inlining, the testcase combines the early inliner
and the late inliner to get a very wide recursive inlining tree. The basic idea is to
ignore DISREGARD_INLINE_LIMITS when deciding on inlining self-recursive functions
(so we cut on the function being large) and clear the flag once it is detected.
I did not include the testcase since it still produces a lot of code and would
slow down testing. It also outputs many inlining failed messages, which is not
very nice, but it is hard to detect self-recursion cycles in full generality
when indirect calls and other tricks may happen.
gcc/ChangeLog:
PR ipa/113291
* ipa-inline.cc (enum can_inline_edge_by_limits_flags): New enum.
(can_inline_edge_by_limits_p): Take flags instead of multiple bools; add flag
for forcing inline limits.
(can_early_inline_edge_p): Update.
(want_inline_self_recursive_call_p): Update; use FORCE_LIMITS mode.
(check_callers): Update.
(update_caller_keys): Update.
(update_callee_keys): Update.
(recursive_inlining): Update.
(add_new_edges_to_heap): Update.
(speculation_useful_p): Update.
(inline_small_functions): Clear DECL_DISREGARD_INLINE_LIMITS on self recursion.
(flatten_function): Update.
(inline_to_all_callers_1): Update.
Haochen Gui [Tue, 14 May 2024 08:37:06 +0000 (16:37 +0800)]
rs6000: Enable overlapped by-pieces operations
This patch enables overlapped by-piece operations by defining
TARGET_OVERLAP_OP_BY_PIECES_P to true. On rs6000, default move/set/clear
ratio is 2. So the overlap is only enabled with compare by-pieces.
Piotr Trojanek [Mon, 19 Feb 2024 08:46:04 +0000 (09:46 +0100)]
ada: Fix classification of SPARK Boolean aspects
The implementation of User_Aspect_Definition uses subtype
Boolean_Aspects to decide which existing aspects can be used to define
new aspects. This subtype didn't include many of the SPARK aspects,
notably Always_Terminates.
gcc/ada/
* aspects.ads (Aspect_Id, Boolean_Aspect): Change categorization
of Boolean-valued SPARK aspects.
* sem_ch13.adb (Analyze_Aspect_Specification): Adapt CASE
statements to new classification of Boolean-valued SPARK
aspects.
Eric Botcazou [Fri, 16 Feb 2024 09:30:17 +0000 (10:30 +0100)]
ada: Fix small inaccuracy in previous change
The call to Build_Allocate_Deallocate_Proc must occur before the special
accessibility check for class-wide allocation is generated, because this
check comes with cleanup code.
gcc/ada/
* exp_ch4.adb (Expand_Allocator_Expression): Move the first call to
Build_Allocate_Deallocate_Proc up to before the accessibility check.
A recent change broke pragma Warnings when -gnatD is enabled in some
cases. This patch fixes this by caching more slocs at times when it's
known that they haven't been modified by -gnatD.
gcc/ada/
* errout.adb (Validate_Specific_Warnings): Adapt to record
definition change.
* erroutc.adb (Set_Specific_Warning_On, Set_Specific_Warning_Off,
Warning_Specifically_Suppressed): Likewise.
* erroutc.ads: Change record definition.
Eric Botcazou [Thu, 15 Feb 2024 15:02:51 +0000 (16:02 +0100)]
ada: Decouple attachment from dynamic allocation for controlled objects
This decouples the attachment to the appropriate finalization collection of
dynamically allocated objects that need finalization from their allocation.
The current implementation immediately attaches them after allocating them,
which means that they will be finalized even if their initialization does
not complete successfully. The new implementation instead generates the
same sequence as the one generated for (statically) declared objects, that
is to say, allocation, initialization and attachment in this order.
gcc/ada/
* exp_ch3.adb (Build_Default_Initialization): Do not generate the
protection for finalization collections.
(Build_Heap_Or_Pool_Allocator): Set the No_Initialization flag on
the declaration of the temporary.
* exp_ch4.adb (Build_Aggregate_In_Place): Do not build an allocation
procedure here.
(Expand_Allocator_Expression): Build an allocation procedure, if it
is required, only just before rewriting the allocator.
(Expand_N_Allocator): Do not build an allocation procedure if the
No_Initialization flag is set on the allocator, except for those
generated for special return objects. In other cases, build an
allocation procedure, if it is required, only before rewriting
the allocator.
* exp_ch7.ads (Make_Address_For_Finalize): New function declaration.
* exp_ch7.adb (Finalization Management): Update description for
dynamically allocated objects.
(Make_Address_For_Finalize): Remove declaration.
(Find_Last_Init): Change to function and move to...
(Process_Object_Declaration): Adjust to above change.
* exp_util.ads (Build_Allocate_Deallocate_Proc): Add Mark parameter
with Empty default and document it.
(Find_Last_Init): New function declaration.
* exp_util.adb (Build_Allocate_Deallocate_Proc): Add Mark parameter
with Empty default and pass it in recursive call. Deal with type
conversions created for interface types. Adjust call sequence to
Allocate_Any_Controlled by changing Collection to In/Out parameter
and removing Finalize_Address parameter. For a controlled object,
generate a conditional call to Attach_Object_To_Collection for an
allocation and to Detach_Object_From_Collection for a deallocation.
(Find_Last_Init): ...here. Compute the initialization type for an
allocator whose designating type is class wide specifically and also
handle concurrent types.
* rtsfind.ads (RE_Id): Add RE_Attach_Object_To_Collection and
RE_Detach_Object_From_Collection.
(RE_Unit_Table): Add entries for RE_Attach_Object_To_Collection and
RE_Detach_Object_From_Collection.
* libgnat/s-finpri.ads (Finalization_Started): Delete.
(Attach_Node_To_Collection): Likewise.
(Detach_Node_From_Collection): Move to...
(Attach_Object_To_Collection): New procedure declaration.
(Detach_Object_From_Collection): Likewise.
(Finalization_Collection): Remove Atomic for Finalization_Started.
Add pragma Inline for Initialize.
* libgnat/s-finpri.adb: Add clause for Ada.Unchecked_Conversion.
(To_Collection_Node_Ptr): New instance of Ada.Unchecked_Conversion.
(Detach_Node_From_Collection): ...here.
(Attach_Object_To_Collection): New procedure.
(Detach_Object_From_Collection): Likewise.
(Finalization_Started): Delete.
(Finalize): Replace allocation with attachment in comments.
* libgnat/s-stposu.ads (Allocate_Any_Controlled): Rename parameter
Context_Subpool into Named_Subpool, parameter Context_Collection
into Collection and change it to In/Out, and remove Fin_Address.
* libgnat/s-stposu.adb: Remove clause for Ada.Unchecked_Conversion
and Finalization_Primitives.
(To_Collection_Node_Ptr): Delete.
(Allocate_Any_Controlled): Rename parameter Context_Subpool into
Named_Subpool, parameter Context_Collection into Collection and
change it to In/Out, and remove Fin_Address. Do not lock/unlock
and do not attach the object, instead only displace its address.
(Deallocate_Any_Controlled): Do not lock/unlock and do not detach
the object.
(Header_Size_With_Padding): Use qualified name for Header_Size.
Steve Baird [Thu, 15 Feb 2024 00:27:59 +0000 (16:27 -0800)]
ada: Follow up fixes for Put_Image/streaming regressions
A recent change to reduce duplication of compiler-generated Put_Image and
streaming subprograms introduced two regressions. One is yet another of the
many cases where generating these routines "on demand" (as opposed to at the
point of the associated type declaration) requires loosening the compiler's
enforcement of privacy. The other is a use-before-definition issue that
occurs because the declaration of a Put_Image procedure is not hoisted far
enough.
gcc/ada/
* exp_attr.adb (Build_And_Insert_Type_Attr_Subp): If a subprogram
associated with a (library-level) type declared in another unit is
to be inserted somewhere in a list, then insert it at the head of
the list.
* sem_ch5.adb (Analyze_Assignment): Normally a limited-type
assignment is illegal. Relax this rule if Comes_From_Source is
False and the type is not immutably limited.
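The relaxed rule for Analyze_Assignment can be read as a simple predicate. The sketch below is Python, not the Ada in sem_ch5.adb, and the function and parameter names are assumptions made for illustration; only the condition itself comes from the entry above.

```python
def limited_assignment_is_illegal(
    is_limited: bool,
    comes_from_source: bool,
    is_immutably_limited: bool,
) -> bool:
    """Hypothetical predicate mirroring the rule stated in the entry above."""
    if not is_limited:
        # The rule only concerns assignments involving a limited type.
        return False
    # Relaxation: compiler-generated code (Comes_From_Source is False) may
    # assign when the type is not immutably limited.
    relaxed = (not comes_from_source) and (not is_immutably_limited)
    return not relaxed
```

Under these assumptions, a source-level assignment of a limited type is still rejected, while a compiler-generated one is accepted unless the type is immutably limited.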
ada: Fix pragma Compile_Time_Error and -gnatdJ crash
This patch makes it so the diagnostics coming from occurrences of
pragma Compile_Time_Error and Compile_Time_Warning are emitted with
a node parameter so they don't cause a crash when -gnatdJ is enabled.
gcc/ada/
* errout.ads (Error_Msg): Add node parameter.
* errout.adb (Error_Msg): Add parameter and pass it to
the underlying call.
* sem_prag.adb (Validate_Compile_Time_Warning_Or_Error): Pass
pragma node when emitting errors.
This patch makes it so -gnatyz style check reports specify a node
ID. That is required since those checks are sometimes made during
semantic analysis of short-circuit operators, where the Current_Node
mechanism that -gnatdJ uses is not operational.
Check_Xtra_Parens_Precedence is moved from Styleg to Style to make
this possible.
gcc/ada/
* styleg.ads (Check_Xtra_Parens_Precedence): Moved ...
* style.ads (Check_Xtra_Parens_Precedence): ... here. Also
replace corresponding renaming.
* styleg.adb (Check_Xtra_Parens_Precedence): Moved ...
* style.adb (Check_Xtra_Parens_Precedence): ... here. Also use
Errout.Error_Msg and pass it a node parameter.
Eric Botcazou [Wed, 14 Feb 2024 00:22:49 +0000 (01:22 +0100)]
ada: Small cleanup about allocators and aggregates
This eliminates a few oddities in the expander's handling of allocators and
of aggregates present in allocators:
- Convert_Aggr_In_Allocator takes both a Decl and an Alloc parameter,
and inserts new code before Alloc for records and after Decl for arrays
through Convert_Array_Aggr_In_Allocator. Now, for the 3 (duplicated)
calls to the procedure, that's the same place. It also creates a new
list that it does not use in most cases.
- Expand_Allocator_Expression uses the same code sequence in 3 places
when the expression is an aggregate to build in place.
- Build_Allocate_Deallocate_Proc takes an Is_Allocate parameter that is
entirely determined by the N parameter: if N is an allocator, it must
be true; if N is a free statement, it must be false. Barring that,
the procedure either fails an assertion or raises Program_Error. It also
contains useless pattern matching code in the second part.
No functional changes.
gcc/ada/
* exp_aggr.ads (Convert_Aggr_In_Allocator): Rename Alloc into N,
replace Decl with Temp and adjust description.
(Convert_Aggr_In_Object_Decl): Alphabetize.
(Is_Delayed_Aggregate): Likewise.
* exp_aggr.adb (Convert_Aggr_In_Allocator): Rename Alloc into N
and replace Decl with Temp. Allocate a list only when needed.
(Convert_Array_Aggr_In_Allocator): Replace N with Decl and insert
new code before it.
* exp_ch4.adb (Build_Aggregate_In_Place): New procedure nested in
Expand_Allocator_Expression.
(Expand_Allocator_Expression): Call it to build aggregates in place.
Remove second parameter in calls to Build_Allocate_Deallocate_Proc.
(Expand_N_Allocator): Likewise.
* exp_ch13.adb (Expand_N_Free_Statement): Likewise.
* exp_util.ads (Build_Allocate_Deallocate_Proc): Remove Is_Allocate
parameter.
* exp_util.adb (Build_Allocate_Deallocate_Proc): Remove Is_Allocate
parameter and replace it with local variable of same name. Delete
useless pattern matching.
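The Is_Allocate cleanup described in this entry can be illustrated with a small sketch. It is written in Python rather than Ada, and every name in it (NodeKind, Node, build_allocate_deallocate_proc) is a hypothetical stand-in, not the GNAT source. The point is only that a flag fully determined by the node kind can become a local variable instead of a parameter.

```python
from dataclasses import dataclass
from enum import Enum, auto


class NodeKind(Enum):
    # Hypothetical stand-ins for the two node kinds the text mentions.
    ALLOCATOR = auto()
    FREE_STATEMENT = auto()


@dataclass
class Node:
    kind: NodeKind


def build_allocate_deallocate_proc(n: Node) -> str:
    """Illustrative only: derive the flag from the node, not from a parameter."""
    # The flag is a local variable entirely determined by the kind of n,
    # so callers can no longer pass a value that contradicts the node.
    is_allocate = n.kind is NodeKind.ALLOCATOR
    return "allocate" if is_allocate else "deallocate"
```

With the parameter gone, the inconsistent combinations that previously had to be rejected at run time cannot be expressed at the call site at all.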
Before this patch, the default statuses of -gnatw.i and -gnatw.d were
reported incorrectly in the usage string used throughout GNAT tools.
This patch fixes this.
The parameters should be swapped to fit Fileapi.h documentation.
BOOL LocalFileTimeToFileTime(
  [in]  const FILETIME *lpLocalFileTime,
  [out] LPFILETIME     lpFileTime
);