trying to combine definition of r11 in:
40: a1:SI=frm:SI
into:
42: frm:SI=a1:SI
instruction becomes a no-op:
(set (reg:SI 69 frm)
(reg:SI 69 frm))
original cost = 4 + 4 (weighted: 8.000000), replacement cost = 2147483647; keeping replacement
rescanning insn with uid = 42.
updating insn 42 in-place
verify found no changes in insn with uid = 42.
deleting insn 40
For example, consider the code below:
int test_exampe () {
  test ();

  size_t vl = 4;
  vfloat16m1_t va = __riscv_vle16_v_f16m1(a, vl);
  va = __riscv_vfnmadd_vv_f16m1_rm(va, va, va, __RISCV_FRM_RDN, vl);
  va = __riscv_vfmsac_vv_f16m1(va, va, va, vl);

  __riscv_vse16_v_f16m1(b, va, vl);

  return 0;
}
It is compiled to:
main:
        addi sp,sp,-16
        sd ra,8(sp)
        call initialize
        lui a6,%hi(b)
        lui a2,%hi(a)
        addi a3,a6,%lo(b)
        addi a2,a2,%lo(a)
        li a4,4
.L8:
        fsrmi 2
        vsetvli a5,a4,e16,m1,ta,ma
        vle16.v v1,0(a2)
        slli a1,a5,1
        subw a4,a4,a5
        add a2,a2,a1
        vfnmadd.vv v1,v1,v1
        >> The fsrm a0 insn is deleted by late-combine <<
        vfmsub.vv v1,v1,v1
        vse16.v v1,0(a3)
        add a3,a3,a1
        bgt a4,zero,.L8
        lh a4,%lo(b)(a6)
        li a5,-20480
        addi a5,a5,-1382
        bne a4,a5,.L14
        ld ra,8(sp)
        li a0,0
        addi sp,sp,16
        jr ra
This patch adds the FRM register to global_regs, as it is a
cooperatively-managed global register. As a result, late-combine no
longer eliminates the fsrm insn. The related spec17 cam4 failure may
also be caused by this issue.
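As a rough, portable analogue of the save / set / restore sequence around the `_rm` intrinsic, the sketch below uses `<cfenv>` instead of RISC-V code. It assumes `FE_DOWNWARD` stands in for RDN (frm value 2, as in `fsrmi 2`). It shows why writing back the saved mode (insn 42, `frm = a1`) is a real state change and not a no-op: the mode was changed in between. The names `divide_with_rounding`, `g_num`, `g_den` and `g_result` are hypothetical.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical operands in volatile globals, so the division is performed at
// run time between the two fesetround calls instead of being constant-folded.
volatile float g_num = 1.0f, g_den = 3.0f, g_result = 0.0f;

// Divide using `mode` as the dynamic rounding mode, then restore the caller's
// mode. Analogue of: save frm, fsrmi <mode>, FP op, fsrm <saved>.
float divide_with_rounding(int mode) {
  const int saved = std::fegetround();  // like insn 40: a1 = frm
  std::fesetround(mode);                // like "fsrmi 2" (RDN ~ FE_DOWNWARD)
  g_result = g_num / g_den;             // the rounding-sensitive operation
  std::fesetround(saved);               // like insn 42: frm = a1 (must stay)
  return g_result;
}
```

Dropping the final `fesetround(saved)` would leave `FE_DOWNWARD` in effect for every later float operation in the caller. At the C level, that is what losing the `fsrm` does to the following `vfmsac`.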
The following test suites pass with this patch:
* The rv64gcv full regression test.
PR target/118103
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_conditional_register_usage): Add
the FRM as the global_regs.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/pr118103-1.c: New test.
* gcc.target/riscv/rvv/base/pr118103-run-1.c: New test.
Gaius Mulley [Sat, 25 Jan 2025 20:30:02 +0000 (20:30 +0000)]
[PR modula2/118010, modula2/118183] Rebuild bootstrap tools with lseek fix
This patch rebuilds the bootstrap tools mc and pge incorporating the fix to
libc.lseek. The tool mc is changed to omit INCLUDE_MEMORY from
checkGccConfigSystem. The pge tool on rebuild now requires
--gcc-config-system to pick up the system.h containing INCLUDE_MEMORY.
After rebuild all local INCLUDE_MEMORY definitions disappear.
Harald Anlauf [Sat, 25 Jan 2025 18:59:56 +0000 (19:59 +0100)]
Fortran: fix issues with variables in BLOCK DATA [PR58857]
PR fortran/58857
gcc/fortran/ChangeLog:
* class.cc (gfc_find_derived_vtab): Declare some frontend generated
variables and procedures (_vtab, _copy, _deallocate) as artificial.
(find_intrinsic_vtab): Likewise.
* trans-decl.cc (check_block_data_decls): New helper function.
(gfc_generate_block_data): Use it to emit warnings for variables
declared in a BLOCK DATA program unit but not in a COMMON block.
gcc/testsuite/ChangeLog:
* gfortran.dg/uncommon_block_data_2.f90: New test.
Andi Kleen [Thu, 26 Dec 2024 04:21:58 +0000 (20:21 -0800)]
Move ferror out of hot loop of file cache
glibc ferror is surprisingly expensive. Move it out of the hot loop
of finding lines by setting a flag after the actual IO operations.
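The pattern can be sketched in C++ as follows. `line_reader` is a hypothetical stand-alone class, not GCC's `file_cache_slot`. Only the idea comes from the commit: call `ferror` once, right after the I/O call, and cache the answer in a flag such as `m_error`, so the hot line-scanning loop only tests a bool.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

class line_reader {
public:
  explicit line_reader(std::FILE *fp) : m_fp(fp) {}

  // Store the next line (without '\n') in `line`; false at EOF or on error.
  bool get_next_line(std::string &line) {
    for (;;) {
      const std::size_t nl = m_buf.find('\n', m_pos);
      if (nl != std::string::npos) {
        line.assign(m_buf, m_pos, nl - m_pos);
        m_pos = nl + 1;
        return true;
      }
      // Hot path: test the cached flag instead of calling ferror() each time.
      if (m_error || !read_data()) {
        if (m_error || m_pos >= m_buf.size())
          return false;
        line.assign(m_buf, m_pos, std::string::npos);  // last line, no '\n'
        m_pos = m_buf.size();
        return true;
      }
    }
  }

  bool had_error() const { return m_error; }

private:
  // Append one more chunk of input; false if nothing more could be read.
  bool read_data() {
    char chunk[256];
    const std::size_t n = std::fread(chunk, 1, sizeof chunk, m_fp);
    if (n == 0) {
      m_error = std::ferror(m_fp) != 0;  // queried once, after the real I/O
      return false;
    }
    m_buf.append(chunk, n);
    return true;
  }

  std::FILE *m_fp;
  std::string m_buf;  // never trimmed; fine for a sketch
  std::size_t m_pos = 0;
  bool m_error = false;
};
```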
gcc/ChangeLog:
PR preprocessor/118168
* input.cc (file_cache_slot::m_error): New field.
(file_cache_slot::create): Clear m_error.
(file_cache_slot::file_cache_slot): Clear m_error.
(file_cache_slot::read_data): Set m_error on error.
(file_cache_slot::get_next_line): Use m_error instead of ferror.
This patch fixes calls to lseek from m2 sources. The new data
type SYSTEM.COFF_T is used instead of SYSTEM.CCSIZE_T.
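In C terms, the fix matches the POSIX prototype `off_t lseek(int fd, off_t offset, int whence)`, where the offset is a signed `off_t` rather than a size type. That `SYSTEM.COFF_T` corresponds to C `off_t` is an assumption drawn from the name. The minimal sketch below shows why the signedness matters. The helpers are hypothetical and only loosely named after FIO's SetPositionFromEnd and SetPositionFromBeginning.

```cpp
#include <cassert>
#include <cstdio>
#include <sys/types.h>
#include <unistd.h>

// Seek `back` bytes before end-of-file. lseek takes a signed off_t, so a
// negative offset relative to SEEK_END is valid; an unsigned size type could
// not represent it.
off_t set_position_from_end(int fd, off_t back) {
  return lseek(fd, -back, SEEK_END);
}

// Seek to an absolute position, again passing the offset as off_t.
off_t set_position_from_beginning(int fd, off_t pos) {
  return lseek(fd, pos, SEEK_SET);
}
```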
gcc/m2/ChangeLog:
PR modula2/118010
* gm2-libs-log/FileSystem.mod (doModeChange): Replace
LONGINT with COFF_T.
(SetPos): Use COFF_T for the return value and offset type
when calling lseek.
* gm2-libs/FIO.mod (SetPositionFromBeginning): Convert pos
to COFF_T.
(SetPositionFromEnd): Ditto.
* mc-boot/GFIO.cc: Rebuild.
* mc-boot/Glibc.h: Ditto.
* pge-boot/GFIO.cc: Ditto.
* pge-boot/Glibc.h: Ditto.
Simon Martin [Sat, 25 Jan 2025 17:09:23 +0000 (18:09 +0100)]
c++: Reinstate check for uninitialized bases with c++ <= 17 [PR118239]
We currently accept this code with c++ <= 17 even though it's invalid,
since the base is not initialized (we properly reject it with c++ >= 20):
=== cut here ===
struct NoMut1 { int a, b; };
struct NoMut3 : NoMut1 {
constexpr NoMut3(int a, int b) {}
};
void mutable_subobjects() {
constexpr NoMut3 nm3(1, 2);
}
=== cut here ===
This is fallout from r0-118700-gc2b3ec18a494e3, which ignores all fields
with DECL_ARTIFICIAL in cx_check_missing_mem_inits, including those that
represent base classes and need to be checked.
This patch makes sure that we only skip fields that have DECL_ARTIFICIAL
if they don't have DECL_FIELD_IS_BASE.
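For contrast, the following version should be accepted in every language mode, because it initializes the base subobject explicitly. It is a sketch, not part of the patch's testsuite.

```cpp
struct NoMut1 { int a, b; };

// The base subobject is initialized in the mem-initializer list, so nothing
// is left uninitialized in the constexpr constructor.
struct NoMut3 : NoMut1 {
  constexpr NoMut3(int a, int b) : NoMut1{a, b} {}
};

constexpr NoMut3 nm3(1, 2);
static_assert(nm3.a == 1 && nm3.b == 2, "base subobject is initialized");
```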
PR c++/118239
gcc/cp/ChangeLog:
* constexpr.cc (cx_check_missing_mem_inits): Don't skip fields
with DECL_FIELD_IS_BASE.
Jeff Law [Sat, 25 Jan 2025 16:42:19 +0000 (09:42 -0700)]
[RISC-V][PR target/116256] Improve handling of single bit constants
So under the umbrella of pr116256 (P3 regression) I've been exploring removal
of the mvconst_internal pattern. Not surprisingly, that's going to cause all
kinds of undesirable fallout. While I can kind of see a path forward for that
work, it's going to require some combine work that I don't think we want to
tackle in the context of gcc-15.
Essentially without mvconst_internal we'll have fully exposed constant
synthesis prior to combine. Remember that combine has limits on what
combinations it will perform based on how many instructions are in the source
sequence. If we need 2+ instructions to synthesize the constant, those eat
into our budget.
In a world without mvconst_internal we'd need to either improve combine to
handle 5-insn cases (which do show up in the testsuite) or significantly
improve how combine handles REG_EQUAL notes. 5-insn combinations sound like
insanity to me, so I'd tend to lean towards the latter, though that's going
to need some refactoring and diving into note redistribution (ugh!).
In the meantime we can start limiting mvconst_internal. For the remaining
case in pr116256 we have this code in combine:
Note a couple of things. First, insn 8 will be split shortly after combine
and will need the constant 2048, but that's obviously exposed late. Second
(of course) is the mvconst_internal pattern at insn 10. After split1 we'll
have:
> (insn 16 5 17 2 (set (reg:DI 144) (const_int 4096 [0x1000])) "j.c":152:11 -1
> (nil))
> (insn 17 16 18 2 (set (reg:DI 143)
> (plus:DI (reg:DI 144)
> (const_int -2048 [0xfffffffffffff800]))) "j.c":152:11 -1
> (expr_list:REG_EQUAL (const_int 2048 [0x800])
> (nil)))
> (insn 18 17 19 2 (set (reg:V2048HF 138 [ _5 ])
> (if_then_else:V2048HF (unspec:V2048BI [ (const_vector:V2048BI [
> (const_int 1 [0x1]) repeated x2048
> ])
> (reg:DI 143)
> (const_int 2 [0x2]) repeated x3
> (reg:SI 66 vl)
> (reg:SI 67 vtype)
> ] UNSPEC_VPREDICATE)
> (vec_duplicate:V2048HF (reg:HF 142 [ x ]))
> (unspec:V2048HF [ (reg:DI 0 zero)
> ] UNSPEC_VUNDEF))) "j.c":152:11 -1
> (nil))
> (insn 19 18 20 2 (set (reg:DI 145)
> (const_int 4096 [0x1000])) "j.c":152:11 -1
> (nil))
> (insn 20 19 11 2 (set (reg:DI 139)
> (plus:DI (reg:DI 145)
> (const_int -2048 [0xfffffffffffff800]))) "j.c":152:11 -1
> (expr_list:REG_EQUAL (const_int 2048 [0x800])
> (nil)))
> (insn 11 20 0 2 (set (mem:V2048HF (reg/f:DI 141 [ in ]) [1 MEM <vector(2048) _Float16> [(_Float16 *)in_7(D)]+0 S4096 A128])
> (if_then_else:V2048HF (unspec:V2048BI [
> (const_vector:V2048BI [
> (const_int 1 [0x1]) repeated x2048
> ])
> (reg:DI 139) (const_int 2 [0x2]) repeated x3
> (reg:SI 66 vl)
> (reg:SI 67 vtype)
> ] UNSPEC_VPREDICATE)
> (reg:V2048HF 138 [ _5 ])
> (unspec:V2048HF [ (reg:DI 0 zero)
> ] UNSPEC_VUNDEF))) "j.c":152:11 3843 {*pred_movv2048hf}
> (expr_list:REG_DEAD (reg/f:DI 141 [ in ])
> (expr_list:REG_DEAD (reg:DI 0 zero) (expr_list:REG_DEAD (reg:SI 66 vl)
> (expr_list:REG_DEAD (reg:SI 67 vtype)
> (expr_list:REG_DEAD (reg:V2048HF 138 [ _5 ])
> (expr_list:REG_DEAD (reg:DI 139)
> (nil))))))))
Note the synthesis of 2048 appears twice. I seriously considered adding a
local cprop pass at this point. That could be done with a bit of work. It
didn't look too bad -- the biggest problem is cprop isn't designed to run once
we've left cfglayout. But we could probably finesse that by not allowing it to
change jumps if we've left cfglayout or converting it to do the more complex
jump fixups.
You might ask why the post-reload optimizers don't help since this at least
looks like a case where they could. After LRA the RTL looks like:
Note the re-use of a5 for the constant synthesis steps. That's going to spoil
any chance of reload_cse saving us. That re-use also gets in the way of vsetvl
elimination and we ultimately get this code:
> foo10:
> li a5,4096
> addi a5,a5,-2048
> vsetvli zero,a5,e16,m8,ta,ma
> vfmv.v.f v8,fa0
> li a5,4096
> addi a5,a5,-2048
> vsetvli zero,a5,e16,m8,ta,ma
> vse16.v v8,0(a0)
> ret
The regression is the obviously redundant vsetvl. The additional copy
of the synthesis is undesirable as well.
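The two-instruction synthesis of 2048 follows from RISC-V immediate ranges: addi only takes a signed 12-bit immediate in [-2048, 2047], and `li a5,4096` is `lui a5,1`. The sketch below shows the usual lui/addi decomposition. It is an illustration only, not GCC's constant-synthesis code, and for simplicity it ignores the RV64 corner case near INT32_MAX.

```cpp
#include <cstdint>

// Split of a 32-bit constant into lui (upper 20 bits) plus addi (signed
// 12-bit immediate). When bit 11 of the value is set, the upper part is
// rounded up and the low part becomes negative.
struct LuiAddi {
  std::int64_t hi20;  // lui operand; the register receives hi20 * 4096
  std::int64_t lo12;  // addi operand
};

constexpr LuiAddi split_constant(std::int32_t value) {
  // Adding 0x800 before dropping the low 12 bits rounds to the nearest
  // multiple of 4096, so the remainder always fits addi's signed range.
  // (Right shift of a negative value is arithmetic with GCC and in C++20.)
  const std::int64_t v = value;
  const std::int64_t hi = (v + 0x800) >> 12;
  return {hi, v - hi * 4096};
}

// 2048 does not fit addi, hence "li a5,4096" followed by "addi a5,a5,-2048".
static_assert(split_constant(2048).hi20 == 1, "lui a5,1 gives 4096");
static_assert(split_constant(2048).lo12 == -2048, "addi a5,a5,-2048");
```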
If we filter out single bit constants from mvconst_internal we trivially fix
that regression. The only fallout is a class of saturation tests which want to
test against 0x80000000. Under the hood this is a minor codegen issue
interacting badly with combine's deliberate rejection of simplification of
extensions of constants. Rather than constructing the SImode constant, then
zero extending the result we can just generate the constant we actually want
directly in DImode.
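The arithmetic behind the saturation-test fallout can be sketched in C++. The helpers are hypothetical and are not riscv_gen_zero_extend_rtx. 0x80000000 is a single-bit constant, and the DImode value the tests want is its zero-extension. That differs from the 64-bit value an SImode constant with the same bits becomes when sign-extended.

```cpp
#include <cstdint>

// A single-bit constant has exactly one bit set; 0x80000000 is one (bit 31).
constexpr bool is_single_bit(std::uint64_t x) {
  return x != 0 && (x & (x - 1)) == 0;
}

// Extending the 32-bit pattern 0x80000000 to 64 bits: sign extension (what
// e.g. lui produces on RV64) and zero extension give different values.
constexpr std::uint64_t sign_extend_si(std::uint32_t v) {
  return static_cast<std::uint64_t>(
      static_cast<std::int64_t>(static_cast<std::int32_t>(v)));
}
constexpr std::uint64_t zero_extend_si(std::uint32_t v) { return v; }

static_assert(is_single_bit(0x80000000u), "only bit 31 is set");
static_assert(sign_extend_si(0x80000000u) == 0xFFFFFFFF80000000ull, "");
static_assert(zero_extend_si(0x80000000u) == 0x0000000080000000ull, "");
```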
The net is we fix the regression, don't introduce any obvious new regressions
and slightly reduce our dependence on mvconst_internal. All good in my book.
Obviously I'll wait for pre-commit CI to render a verdict.
PR target/116256
gcc/
* config/riscv/riscv.md (mvconst_internal): Reject single bit
constants.
* config/riscv/riscv.cc (riscv_gen_zero_extend_rtx): Improve
handling constants.
Jakub Jelinek [Sat, 25 Jan 2025 09:15:24 +0000 (10:15 +0100)]
c++: Only destruct elts of array for new expression if exception is thrown during the initialization [PR117827]
The following testcase is miscompiled since r12-6328, because the elements
of the array are destructed twice: once when the callee encounters
delete[] p; and a second time when the exception is thrown.
The array elements should only be destructed if an exception is thrown
from one of the constructors during the code build_vec_init emits for new
expressions; once the new expression completes, it is IMO the
responsibility of user code to delete[] it when it is no longer needed.
So, the following patch uses the cleanup_flags argument to build_vec_init
to get notified of the flags that need to be changed when the expression
is complete and build_disable_temp_cleanup to do the changes.
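The intended semantics can be checked with a small instrumented type. This is a hypothetical sketch, not the PR's testcase. Elements are cleaned up by the new-expression itself only when a constructor throws part-way through initialization. Once the new-expression has completed, only the user's delete[] destroys them.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical element type counting constructions and destructions.
struct Tracked {
  static int constructed, destroyed;
  static int throw_at;  // throw once this many elements exist (-1: never)

  Tracked() {
    if (constructed == throw_at)
      throw std::runtime_error("constructor failed");
    ++constructed;
  }
  ~Tracked() { ++destroyed; }
};
int Tracked::constructed = 0;
int Tracked::destroyed = 0;
int Tracked::throw_at = -1;

void reset_counts(int throw_at) {
  Tracked::constructed = Tracked::destroyed = 0;
  Tracked::throw_at = throw_at;
}

// A constructor throws during new[]: only the already-built elements are
// destroyed, each exactly once, before the exception propagates.
bool partial_construction_cleaned_up() {
  reset_counts(2);
  try {
    Tracked *p = new Tracked[5];
    delete[] p;  // not reached
  } catch (const std::runtime_error &) {
  }
  return Tracked::constructed == 2 && Tracked::destroyed == 2;
}

// new[] completes: destroying the elements is now the job of delete[] alone.
bool completed_array_owned_by_user() {
  reset_counts(-1);
  Tracked *p = new Tracked[3];
  if (Tracked::destroyed != 0)
    return false;
  delete[] p;
  return Tracked::destroyed == 3;
}
```

The PR scenario adds a callee that runs delete[] p and then throws. With the fix, that later exception must not destroy the elements a second time.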
2025-01-25 Jakub Jelinek <jakub@redhat.com>
PR c++/117827
* init.cc (build_new_1): Pass address of a make_tree_vector ()
initialized gc tree vector to build_vec_init and append
build_disable_temp_cleanup to init_expr from it.
This patch corrects a typo in the definition of lseek in libc.
The second offset parameter should have been declared as COFF_T.
No errors are seen when bootstrapping using -Werror=odr
-Werror=lto-type-mismatch.
gcc/m2/ChangeLog:
PR modula2/118010
* gm2-compiler/P2SymBuild.mod (Debug): Comment out unused
procedure.
* gm2-libs/libc.def (lseek): Declare second parameter offset
as COFF_T.
Nathaniel Shead [Sun, 5 Jan 2025 12:01:44 +0000 (23:01 +1100)]
c++/modules: Treat unattached lambdas as TU-local [PR116568]
This fixes ICEs where unattached lambdas at class scope (for instance,
in member template instantiations) are streamed. This is only possible
in header units, as in named modules attempting to stream such lambdas
will be an error.
PR c++/116568
gcc/cp/ChangeLog:
* module.cc (trees_out::get_merge_kind): Treat all lambdas
without a mangling scope as un-mergeable.
gcc/testsuite/ChangeLog:
* g++.dg/modules/lambda-8.h: New test.
* g++.dg/modules/lambda-8_a.H: New test.
* g++.dg/modules/lambda-8_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Nathaniel Shead [Sun, 5 Jan 2025 12:45:05 +0000 (23:45 +1100)]
c++/modules: Diagnose TU-local lambdas, give mangling scope to lambdas in concepts
This fills in a hole left in r15-6378-g9016c5ac94c557 with regards to
detection of TU-local lambdas. Now that LAMBDA_EXPR_EXTRA_SCOPE is
properly set for most lambdas we can use it to detect lambdas that are
TU-local.
CWG2988 suggests that lambdas in concept definitions should not be
considered TU-local, since they are always unevaluated and should never
be emitted. This patch gives these lambdas a mangling scope (though it
will never be actually used in name mangling).
PR c++/116568
gcc/cp/ChangeLog:
* cp-tree.h (finish_concept_definition): Adjust parameters.
(start_concept_definition): Declare.
* module.cc (depset::hash::is_tu_local_entity): Use
LAMBDA_EXPR_EXTRA_SCOPE to detect TU-local lambdas.
* parser.cc (cp_parser_concept_definition): Start a lambda scope
for concept definitions.
* pt.cc (tsubst_lambda_expr): Namespace-scope lambdas may now
have extra scope.
(finish_concept_definition): Split into...
(start_concept_definition): ...this new function.
gcc/testsuite/ChangeLog:
* g++.dg/modules/internal-4_b.C: Remove XFAIL, add lambda alias
testcase.
* g++.dg/modules/lambda-9.h: New test.
* g++.dg/modules/lambda-9_a.H: New test.
* g++.dg/modules/lambda-9_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Nathaniel Shead [Thu, 23 Jan 2025 08:22:04 +0000 (19:22 +1100)]
c++: Fix mangling of otherwise unattached class-scope lambdas [PR118245]
This is a step closer to implementing the suggested changes for
https://github.com/itanium-cxx-abi/cxx-abi/pull/85. Most lambdas
defined within a class should have an extra scope of that class so that
uses across different TUs are properly merged by the linker. This also
needs to happen during template instantiation.
While I was working on this I found some other cases where the mangling
of lambdas was incorrect and causing issues, notably the testcase
lambda-ctx3.C which currently emits the same mangling for the base class
and member lambdas, causing mysterious assembler errors since r14-9232.
One notable case not handled either here or in the ABI is what is
supposed to happen with such unattached lambdas declared in member
templates; see lambda-uneval22. I believe that by the C++ standard,
such lambdas should also dedup across TUs, but this isn't currently
implemented, and it's not clear exactly how such lambdas should mangle.
Since this should only affect usage of lambdas in unevaluated contexts
(a C++20 feature) this patch does not add an ABI flag to control this
behaviour.
PR c++/118245
gcc/cp/ChangeLog:
* cp-tree.h (LAMBDA_EXPR_EXTRA_SCOPE): Adjust comment.
* parser.cc (cp_parser_class_head): Start (and do not finish)
lambda scope for all valid types.
(cp_parser_class_specifier): Finish lambda scope after parsing
members instead.
* pt.cc (instantiate_class_template): Add lambda scoping.
gcc/testsuite/ChangeLog:
* g++.dg/abi/lambda-ctx3.C: New test.
* g++.dg/cpp2a/lambda-uneval22.C: New test.
* g++.dg/cpp2a/lambda-uneval23.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Gaius Mulley [Sat, 25 Jan 2025 00:05:48 +0000 (00:05 +0000)]
PR modula2/118589 Opaque type fields are visible outside implementation module
This patch fixes a bug shown when a variable declared as an opaque type is
dereferenced outside the declaration module. The fix also improves error
recovery. In the error cases it ensures that an error symbol is created
and the appropriate virtual token is assigned. Finally there is a new
testsuite directory gm2.dg which contains tests to check against expected
error messages.
gcc/m2/ChangeLog:
PR modula2/118589
* gm2-compiler/M2MetaError.mod (symDesc): Add opaque type
description.
* gm2-compiler/M2Quads.mod (BuildDesignatorPointerError): New
procedure.
(BuildDesignatorPointer): Reimplement.
* gm2-compiler/P3Build.bnf (SubDesignator): Tidy up error message.
Use MetaErrorT2 rather than WriteForma1 and use the token pos from
the quad stack.
gcc/testsuite/ChangeLog:
PR modula2/118589
* lib/gm2-dg.exp (gm2.exp): load_lib.
* gm2.dg/pim/fail/badopaque.mod: New test.
* gm2.dg/pim/fail/badopaque2.mod: New test.
* gm2.dg/pim/fail/dg-pim-fail.exp: New test.
* gm2.dg/pim/fail/opaquedefs.def: New test.
* gm2.dg/pim/fail/opaquedefs.mod: New test.
Andrew Carlotti [Fri, 24 Jan 2025 11:00:41 +0000 (11:00 +0000)]
aarch64: Add +cpa feature flag
This doesn't enable anything within the compiler, but it allows the
flag to be passed to the assembler. There also doesn't appear to be a
kernel cpuinfo name yet.
Andrew Carlotti [Thu, 9 Jan 2025 19:33:25 +0000 (19:33 +0000)]
aarch64: Make AARCH64_FL_CRYPTO always unset
This feature flag bit only exists to support the +crypto alias. Outside
of option processing this bit needs to be set or unset consistently.
This patch goes with the latter option.
gcc/ChangeLog:
* common/config/aarch64/aarch64-common.cc: Assert that CRYPTO
bit is not set.
* config/aarch64/aarch64-feature-deps.h
(info<FEAT>.explicit_on): Unset CRYPTO bit.
(cpu_##CORE_IDENT): Ditto.
Andrew Carlotti [Mon, 11 Nov 2024 12:20:25 +0000 (12:20 +0000)]
aarch64: Rewrite architecture strings for assembler
Add infrastructure to allow rewriting the architecture strings passed to
the assembler (either as -march options or .arch directives). There was
already canonicalisation everywhere except for an -march driver option
passed directly to the compiler; this patch applies the same
canonicalisation there as well.
gcc/ChangeLog:
* common/config/aarch64/aarch64-common.cc
(aarch64_get_arch_string_for_assembler): New.
(aarch64_rewrite_march): New.
(aarch64_rewrite_selected_cpu): Call new function.
* config/aarch64/aarch64-elf.h (ASM_SPEC): Remove identity mapping.
* config/aarch64/aarch64-protos.h
(aarch64_get_arch_string_for_assembler): New.
* config/aarch64/aarch64.cc
(aarch64_declare_function_name): Call new function.
(aarch64_start_file): Ditto.
* config/aarch64/aarch64.h
(EXTRA_SPEC_FUNCTIONS): Use new macro name.
(MCPU_TO_MARCH_SPEC): Rename to...
(MARCH_REWRITE_SPEC): ...this, and extend the spec rule.
(aarch64_rewrite_march): New declaration.
(MCPU_TO_MARCH_SPEC_FUNCTIONS): Rename to...
(AARCH64_BASE_SPEC_FUNCTIONS): ...this, and add new function.
(ASM_CPU_SPEC): Use new macro name.
Andrew Carlotti [Thu, 23 Jan 2025 17:08:17 +0000 (17:08 +0000)]
aarch64: Move arch/cpu parsing to aarch64-common.cc
Aside from moving the functions, the only changes are to make them
non-static, and to use the existing info arrays within aarch64-common.cc
instead of the info arrays remaining in aarch64.cc.
It seems odd that we add "native" to the list for -march but not for
-mcpu. This is probably a bug, but for now we'll preserve the existing
behaviour.
gcc/ChangeLog:
* config/aarch64/aarch64.cc
(aarch64_print_hint_candidates): New helper function.
(aarch64_print_hint_for_core_or_arch): Inline into callers.
(aarch64_print_hint_for_core): Inline callee and use new helper.
(aarch64_print_hint_for_arch): Ditto.
(aarch64_print_hint_for_extensions): Use new helper.
Andrew Carlotti [Wed, 8 Jan 2025 22:58:05 +0000 (22:58 +0000)]
aarch64: Rename info structs in aarch64-common.cc
Also add a (currently unused) processor field to aarch64_processor_info,
and change name from "" to NULL for the terminating array entries.
gcc/ChangeLog:
* common/config/aarch64/aarch64-common.cc
(struct aarch64_option_extension): Rename to...
(struct aarch64_extension_info): ...this.
(all_extensions): Update type name.
(struct arch_to_arch_name): Rename to...
(struct aarch64_arch_info): ...this, and rename name field.
(all_architectures): Update type names, and move before...
(struct processor_name_to_arch): ...this. Rename to...
(struct aarch64_processor_info): ...this, rename name field and
add cpu field.
(all_cores): Update type name, and set new field.
(aarch64_parse_extension): Update names.
(aarch64_get_all_extension_candidates): Ditto.
(aarch64_rewrite_selected_cpu): Ditto.
Andrew Carlotti [Wed, 8 Jan 2025 20:06:09 +0000 (20:06 +0000)]
aarch64: Replace duplicate cpu enums
Replace `enum aarch64_processor` and `enum target_cpus` with
`enum aarch64_cpu`, and prefix the entries with `AARCH64_CPU_`.
Also rename aarch64_none to aarch64_no_cpu.
gcc/ChangeLog:
* config/aarch64/aarch64-opts.h
(enum aarch64_processor): Rename to...
(enum aarch64_cpu): ...this, and rename the entries.
* config/aarch64/aarch64.cc
(aarch64_type): Rename type and initial value.
(struct processor): Rename member types.
(all_architectures): Rename enum members.
(all_cores): Ditto.
(aarch64_get_tune_cpu): Rename type and enum member.
* config/aarch64/aarch64.h (enum target_cpus): Remove.
(TARGET_CPU_DEFAULT): Rename default value.
(aarch64_tune): Rename type.
* config/aarch64/aarch64.opt:
(selected_tune): Rename type and default value.
Andrew Carlotti [Wed, 8 Jan 2025 18:29:27 +0000 (18:29 +0000)]
aarch64: Improve mcpu/march conflict check
Features from a cpu or base architecture that were explicitly disabled
by a +nofeat option were being incorrectly added back in before checking
for conflicts between -mcpu and -march options. This patch instead
compares the returned feature masks directly.
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_override_options): Compare
returned feature masks directly.
[PR118497][IRA]: Fix calculation of cost of assigning callee-saved hard reg
Assembler code generated by GCC for PR118497 contains an unnecessary
move insn. This happened because IRA assigns the AX reg to a pseudo which
should later be in the BX reg for a call. The pseudo did not get BX because
LRA decided that it would need to save BX, although BX will be saved
anyway. The patch fixes the cost calculation. Using hard reg nrefs from
regstat or DF would result in numerous failures, as such nrefs include
artificial reg refs. Therefore we add a calculation of hard reg nrefs in
IRA. We also change the regexp used for scanning the assembler in test
vartrack-1.c, as with the patch LRA assigns the callee-saved hard reg BP
instead of another callee-saved hard reg, BX, expected by the test.
gcc/ChangeLog:
PR target/118497
* ira-int.h (target_ira_int): Add x_ira_hard_regno_nrefs.
(ira_hard_regno_nrefs): New macro.
* ira.cc (setup_hard_regno_aclass): Remove unused code. Modify
the comment.
(setup_hard_regno_nrefs): New function.
(ira): Call it.
* ira-color.cc (calculate_saved_nregs): Check
ira_hard_regno_nrefs.
gcc/testsuite/ChangeLog:
PR target/118497
* gcc.target/i386/pr118497.c: New.
* gcc.target/i386/vartrack-1.c: Modify the regexp.
Marek Polacek [Mon, 25 Nov 2024 14:45:13 +0000 (09:45 -0500)]
c++: ICE with nested anonymous union [PR117153]
In a template, for
union {
union {
T d;
};
};
build_anon_union_vars creates a malformed COMPONENT_REF: we have no
DECL_NAME for the nested anon union so we create something like "object.".
Most of the front end doesn't seem to care, but if such a tree gets to
potential_constant_expression, it can cause a crash.
We can use FIELD directly for the COMPONENT_REF's member. tsubst_stmt
should build up a proper one in:
if (VAR_P (decl) && !DECL_NAME (decl)
&& ANON_AGGR_TYPE_P (TREE_TYPE (decl)))
/* Anonymous aggregates are a special case. */
finish_anon_union (decl);
PR c++/117153
gcc/cp/ChangeLog:
* decl2.cc (build_anon_union_vars): Use FIELD for the second operand
of a COMPONENT_REF.
gcc/testsuite/ChangeLog:
* g++.dg/other/anon-union6.C: New test.
* g++.dg/other/anon-union7.C: New test.
yxj-github-437 [Thu, 16 Jan 2025 00:36:15 +0000 (08:36 +0800)]
c++/modules: Fix linkage checks for exported using-decls
This patch attempts to fix an error when building module std. The error
occurs because __builtin_va_list (aka struct __va_list) has internal
linkage, so mark this builtin type as TREE_PUBLIC to give struct
__va_list external linkage.
g++ -fmodules -std=c++23 -fsearch-include-path bits/std.cc -c
std.cc:3642:14:error: exporting ‘typedef __gnuc_va_list va_list’ that does not have external linkage
3642 | using std::va_list;
| ^~~~~~~
<built-in>: note: ‘struct __va_list’ declared here with internal linkage
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_build_builtin_va_list): Mark
__builtin_va_list as TREE_PUBLIC.
* config/arm/arm.cc (arm_build_builtin_va_list): Likewise.
David Malcolm [Fri, 24 Jan 2025 15:20:16 +0000 (10:20 -0500)]
jit: fix for write_reproducer [PR117886]
The original generated .c reproducer for PR jit/117886 did not compile,
with:
main.c: In function ‘create_code’:
main.c:600:9: error: initialization of ‘gcc_jit_rvalue *’ from incompatible pointer type ‘gcc_jit_lvalue *’ [-Wincompatible-pointer-types]
600 | local__1,
| ^~~~~~~~
The issue is that recording::ctor::write_reproducer was missing
creation of casts to gcc_jit_rvalue * for
gcc_jit_context_new_array_constructor and
gcc_jit_context_new_struct_constructor.
Fixed thusly.
gcc/jit/ChangeLog:
PR jit/117886
* jit-recording.cc (reproducer::get_identifier_as_rvalue): Handle
null memento.
(reproducer::get_identifier_as_lvalue): Likewise.
(reproducer::get_identifier_as_type): Likewise.
(recording::ctor::write_reproducer): Use get_identifier_as_rvalue
rather than get_identifier when writing out gcc_jit_rvalue *
expressions.
David Malcolm [Fri, 24 Jan 2025 15:20:11 +0000 (10:20 -0500)]
sarif-replay: respect prefix and suffix during installation [PR117670]
gcc/ChangeLog:
PR sarif-replay/117670
* Makefile.in (SARIF_REPLAY_INSTALL_NAME): New.
(install-libgdiagnostics): Use it, and exeext, rather than just
sarif-replay.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
r15-491-gc290e6a0b7a9de fixed a latent issue with dr_analyze_innermost
and dr_may_alias where DRs that were not properly analyzed would yield an
invalid answer. This caused some missed optimizations in cases where there
is actually no evolution in the unanalyzed base part. The following
recovers these by handling conservatively only those base parts which
reference SSA vars as an index.
The gfortran.dg/vect/vect-8.f90 testcase is difficult to deal with,
so the following merely bumps the maximum number of expected vectorized loops
for both aarch64 and x86-64.
PR tree-optimization/116010
* tree-data-ref.cc (contains_ssa_ref_p_1): New function.
(contains_ssa_ref_p): Likewise.
(dr_may_alias_p): Avoid treating unanalyzed base parts without
SSA reference conservatively.
Merge new optabs with the existing implementations for signbit and
isinf.
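At the source level, these expanders serve classification calls like the ones in the C++ sketch below. The new tests are named after isfinite, isinf, isnormal and signbit. Which call maps to which s390 expander is inferred from the ChangeLog and the test names, and the TDC-based instruction selection itself is not shown. The helper `classify` is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Results of the four classification queries for one value.
struct FpClass {
  bool finite, inf, normal, negative;
};

FpClass classify(double x) {
  return {std::isfinite(x), std::isinf(x), std::isnormal(x), std::signbit(x)};
}
```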
gcc/ChangeLog:
* config/s390/s390.h (S390_TDC_POSITIVE_ZERO): Remove.
(S390_TDC_NEGATIVE_ZERO): Remove.
(S390_TDC_POSITIVE_NORMALIZED_BFP_NUMBER): Remove.
(S390_TDC_NEGATIVE_NORMALIZED_BFP_NUMBER): Remove.
(S390_TDC_POSITIVE_DENORMALIZED_BFP_NUMBER): Remove.
(S390_TDC_NEGATIVE_DENORMALIZED_BFP_NUMBER): Remove.
(S390_TDC_POSITIVE_INFINITY): Remove.
(S390_TDC_NEGATIVE_INFINITY): Remove.
(S390_TDC_POSITIVE_QUIET_NAN): Remove.
(S390_TDC_NEGATIVE_QUIET_NAN): Remove.
(S390_TDC_POSITIVE_SIGNALING_NAN): Remove.
(S390_TDC_NEGATIVE_SIGNALING_NAN): Remove.
(S390_TDC_POSITIVE_DENORMALIZED_DFP_NUMBER): Remove.
(S390_TDC_NEGATIVE_DENORMALIZED_DFP_NUMBER): Remove.
(S390_TDC_POSITIVE_NORMALIZED_DFP_NUMBER): Remove.
(S390_TDC_NEGATIVE_NORMALIZED_DFP_NUMBER): Remove.
(S390_TDC_SIGNBIT_SET): Remove.
(S390_TDC_INFINITY): Remove.
* config/s390/s390.md (signbit<mode>2<tf_fpr>): Merge this one
(isinf<mode>2<tf_fpr>): and this one into
(<TDC_CLASS:tdc_insn><mode>2<tf_fpr>): new expander.
(isnormal<mode>2<tf_fpr>): New BFP expander.
(isnormal<mode>2): New DFP expander.
* config/s390/vector.md (signbittf2_vr): Merge this one
(isinftf2_vr): and this one into
(<tdc_insn>tf2_vr): new expander.
(signbittf2): Merge this one
(isinftf2): and this one into
(<tdc_insn>tf2): new expander.
gcc/testsuite/ChangeLog:
* gcc.target/s390/isfinite-isinf-isnormal-signbit-1.c: New test.
* gcc.target/s390/isfinite-isinf-isnormal-signbit-2.c: New test.
* gcc.target/s390/isfinite-isinf-isnormal-signbit-3.c: New test.
* gcc.target/s390/isfinite-isinf-isnormal-signbit.h: New test.
Richard Biener [Fri, 24 Jan 2025 08:13:17 +0000 (09:13 +0100)]
tree-optimization/118634 - improve cunroll dump
We no longer subtract the estimated eliminated number of instructions
from the estimated size after unrolling we print - this is a bit
confusing when comparing dumps to previous releases. The following
changes the dump from
Estimated size after unrolling: 42
to
Estimated size after unrolling: 42-12
for the testcase in the PR.
PR tree-optimization/118634
* tree-ssa-loop-ivcanon.cc (try_unroll_loop_completely):
Dump the number of estimated eliminated insns.
Saurabh Jha [Tue, 21 Jan 2025 15:59:39 +0000 (15:59 +0000)]
Fix command flags for SVE2 faminmax
Earlier, we were gating SVE2 faminmax behind sve+faminmax. This was
incorrect and this patch changes it so that it is gated behind
sve2+faminmax.
gcc/ChangeLog:
* config/aarch64/aarch64-sve2.md:
(*aarch64_pred_faminmax_fused): Fix to use the correct flags.
* config/aarch64/aarch64.h
(TARGET_SVE_FAMINMAX): Remove.
* config/aarch64/iterators.md: Fix iterators so that famax and
famin use correct flags.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/faminmax_1.c: Fix test to use the
correct flags.
* gcc.target/aarch64/sve/faminmax_2.c: Fix test to use the
correct flags.
* gcc.target/aarch64/sve/faminmax_3.c: New test.
Alexandre Oliva [Fri, 24 Jan 2025 02:23:16 +0000 (23:23 -0300)]
[ifcombine] check for more zero-extension cases [PR118572]
When comparing a signed narrow variable with a wider constant that has
the bit corresponding to the variable's sign bit set, we would check
that the constant is a sign-extension from that sign bit, and conclude
that the compare fails if it isn't.
When the signed variable is masked without getting the [lr]l_signbit
variable set, or when the sign bit itself is masked out, we know the
sign-extension bits from the extended variable are going to be zero,
so the constant will only compare equal if it is a zero-extension rather
than a sign-extension from the narrow variable's precision. Therefore,
check that it satisfies this property, and yield a false compare result
otherwise.
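The rule can be illustrated in C++. The helpers are hypothetical and are not the fold_truth_andor_for_ifcombine code. Once the mask has cleared every extension bit, only a constant that is a zero-extension from the narrow precision can compare equal.

```cpp
#include <cstdint>

// True if `k` is the zero-extension of its low `prec` bits (0 < prec < 64):
// no bit at or above position `prec` is set.
constexpr bool is_zero_extension(std::int64_t k, int prec) {
  return (static_cast<std::uint64_t>(k) >> prec) == 0;
}

// `c` is sign-extended to int before the mask is applied, but the mask clears
// every extension bit, so the result is always 0 or 0x80.
constexpr bool masked_equals(signed char c, int k) {
  return (c & 0x80) == k;
}

// 0x80 is a zero-extension from 8 bits and can compare equal; -128 (0x80
// sign-extended) is not, so comparing against it is always false.
static_assert(is_zero_extension(0x80, 8), "");
static_assert(!is_zero_extension(-128, 8), "");
static_assert(masked_equals(-1, 0x80), "");
static_assert(!masked_equals(-1, -128), "");
```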
for gcc/ChangeLog
PR tree-optimization/118572
* gimple-fold.cc (fold_truth_andor_for_ifcombine): Compare as
unsigned the variables whose extension bits are masked out.
Alexandre Oliva [Fri, 24 Jan 2025 02:23:13 +0000 (23:23 -0300)]
[ifcombine] improve reverse checking and operand swapping
Don't reject an ifcombine field-merging opportunity just because the
left-hand operands aren't both reversed, if the second compare needs
to be swapped for operands to match.
Also mention that reversep does NOT affect the turning of range tests
into bit tests.
for gcc/ChangeLog
* gimple-fold.cc (fold_truth_andor_for_ifcombine): Document
reversep's absence of effects on range tests. Don't reject
reversep mismatches before trying compare swapping.
Alexandre Oliva [Fri, 24 Jan 2025 02:23:10 +0000 (23:23 -0300)]
[ifcombine] out-of-bounds bitfield refs can trap [PR118514]
Check that BIT_FIELD_REFs of DECLs are in range before deciding they
don't trap.
Check that a replacement bitfield load is as trapping as the replaced
load.
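The in-range condition itself is simple arithmetic. The following sketch is a hypothetical stand-in for bit_field_ref_in_bounds_p, whose exact checks the commit does not show. An access that fails it would read past the end of the DECL, so it cannot be assumed not to trap.

```cpp
#include <cstdint>

// Bounds test for a bit-field reference reading `bitsize` bits starting at
// bit `bitpos` of an object that is `object_bits` bits wide. Comparing
// against object_bits - bitsize avoids overflow in bitpos + bitsize.
constexpr bool bit_field_ref_in_bounds(std::uint64_t bitpos,
                                       std::uint64_t bitsize,
                                       std::uint64_t object_bits) {
  return bitsize <= object_bits && bitpos <= object_bits - bitsize;
}

static_assert(bit_field_ref_in_bounds(24, 8, 32), "last byte of 32 bits");
static_assert(!bit_field_ref_in_bounds(32, 8, 32), "starts past the end");
static_assert(!bit_field_ref_in_bounds(UINT64_MAX, 8, 32), "no wrap-around");
```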
for gcc/ChangeLog
PR tree-optimization/118514
* tree-eh.cc (bit_field_ref_in_bounds_p): New.
(tree_could_trap_p) <BIT_FIELD_REF>: Call it.
* gimple-fold.cc (make_bit_field_load): Check trapping status
of replacement load against original load.
Marek Polacek [Fri, 15 Nov 2024 04:47:46 +0000 (23:47 -0500)]
c++: bogus error with nested lambdas [PR117602]
The error here should also check that we aren't nested in another
lambda; in it, at_function_scope_p() will be false.
PR c++/117602
gcc/cp/ChangeLog:
* cp-tree.h (current_nonlambda_scope): Add a default argument.
* lambda.cc (current_nonlambda_scope): New bool parameter. Use it.
* parser.cc (cp_parser_lambda_introducer): Use current_nonlambda_scope
to check if the lambda is non-local.
Jakub Jelinek [Thu, 23 Jan 2025 22:06:07 +0000 (23:06 +0100)]
c++: Small make_tree_vector_from_ctor improvement
After committing the append_ctor_to_tree_vector patch, I've realized
that for the larger constructors make_tree_vector_from_ctor unnecessarily
wastes one GC vector; make_tree_vector () / release_tree_vector () only
caches GC vectors from 4 to 16 allocated tree elements, so in the likely
case of a rather small ctor, using make_tree_vector () can be beneficial:
we can pick something from the cache and, if we don't need it later,
pt.cc calls release_tree_vector on it to return it to the cache.
But for the larger ctors, we just eat one vector from the cache, never
use it (because the vec_safe_reserve will immediately allocate a different
vector) and never return it back to the cache.
So, the following patch passes NULL for the larger vectors, which
append_ctor_to_tree_vector handles just fine now (vec_safe_reserve will
just allocate appropriately sized vector).
2025-01-23 Jakub Jelinek <jakub@redhat.com>
* c-common.cc (make_tree_vector_from_ctor): Only use make_tree_vector
for ctors with <= 16 elements.
Jakub Jelinek [Thu, 23 Jan 2025 19:03:36 +0000 (20:03 +0100)]
vect: Avoid copying of uninitialized variable [PR118628]
vectorizable_{store,load} does roughly
tree offvar;
tree running_off;
if (!costing_p)
{
... initialize offvar ...
}
running_off = offvar;
for (...)
{
if (costing_p)
{
...
continue;
}
... use running_off ...
}
so it unconditionally copies a sometimes-uninitialized variable (but then
uses the copy only if it was set to something initialized).
Still, I think it is better to avoid copying around maybe-uninitialized
vars.
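A reduced sketch of that shape shows why initializing the variable is enough. It uses hypothetical names, with nullptr standing in for NULL_TREE. The copy is now always of a defined value, and the loop still never dereferences it while costing:

```cpp
// Hypothetical reduction of the pattern above.  OFFVAR is only computed
// when not costing, yet it is copied unconditionally; initializing it
// (here to nullptr, the analogue of NULL_TREE) makes that copy well defined.
static int sum_elements(bool costing_p, const int* base, int n)
{
    const int* offvar = nullptr;      // previously left uninitialized
    if (!costing_p)
        offvar = base;
    const int* running_off = offvar;  // always copies a defined value now
    int total = 0;
    for (int i = 0; i < n; ++i)
    {
        if (costing_p)
            continue;                 // running_off is never dereferenced here
        total += running_off[i];
    }
    return total;
}
```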
2025-01-23 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/118628
* tree-vect-stmts.cc (vectorizable_store, vectorizable_load):
Initialize offvar to NULL_TREE.
Harald Anlauf [Wed, 22 Jan 2025 21:44:39 +0000 (22:44 +0100)]
Fortran: do not evaluate arguments of MAXVAL/MINVAL too often [PR118613]
PR fortran/118613
gcc/fortran/ChangeLog:
* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxval): Adjust algorithm
for inlined version of MINVAL and MAXVAL so that arguments are only
evaluated once, and create temporaries where necessary. Document
change of algorithm.
gcc/testsuite/ChangeLog:
* gfortran.dg/maxval_arg_eval_count.f90: New test.
Georg-Johann Lay [Sat, 11 Jan 2025 13:10:29 +0000 (14:10 +0100)]
AVR: PR118012 - Try to work around sick code from match.pd.
This patch tries to work around PR118012 which may use a
full fledged multiplication instead of a simple bit test.
This is because match.pd's
/* (zero_one == 0) ? y : z <op> y -> ((typeof(y))zero_one * z) <op> y */
/* (zero_one != 0) ? z <op> y : y -> ((typeof(y))zero_one * z) <op> y */
"optimizes" code with op in { plus, ior, xor } like
if (a & 1)
b = b <op> c;
to something like:
x1 = EXTRACT_BIT0 (a);
x2 = c MULT x1;
b = b <op> x2;
or
x1 = EXTRACT_BIT0 (a);
x2 = ZERO_EXTEND (x1);
x3 = NEG x2;
x4 = c AND x3;
b = b <op> x4;
which is very expensive and may even result in a libgcc call for
a 32-bit multiplication on devices that don't even have MUL.
Notice that EXTRACT_BIT0 is already more expensive (slower, more
code, more register pressure) than a bit-test + branch.
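All three shapes compute the same value, which is why match.pd treats the rewrite as valid. The problem is purely one of cost. The following minimal C++ sketch shows the equivalence, using op = xor and 32-bit operands. The function names are illustrative only:

```cpp
#include <cstdint>

// Branchy form: a bit test plus a conditional xor.
static std::uint32_t xor_if_bit0(std::uint32_t a, std::uint32_t b,
                                 std::uint32_t c)
{
    if (a & 1)
        b = b ^ c;
    return b;
}

// MULT form: (a & 1) is 0 or 1, so the product is 0 or c.
static std::uint32_t xor_mult_form(std::uint32_t a, std::uint32_t b,
                                   std::uint32_t c)
{
    return b ^ ((a & 1) * c);
}

// NEG/AND form: 0 - (a & 1) is 0 or all-ones and is used to mask c.
static std::uint32_t xor_mask_form(std::uint32_t a, std::uint32_t b,
                                   std::uint32_t c)
{
    return b ^ (c & (0u - (a & 1u)));
}
```

On a device without a hardware multiplier, the middle form can turn into a libgcc call. The first form needs only a bit test and a branch.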
The patch:
o Adds some combiner patterns that try to map sick code back
to a bit test + branch.
o Adjusts costs to make MULT (x AND 1) cheap, in the hope that the
middle-end will use that alternative (which we map to sane code).
o On devices without MUL, 32-bit multiplication was performed by a
library call, which bypasses the MULT (x AND 1) and similar patterns.
Therefore, mulsi3 is also allowed for devices without MUL so that
we get a MULT pattern that can be transformed. (Though this is
not possible on AVR_TINY since it passes arguments on the stack).
o Adds a new command line option -mpr118012, so most of the patterns
and cost computations can be switched off as they have
avropt_pr118012 in their insn condition.
o Adds sign-extract.0 patterns unconditionally (no avropt_pr118012).
Notice that this patch is just a work-around, it's not a fix of the
root cause, which are the patterns in match.pd that don't care about
the target and don't even care about costs.
The work-around is incomplete, and 3 of the new tests are still failing.
This is because there are situations where it does not work:
* The MULT is realized as a library call.
* The MULT is realized as an ASHIFT, and the ASHIFT again is transformed
into something else. For example, with -O2 -mmcu=atmega128,
ASHIFT(3) is transformed into ASHIFT(1) + ASHIFT(2).
PR tree-optimization/118012
PR tree-optimization/118360
gcc/
* config/avr/avr.opt (-mpr118012): New undocumented option.
* config/avr/avr-protos.h (avr_out_sextr)
(avr_emit_skip_pixop, avr_emit_skip_clear): New protos.
* config/avr/avr.cc (avr_adjust_insn_length)
[case ADJUST_LEN_SEXTR]: Handle case.
(avr_rtx_costs_1) [NEG]: Costs for NEG (ZERO_EXTEND (ZERO_EXTRACT)).
[MULT && avropt_pr118012]: Costs for MULT (x AND 1).
(avr_out_sextr, avr_emit_skip_pixop, avr_emit_skip_clear): New
functions.
* config/avr/avr.md [avropt_pr118012]: Add combine patterns with
that condition that try to work around PR118012.
(adjust_len) <sextr>: Add insn attr value.
(pixop): New code iterator.
(mulsi3) [avropt_pr118012 && !AVR_TINY]: Allow these in insn condition.
gcc/testsuite/
* gcc.target/avr/mmcu/pr118012-1.h: New file.
* gcc.target/avr/mmcu/pr118012-1-o2-m128.c: New test.
* gcc.target/avr/mmcu/pr118012-1-os-m128.c: New test.
* gcc.target/avr/mmcu/pr118012-1-o2-m103.c: New test.
* gcc.target/avr/mmcu/pr118012-1-os-m103.c: New test.
* gcc.target/avr/mmcu/pr118012-1-o2-t40.c: New test.
* gcc.target/avr/mmcu/pr118012-1-os-t40.c: New test.
* gcc.target/avr/mmcu/pr118360-1.h: New file.
* gcc.target/avr/mmcu/pr118360-1-o2-m128.c: New test.
* gcc.target/avr/mmcu/pr118360-1-os-m128.c: New test.
* gcc.target/avr/mmcu/pr118360-1-o2-m103.c: New test.
* gcc.target/avr/mmcu/pr118360-1-os-m103.c: New test.
* gcc.target/avr/mmcu/pr118360-1-o2-t40.c: New test.
* gcc.target/avr/mmcu/pr118360-1-os-t40.c: New test.
which is quite expensive for simple bit access in a bitmap. The reason is that
the bit access is implemented using iterators
return begin()[__n];
This in turn handles the case where __n is negative, yielding the extra
conditional.
We could use __builtin_unreachable to declare that __n is in the range
0...max_size (), but I think it is better to implement it directly, since
the resulting code is shorter and much easier to optimize.
We now produce:
.LFB1248:
.cfi_startproc
movq (%rdi), %rax
movq %rsi, %rdx
shrq $6, %rdx
andq (%rax,%rdx,8), %rsi
andl $63, %esi
setne %al
ret
The testcase suggests
movq (%rdi), %rax
movl %esi, %ecx
shrq $5, %rsi # does still need to be 64-bit
movl (%rax,%rsi,4), %eax
btl %ecx, %eax
setb %al
retq
Which is still one instruction shorter.
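Implementing the access directly amounts to splitting the index into a word index and a bit index. The following C++ sketch assumes 64-bit words and uses a hypothetical free function, test_bit. It illustrates the idea and is not the libstdc++ implementation:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical bitmap accessor in the spirit of the change: index the
// word array directly instead of going through an iterator, so there is
// no code path for a negative index.
inline bool test_bit(const std::uint64_t* words, std::size_t n)
{
    const std::size_t word_bits = 64;
    // Word index n / 64 and bit index n % 64; for unsigned n these
    // reduce to a shift and a mask.
    return (words[n / word_bits] >> (n % word_bits)) & 1u;
}
```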
libstdc++-v3/ChangeLog:
PR target/80813
* include/bits/stl_bvector.h (vector<bool, _Alloc>::operator []): Do
not use iterators.
gcc/testsuite/ChangeLog:
PR target/80813
* g++.dg/tree-ssa/bvector-3.C: New test.
rtl-ssa uses degenerate phis to maintain an RPO list of
accesses in which every use is of the RPO-previous definition.
Thus, if it finds that a phi is always equal to a particular
value V, it sometimes needs to keep the phi and make V the
single input, rather than replace all uses of the phi with V.
The code to do that rerouted the phi's first input to the single
value V. But as this PR shows, it failed to unlink the uses of
the other inputs.
The specific problem in the PR was that we had:
x = PHI<x(a), V(b)>
The code replaced the first input with V and removed the second
input from the phi, but it didn't unlink the use of V associated
with that second input.
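The violated invariant can be modelled with a simple use counter per definition. This hypothetical sketch does not use rtl-ssa's data structures. It shows that converting to a degenerate phi must drop the use held by every previous input, not only the first:

```cpp
// Hypothetical model: each definition counts its uses, and a phi
// holds one use per input.
struct def { int nuses; };
struct phi { def* inputs[4]; int ninputs; };

// Turn P into a degenerate phi whose single input is V.  The use held
// by every old input must be unlinked, not only the first one's.
static void make_degenerate_phi(phi* p, def* v)
{
    for (int i = 0; i < p->ninputs; ++i)
        p->inputs[i]->nuses--;  // unlink every old input's use
    p->inputs[0] = v;
    v->nuses++;                 // link the single remaining input
    p->ninputs = 1;
}
```

With x = PHI<x(a), V(b)>, both old inputs lose their use, and V gains one for the single remaining input. V's count is therefore unchanged.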
gcc/
PR rtl-optimization/118562
* rtl-ssa/blocks.cc (function_info::replace_phi): When converting
to a degenerate phi, make sure to remove all uses of the previous
inputs.
gcc/testsuite/
PR rtl-optimization/118562
* gcc.dg/torture/pr118562.c: New test.
GCC 15 is the first release to support FP8 intrinsics.
The underlying instructions depend on the value of a new register,
FPMR. Unlike FPCR, FPMR is a normal call-clobbered/caller-save
register rather than a global register. So:
- The FP8 intrinsics take a final uint64_t argument that
specifies what value FPMR should have.
- If an FP8 operation is split across multiple functions,
it is likely that those functions would have a similar argument.
If the object code has the structure:
for (...)
fp8_kernel (..., fpmr_value);
then fp8_kernel would set FPMR to fpmr_value each time it is
called, even though FPMR will already have that value for at
least the second and subsequent calls (and possibly the first).
The working assumption for the ABI has been that writes to
registers like FPMR can in general be more expensive than
reads, and so it would be better to use a conditional write (read
FPMR, compare it with the new value, and branch around the write
if they are equal) instead of writing the same value to FPMR repeatedly.
This patch implements that. It also adds a tuning flag that suppresses
the behaviour, both to make testing easier and to support any future
cores that (for example) are able to rename FPMR.
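The conditional write described above can be modelled in software. In this C++ sketch, an ordinary variable stands in for FPMR (the real register is accessed with MRS/MSR). All names are hypothetical:

```cpp
#include <cstdint>

// Hypothetical software model of FPMR: an ordinary variable plus a
// counter of how many (assumed expensive) writes were performed.
static std::uint64_t model_fpmr = 0;
static unsigned model_fpmr_writes = 0;

// Conditional write: read, compare, and skip the write if FPMR
// already holds VALUE.
static void set_fpmr_if_needed(std::uint64_t value)
{
    if (model_fpmr != value)
    {
        model_fpmr = value;
        ++model_fpmr_writes;
    }
}

// Stand-in for fp8_kernel: each call establishes the mode it needs.
static void fp8_kernel_model(std::uint64_t fpmr_value)
{
    set_fpmr_if_needed(fpmr_value);
    // ... FP8 work would go here ...
}
```

Calling fp8_kernel_model three times with the same value performs a single write. A later call with a different value performs a second write.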
Hopefully this really is the last part of the FP8 enablement.
gcc/
* config/aarch64/aarch64-tuning-flags.def
(AARCH64_EXTRA_TUNE_CHEAP_FPMR_WRITE): New tuning flag.
* config/aarch64/aarch64.h (TARGET_CHEAP_FPMR_WRITE): New macro.
* config/aarch64/aarch64.md: Split moves into FPMR into a test
and branch around.
(aarch64_write_fpmr): New pattern.
IRA would prefer to spill R to memory rather than allocate a GPR.
This is because the register move cost for GENERAL_REGS to
MOVEABLE_SYSREGS is very high:
/* Moves to/from sysregs are expensive, and must go via GPR. */
if (from == MOVEABLE_SYSREGS)
return 80 + aarch64_register_move_cost (mode, GENERAL_REGS, to);
if (to == MOVEABLE_SYSREGS)
return 80 + aarch64_register_move_cost (mode, from, GENERAL_REGS);
but the memory cost for MOVEABLE_SYSREGS was the same as for
GENERAL_REGS, making memory much cheaper.
Loading and storing FPMR involves a GPR temporary, so the cost should
account for moving into and out of that temporary.
This did show up indirectly in some of the existing asm tests,
where the stack frame allocated 16 bytes for callee saves (D8)
and another 16 bytes for spilling a temporary register.
It's possible that other registers need the same treatment
and it's more than probable that this code needs a rework.
None of that seems suitable for stage 4 though.
gcc/
* config/aarch64/aarch64.cc (aarch64_memory_move_cost): Account
for the cost of moving in and out of MOVEABLE_SYSREGS.
gcc/testsuite/
* gcc.target/aarch64/acle/fpmr-5.c: New test.
* gcc.target/aarch64/sve2/acle/asm/dot_lane_mf8.c: Don't expect
a spill slot to be allocated.
* gcc.target/aarch64/sve2/acle/asm/mlalb_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlallbb_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlallbt_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlalltb_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlalltt_lane_mf8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/mlalt_lane_mf8.c: Likewise.
GCC 15 is going to be the first release to support FPMR.
The alternatives for moving values into FPMR were missing
a zero alternative, meaning that moves of zero would use an
unnecessary temporary register.
gcc/
* config/aarch64/aarch64.md (*mov<SHORT:mode>_aarch64)
(*movsi_aarch64, *movdi_aarch64): Allow the source of an MSR
to be zero.
gcc/testsuite/
* gcc.target/aarch64/acle/fp8.c: Add tests for moving zero into FPMR.
Jakub Jelinek [Thu, 23 Jan 2025 10:46:18 +0000 (11:46 +0100)]
tree-assume: Fix UB in assume_query [PR118605]
The assume_query constructor does
assume_query::assume_query (function *f, bitmap p) : m_parm_list (p),
m_func (f)
where m_parm_list is bitmap &. This is compile time UB, because
as soon as the constructor returns, m_parm_list reference is still
bound to the parameter of the constructor which is no longer in scope.
Now, one possible fix would be to change the ctor argument to be bitmap &,
but that doesn't really work because in the only user of that class
we have
auto_bitmap decls;
...
assume_query query (fun, decls);
and auto_bitmap just has
operator bitmap () { return &m_bits; }
Could be perhaps const bitmap &, but why? bitmap is a pointer:
typedef class bitmap_head *bitmap;
and the EXECUTE_IF_SET_IN_BITMAP macros don't really change that point,
they just inspect what is inside of that bitmap_head the pointer points
to.
So I think the simplest fix is to avoid references (which also produce
worse code, as the value has to be dereferenced twice rather than once).
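The difference between the two member types can be shown with simplified stand-ins. Here bitmap_head, bitmap and query are hypothetical reductions of GCC's classes. The broken variant is left as a comment because using it would be undefined behaviour:

```cpp
// Hypothetical stand-ins for GCC's types: bitmap is a pointer typedef.
struct bitmap_head { unsigned bits; };
typedef bitmap_head* bitmap;

// Broken shape (kept as a comment): the member reference binds to the
// by-value parameter P, which dies when the constructor returns.
//   struct query { bitmap& m_list; query(bitmap p) : m_list(p) {} };

// Fixed shape: store the pointer itself.  Copying a pointer is as cheap
// as binding a reference, and later uses need one dereference, not two.
struct query
{
    bitmap m_list;
    explicit query(bitmap p) : m_list(p) {}
    bool has(unsigned bit) const { return (m_list->bits >> bit) & 1u; }
};
```

Because the member holds the same pointer the caller passed in, later changes to the pointed-to bitmap_head remain visible through query.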
2025-01-23 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/118605
* tree-assume.cc (assume_query::m_parm_list): Change type
from bitmap & to bitmap.
Tejas Belagod [Mon, 30 Oct 2023 08:17:34 +0000 (13:47 +0530)]
OpenMP/PolyInt: Pass poly-int structures by address to OMP libs.
Currently poly-int type structures are passed by value to OpenMP runtime
functions for shared clauses etc. This patch improves on this by passing
around poly-int structures by address to avoid copy-overhead.
gcc/ChangeLog:
* omp-low.cc (use_pointer_for_field): Use pointer if the OMP data
structure's field type is a poly-int.
Jakub Jelinek [Thu, 23 Jan 2025 10:17:47 +0000 (11:17 +0100)]
c++: Fix build_omp_array_section for type dependent array_expr [PR118590]
As can be seen on the testcase, when array_expr is type dependent, assuming
it has non-NULL TREE_TYPE is just wrong, it can often have NULL type, and even
if not, blindly assuming it is a pointer or array type is also wrong.
So, like in many other spots in the C++ FE, for type dependent expressions
we want to create something which will survive until instantiation and can be
redone at that point.
Unfortunately, build_omp_array_section is called before we actually do any
kind of checking what array_expr really is, and on invalid code it can be e.g.
a TYPE_DECL on which type_dependent_expression_p ICEs (as can be seen on the
pr67522.C testcase). So, I've hacked this by checking it is not TYPE_DECL,
I hope a TYPE_P can't make it through there when we just lookup an identifier.
Anyway, this patch is not enough, we can ICE e.g. on __uint128_t[0:something]
during instantiation, so I think something needs to be done for this in pt.cc
as well.
2025-01-23 Jakub Jelinek <jakub@redhat.com>
PR c++/118590
* typeck.cc (build_omp_array_section): If array_expr is type dependent
or a TYPE_DECL, build OMP_ARRAY_SECTION with NULL type.
Jakub Jelinek [Thu, 23 Jan 2025 10:13:52 +0000 (11:13 +0100)]
c++: Fix weird expression in test for clauses other than when/default/otherwise [PR118604]
Some clang analyzer warned about
if (!strcmp (p, "when") == 0 && !default_p)
which really looks weird, it is better to use strcmp (p, "when") != 0
or !!strcmp (p, "when"). Furthermore, as a micro optimization, it is cheaper
to evaluate default_p than calling strcmp, so that can be put first in the &&.
The C test for the same thing wasn't that weird, but I think for consistency
it is better to use the same test rather than trying to be creative.
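Because ! binds more tightly than ==, the original condition parses as (!strcmp (p, "when")) == 0. This happens to mean "p is not when". A small C++ check with hypothetical function names confirms that the rewritten form is equivalent:

```cpp
#include <cstring>

// The original test: parses as ((!strcmp (p, "when")) == 0 && !default_p).
static bool weird_form(const char* p, bool default_p)
{
    return !std::strcmp(p, "when") == 0 && !default_p;
}

// The clearer equivalent, testing the cheap flag first so that strcmp
// is skipped entirely when default_p is true.
static bool clear_form(const char* p, bool default_p)
{
    return !default_p && std::strcmp(p, "when") != 0;
}
```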
2025-01-23 Jakub Jelinek <jakub@redhat.com>
PR c++/118604
gcc/c/
* c-parser.cc (c_parser_omp_metadirective): Rewrite
condition for clauses other than when, default and otherwise.
gcc/cp/
* parser.cc (cp_parser_omp_metadirective): Test !default_p
first and use strcmp () != 0 rather than !strcmp () == 0.
Jakub Jelinek [Thu, 23 Jan 2025 10:11:23 +0000 (11:11 +0100)]
builtins: Store unspecified value to *exp for inf/nan [PR114877]
The fold_builtin_frexp folding for NaN/Inf just returned the first argument,
evaluating the second argument's side effects but without storing anything
to what the second argument points to.
The PR argues that the C standard requires the function to store something
there, but what exactly is stored is unspecified, so not storing anything
there can result in UB if the value isn't initialized and is read later.
glibc and newlib store there 0, musl apparently doesn't store anything.
The following patch stores there zero (or would you prefer storing there
some other value, 42, INT_MAX, INT_MIN, etc.?; zero is cheapest to form
in assembly though) and adjusts the test so that it
doesn't rely on not storing there anything but instead checks for
-Wmaybe-uninitialized warning to find out that something has been stored
there.
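For reference, here is a C++ sketch of frexp's contract. The names split_double and frexp_parts are hypothetical. Value-initializing the result keeps the exponent defined even with a library that stores nothing for infinities and NaNs:

```cpp
#include <cmath>

struct frexp_parts { double mant; int exp; };

// Split X with std::frexp.  For finite nonzero X, |MANT| is in [0.5, 1)
// and X == MANT * 2^EXP.  For infinities and NaNs, frexp returns X itself
// and the exponent it stores is unspecified; r{} zero-initializes EXP so
// the result stays defined even if nothing is stored.
static frexp_parts split_double(double x)
{
    frexp_parts r{};
    r.mant = std::frexp(x, &r.exp);
    return r;
}
```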
Unfortunately, I had to disable the NaN tests for -O0: while we can fold
__builtin_isnan (__builtin_nan ("")) at compile time, we can't fold
__builtin_isnan ((i = 0, __builtin_nan (""))) at compile time.
fold_builtin_classify uses just tree_expr_nan_p and if that isn't true
(because expr is a COMPOUND_EXPR with tree_expr_nan_p on the second arg),
it does
arg = builtin_save_expr (arg);
return fold_build2_loc (loc, UNORDERED_EXPR, type, arg, arg);
and that isn't folded at -O0 further, as we wrap it into SAVE_EXPR and
nothing propagates the NAN to the comparison.
I think perhaps tree_expr_nan_p etc. could have case COMPOUND_EXPR:
added and recurse on the second argument, but that feels like stage1
material to me if we want to do that at all.
2025-01-23 Jakub Jelinek <jakub@redhat.com>
PR middle-end/114877
* builtins.cc (fold_builtin_frexp): Handle rvc_nan and rvc_inf cases
like rvc_zero, return passed in arg and set *exp = 0.
* gcc.dg/torture/builtin-frexp-1.c: Add -Wmaybe-uninitialized as
dg-additional-options.
(bar): New function.
(TESTIT_FREXP2): Rework the macro so that it doesn't test whether
nothing has been stored to what the second argument points to, but
instead that something has been stored there, whatever it is.
(main): Temporarily don't enable the nan tests for -O0.
Most baremetal toolchains will not have an implementation for alarm and
sigaction as they are target specific.
For arm-none-eabi with newlib, the function signatures are exposed, but
there is no implementation, and thus the test cases cause an undefined
symbol link error.
gcc/testsuite/ChangeLog:
* gcc.dg/pr78185.c: Remove dg-do and replace with
dg-require-effective-target of signal and alarm.
* gcc.dg/pr116906-1.c: Likewise.
* gcc.dg/pr116906-2.c: Likewise.
* gcc.dg/vect/pr101145inf.c: Use effective-target alarm.
* gcc.dg/vect/pr101145inf_1.c: Likewise.
* lib/target-supports.exp (check_effective_target_alarm): New.
Georg-Johann Lay [Wed, 22 Jan 2025 20:11:22 +0000 (21:11 +0100)]
AVR: PR117726 - Tweak 32-bit logical shifts of 25...30 for -Oz.
As it turns out, logical 32-bit shifts with an offset of 25..30 can
be performed in 7 instructions or less. This matches or beats the 7
instructions required for the default code of a shift loop.
Plus, with zero overhead, these cases can be 3-operand.
This is only relevant for -Oz because with -Os, 3op shifts are
split with -msplit-bit-shift (which is not performed with -Oz).
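Offsets of 25..30 are cheap because such a shift only keeps bits from a single byte of the input. The C++ sketch below checks that arithmetic using hypothetical helper names. It does not reproduce the instruction sequences the AVR port actually emits:

```cpp
#include <cstdint>

// For 25 <= N <= 30 a logical right shift keeps only bits of the most
// significant byte, and a left shift keeps only bits of the least
// significant byte, so both can be computed from a single byte.
static std::uint32_t lsr_via_top_byte(std::uint32_t x, unsigned n)
{
    std::uint8_t top = static_cast<std::uint8_t>(x >> 24);  // a byte move
    return static_cast<std::uint32_t>(top >> (n - 24));     // 1..6 bit shifts
}

static std::uint32_t lsl_via_low_byte(std::uint32_t x, unsigned n)
{
    // Shift by 1..6, keep only the low byte, then move it to the top.
    std::uint8_t low = static_cast<std::uint8_t>(x << (n - 24));
    return static_cast<std::uint32_t>(low) << 24;
}
```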
Paul Thomas [Thu, 23 Jan 2025 08:27:04 +0000 (08:27 +0000)]
Fortran: Regression- fix ICE at fortran/trans-decl.c:1575 [PR96087]
2025-01-23 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/96087
* trans-decl.cc (gfc_get_symbol_decl): If a dummy is missing a
backend decl, it is likely that it has come from a module proc
interface. Look for the formal symbol by name in the containing
proc and use its backend decl.
* trans-expr.cc (gfc_apply_interface_mapping_to_expr): For the
same reason, match the name, rather than the symbol address to
perform the mapping.
gcc/testsuite/
PR fortran/96087
* gfortran.dg/pr96087.f90: New test.
Richard Biener [Tue, 21 Jan 2025 13:58:43 +0000 (14:58 +0100)]
tree-optimization/118558 - fix alignment compute with VMAT_CONTIGUOUS_REVERSE
There are calls to dr_misalignment left that do not correct for the
offset (which is vector type dependent) when the stride is negative.
Notably, vect_known_alignment_in_bytes doesn't allow passing through
such an offset, which the following adds (computing the offset in
vect_known_alignment_in_bytes would be possible as well, but the
offset can be shared as seen). Eventually this function could go away.
Without the offset, peeling for gaps was not considered and the access
was not shortened; correcting that is what fixes the testcase on x86_64.
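The offset correction can be illustrated with plain integers. Assume a reverse access of NUNITS elements of ELEM_SIZE bytes each. The vector load then starts (NUNITS - 1) elements before the data reference's address, so the misalignment must be queried at that offset. The function names here are hypothetical:

```cpp
#include <cstdint>

// Offset from the data reference's address to the start of a reversed
// vector access: the vector covers the NUNITS - 1 elements below it too.
static std::int64_t reverse_access_offset(unsigned nunits, unsigned elem_size)
{
    return -static_cast<std::int64_t>(nunits - 1)
           * static_cast<std::int64_t>(elem_size);
}

// Misalignment (in bytes, modulo ALIGN) of an access OFFSET bytes away
// from a reference that is MISALIGN bytes past an ALIGN boundary.
static unsigned misalignment_with_offset(unsigned misalign, std::int64_t offset,
                                         unsigned align)
{
    std::int64_t m = (static_cast<std::int64_t>(misalign) + offset)
                     % static_cast<std::int64_t>(align);
    return static_cast<unsigned>(m < 0 ? m + align : m);
}
```

Take 4-byte elements, four lanes and 16-byte alignment. A reference 12 bytes past a boundary is then misaligned for a forward access but aligned for the reversed one.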
PR tree-optimization/118558
* tree-vectorizer.h (vect_known_alignment_in_bytes): Pass
through offset to dr_misalignment.
* tree-vect-stmts.cc (get_group_load_store_type): Compute
offset applied for negative stride and use it when querying
alignment of accesses.
(vectorizable_load): Likewise.
Nathaniel Shead [Mon, 16 Dec 2024 05:06:05 +0000 (16:06 +1100)]
c++: Fix mangling of lambdas in static data member initializers [PR107741]
This fixes an issue where lambdas declared in the initializer of a
static data member within the class body do not get a mangling scope of
that variable; this results in mangled names that do not conform to the
ABI spec.
To do this, the patch splits up grokfield for this case specifically,
allowing a declaration to be built and used in start_lambda_scope before
parsing the initializer, so that record_lambda_scope works correctly.
As a drive-by, this also fixes the issue of a static member not being
visible within its own initializer.
PR c++/107741
gcc/c-family/ChangeLog:
* c-opts.cc (c_common_post_options): Bump ABI version.
gcc/cp/ChangeLog:
* cp-tree.h (start_initialized_static_member): Declare.
(finish_initialized_static_member): Declare.
* decl2.cc (start_initialized_static_member): New function.
(finish_initialized_static_member): New function.
* lambda.cc (record_lambda_scope): Support falling back to old
ABI (maybe with warning).
* parser.cc (cp_parser_member_declaration): Build decl early
when parsing an initialized static data member.
gcc/testsuite/ChangeLog:
* g++.dg/abi/macro0.C: Bump ABI version.
* g++.dg/abi/mangle74.C: Remove XFAILs.
* g++.dg/other/fold1.C: Restore originally raised error.
* g++.dg/abi/lambda-ctx2-19.C: New test.
* g++.dg/abi/lambda-ctx2-19vs20.C: New test.
* g++.dg/abi/lambda-ctx2-20.C: New test.
* g++.dg/abi/lambda-ctx2.h: New test.
* g++.dg/cpp0x/static-member-init-1.C: New test.
Nathaniel Shead [Wed, 22 Jan 2025 10:24:03 +0000 (21:24 +1100)]
c++/modules: Fix exporting temploid friends in header units [PR118582]
When we started streaming the bit to handle merging of imported temploid
friends in r15-2807, I unthinkingly only streamed it in the
'!state->is_header ()' case.
This patch reworks the streaming logic to ensure that this data is
always streamed, including for unique entities (in case that ever comes
up somehow). This does make the streaming slightly less efficient, as
functions and types will need an extra byte, but this doesn't appear to
make a huge difference to the size of the resulting module; the 'std'
module on my machine grows by 0.2% from 30671136 to 30730144 bytes.
Haochen Jiang [Thu, 23 Jan 2025 01:52:01 +0000 (09:52 +0800)]
i386: Change mnemonics from V[GETMANT,REDUCENE,RNDSCALENE]PBF16 to V[GETMANT,REDUCE,RNDSCALE]BF16
gcc/ChangeLog:
PR target/118270
* config/i386/avx10_2-512bf16intrin.h: Change intrin and builtin
name according to new mnemonics.
* config/i386/avx10_2bf16intrin.h: Ditto.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/sse.md
(UNSPEC_VRNDSCALEBF16): Rename from UNSPEC_VRNDSCALENEPBF16.
(UNSPEC_VREDUCEBF16): Rename from UNSPEC_VREDUCENEPBF16.
(UNSPEC_VGETMANTBF16): Rename from UNSPEC_VGETMANTPBF16.
(BF16IMMOP): Adjust iterator due to UNSPEC name change.
(bf16immop): Ditto.
(avx10_2_<bf16immop>pbf16_<mode><mask_name>): Rename to...
(avx10_2_<bf16immop>bf16_<mode><mask_name>): ...this. Change
instruction name output.
Haochen Jiang [Thu, 23 Jan 2025 01:52:00 +0000 (09:52 +0800)]
i386: Change mnemonics from VMINMAXNEPBF16 to VMINMAXBF16
gcc/ChangeLog:
PR target/118270
* config/i386/avx10_2-512minmaxintrin.h: Change intrin and
builtin name according to new mnemonics.
* config/i386/avx10_2minmaxintrin.h: Ditto.
* config/i386/i386-builtin.def (BDESC): Ditto.
* config/i386/sse.md
(UNSPEC_MINMAXBF16): Rename from UNSPEC_MINMAXNEPBF16.
(avx10_2_minmaxnepbf16_<mode><mask_name>): Rename to...
(avx10_2_minmaxbf16_<mode><mask_name>): ...this. Change
instruction name output.
Haochen Jiang [Thu, 23 Jan 2025 01:51:54 +0000 (09:51 +0800)]
i386: Enhance AMX tests
After the Binutils change, the previous intrinsic usage causes the
assembler to emit a warning, so we need to change it. Besides that,
there are separate issues for both AMX-MOVRS and AMX-TRANSPOSE.
For AMX-MOVRS, the t2rpntlvwrs tests wrongly used AMX-TRANSPOSE
intrinsics. Since the only difference between them is the "rs" hint,
this did not change the result.
For AMX-TRANSPOSE, the "t1" hint test is missing.
This patch fixes both of them. It also renames the AMX-MOVRS test
files to match the other AMX tests.
d,ada/spec: only sub nostd{inc,lib} rather than nostd{inc,lib}*
This prevents the gcc driver from erroneously accepting -nostdlib++
when Ada is enabled.
Also, similarly, -nostdinc* (where * is nonempty) is unhandled by either
the Ada or D compiler, so the spec should not substitute those
either (thanks for pointing that out, Jakub).
Brought to my attention by Michał Górny <mgorny@gentoo.org>.
gcc/ada/ChangeLog:
* gcc-interface/lang-specs.h: Replace %{nostdinc*} %{nostdlib*}
with %{nostdinc} %{nostdlib}.
gcc/d/ChangeLog:
* lang-specs.h: Replace %{nostdinc*} with %{nostdinc}.
Jakub Jelinek [Wed, 22 Jan 2025 18:36:36 +0000 (19:36 +0100)]
c++: Implement for static locals CWG 2867 - Order of initialization for structured bindings [PR115769]
On Wed, Aug 14, 2024 at 10:06:24AM +0200, Jakub Jelinek wrote:
> Though, now that I think about it again, perhaps what we could do instead
> is just make sure the _ZGVZ3barvEDC1x1y1z1wE initialization doesn't have
> a CLEANUP_POINT_EXPR in it and wrap both the _ZGVZ3barvEDC1x1y1z1wE
> and cp_finish_decomp created stuff into a single CLEANUP_POINT_EXPR.
> That way, perhaps _ZGVZ3barvEDC1x1y1z1wE could be initialized by one thread
> and _ZGVZ3barvE1x by a different, but the temporaries from _ZGVZ3barvEDC1x1y1z1wE
> initialization would be only destructed after the _ZGVZ3barvE1w guard
> was released by the thread which initialized _ZGVZ3barvEDC1x1y1z1wE.
Here is what I believe is an ABI-compatible version, which uses the separate
guard variables, so different structured binding variables can be
initialized in different threads, but the thread that did the artificial
base initialization will keep temporaries live at least until the last
guard variable is released (i.e. when even that variable has been
initialized).
2025-01-22 Jakub Jelinek <jakub@redhat.com>
PR c++/115769
* decl.cc: Partially implement CWG 2867 - Order of initialization
for structured bindings.
(cp_finish_decl): If need_decomp_init, for function scope structured
binding bases, temporarily clear stmts_are_full_exprs_p before
calling expand_static_init, after it call cp_finish_decomp and wrap
code emitted by both into maybe_cleanup_point_expr_void and ensure
cp_finish_decomp isn't called again.
* g++.dg/DRs/dr2867-3.C: New test.
* g++.dg/DRs/dr2867-4.C: New test.
David Malcolm [Wed, 22 Jan 2025 13:35:41 +0000 (08:35 -0500)]
jit: fix startup on aarch64
libgccjit fails on startup on aarch64 (and probably other archs).
The issues are that
(a) within jit_langhook_init the call to
targetm.init_builtins can use types that aren't representable
via jit::recording::type, and
(b) targetm.init_builtins can call lang_hooks.decls.pushdecl, which
although a no-op for libgccjit has a gcc_unreachable.
Fixed thusly.
gcc/jit/ChangeLog:
* dummy-frontend.cc (tree_type_to_jit_type): For POINTER_TYPE,
bail out if the inner call to tree_type_to_jit_type fails.
Don't abort on unknown types.
(jit_langhook_pushdecl): Replace gcc_unreachable with return of
NULL_TREE.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
While working on another MSR-related patch, I noticed that
aarch64_write_sysregdi's constraints allowed zero, but its
predicate didn't. This could in principle lead to an ICE
during or after RA, since "Z" allows the RA to rematerialise
a known zero directly into the instruction.
The usual techniques for exposing a bug like that didn't work in this
case, since the optimisers seem to make no attempt to remove redundant
zero moves (at least not for these unspec_volatiles). But the problem
still seems worth fixing pre-emptively.
gcc/
* config/aarch64/aarch64.md (aarch64_write_sysregdi): Change
the source predicate to aarch64_reg_or_zero.
gcc/testsuite/
* gcc.target/aarch64/acle/rwsr-4.c: New test.
* gcc.target/aarch64/acle/rwsr-armv8p9.c: Avoid read of uninitialized
variable.
Simon Martin [Wed, 22 Jan 2025 09:44:32 +0000 (10:44 +0100)]
c++: Clear TARGET_EXPR_ELIDING_P when forced to use a copy constructor due to __no_unique_address__ [PR118199]
We currently fail with a checking assert upon the following valid code
when using -fno-elide-constructors
=== cut here ===
struct d { ~d(); };
d &b();
struct f {
[[__no_unique_address__]] d e;
};
struct h : f {
h() : f{b()} {}
} i;
=== cut here ===
The problem is that split_nonconstant_init_1 detects that it cannot
elide the copy constructor due to __no_unique_address__ but does not
clear TARGET_EXPR_ELIDING_P, and due to -fno-elide-constructors, we trip
on a checking assert in cp_gimplify_expr.
This patch fixes this by making sure that we clear TARGET_EXPR_ELIDING_P
if we determine that we have to keep the copy constructor due to
__no_unique_address__. An alternative would be to just check for
elide_constructors in that assert, but I think it'd lose most of its
value if we did so.
PR c++/118199
gcc/cp/ChangeLog:
* typeck2.cc (split_nonconstant_init_1): Clear
TARGET_EXPR_ELIDING_P if we need to use a copy constructor
because of __no_unique_address__.
Xi Ruoyao [Tue, 21 Jan 2025 15:01:38 +0000 (23:01 +0800)]
LoongArch: Fix wrong code with <optab>_alsl_reversesi_extended
The second source register of this insn cannot be the same as the
destination register.
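The need for an early-clobber can be shown with a generic model of a two-step split that writes the destination before reading the second source. This is a hypothetical illustration using a shift-then-add sequence. It is not the actual LoongArch instruction sequence:

```cpp
#include <cstdint>

// Hypothetical model of a split that computes dest = (a << s) + b in two
// steps, writing DEST in the first step.  REGS is a tiny register file and
// D, A, B are register numbers.
static void split_shift_add(std::int32_t regs[], int d, int a, int b, int s)
{
    regs[d] = regs[a] << s;        // step 1: DEST is written early
    regs[d] = regs[d] + regs[b];   // step 2: reads B, clobbered if B == DEST
}
```

When D differs from B, the result is correct: 3 << 2 plus 10 gives 22. When D and B are the same register, step 1 destroys B and the result becomes 24. Marking the destination early-clobber ('&') tells the register allocator not to make that choice. D may still equal A.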
gcc/ChangeLog:
* config/loongarch/loongarch.md
(<optab>_alsl_reversesi_extended): Add '&' to the destination
register constraint and append '0' to the first source register
constraint to indicate the destination register cannot be the same
as the second source register, and change the split condition to
reload_completed so that the insn will be split only after RA in
order to obtain allocated registers that satisfy the above
constraints.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/bitwise-shift-reassoc-clobber.c: New
test.
Jakub Jelinek [Wed, 22 Jan 2025 08:24:34 +0000 (09:24 +0100)]
c++: Improve cp_parser_objc_message_args compile time
On Tue, Jan 21, 2025 at 06:47:53PM +0100, Jakub Jelinek wrote:
> Indeed, I've just used what it was doing without thinking too much about it,
> sorry.
> addl_args = tree_cons (NULL_TREE, arg, addl_args);
> with addl_args = nreverse (addl_args); after the loop might be better,
> can test that incrementally. sel_args is handled the same and should have
> the same treatment.
Here is incremental patch to do that.
Verified also on the 2 va-meth*.mm testcases (one without CPP_EMBED, one
with) that -fdump-tree-gimple is the same before/after the patch.
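The compile-time difference is the usual one for singly linked lists. In this C++ sketch, a plain struct stands in for TREE_LIST, and the helper names are hypothetical. Appending at the end, as chainon does, walks the whole list each time. Pushing to the front and reversing once is linear overall:

```cpp
struct node { int value; node* next; };

// Like chainon: walk to the end of LIST and attach TAIL.  Appending one
// element at a time this way costs O(n) per append, O(n^2) overall.
static node* append_node(node* list, node* tail)
{
    if (!list)
        return tail;
    node* p = list;
    while (p->next)
        p = p->next;
    p->next = tail;
    return list;
}

// Like tree_cons: an O(1) push onto the front.
static node* push_front_node(node* list, node* n)
{
    n->next = list;
    return n;
}

// Like nreverse: reverse the list in place in a single O(n) pass.
static node* reverse_list(node* list)
{
    node* prev = nullptr;
    while (list)
    {
        node* next = list->next;
        list->next = prev;
        prev = list;
        list = next;
    }
    return prev;
}
```

Both strategies produce the same order, which matches the observation that the gimple dump is unchanged.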
2025-01-22 Jakub Jelinek <jakub@redhat.com>
* parser.cc (cp_parser_objc_message_args): Use tree_cons with
nreverse at the end for both sel_args and addl_args, instead of
chainon with build_tree_list second argument.
Jakub Jelinek [Wed, 22 Jan 2025 08:22:56 +0000 (09:22 +0100)]
c++: Introduce append_ctor_to_tree_vector
On Mon, Jan 20, 2025 at 05:14:33PM -0500, Jason Merrill wrote:
> > --- gcc/cp/call.cc.jj 2025-01-15 18:24:36.135503866 +0100
> > +++ gcc/cp/call.cc 2025-01-17 14:42:38.201643385 +0100
> > @@ -4258,11 +4258,30 @@ add_list_candidates (tree fns, tree firs
> > /* Expand the CONSTRUCTOR into a new argument vec. */
>
> Maybe we could factor out a function called something like
> append_ctor_to_tree_vector from the common code between this and
> make_tree_vector_from_ctor?
>
> But this is OK as is if you don't want to pursue that.
I had the previous patch already tested and wanted to avoid delaying
the large initializer speedup re-reversion any further, so I've committed
the patch as is.
Here is an incremental patch to factor that out.
2025-01-22 Jakub Jelinek <jakub@redhat.com>
gcc/c-family/
* c-common.h (append_ctor_to_tree_vector): Declare.
* c-common.cc (append_ctor_to_tree_vector): New function.
(make_tree_vector_from_ctor): Use it.
gcc/cp/
* call.cc (add_list_candidates): Use append_ctor_to_tree_vector.
Patrick Palka [Wed, 22 Jan 2025 02:57:02 +0000 (21:57 -0500)]
c++: 'this' capture clobbered during recursive inst [PR116756]
Here during instantiation of generic lambda's op() [with I = 0] we
substitute into the call self(self, cst<1>{}) which requires recursive
instantiation of the same op() [with I = 1] (which isn't deferred due to the
lambda's deduced return type). During this recursive instantiation, the
DECL_EXPR case of tsubst_stmt clobbers LAMBDA_EXPR_THIS_CAPTURE to point
to the child op()'s specialized capture proxy instead of the parent's,
and the original value is never restored.
So later when substituting into the openSeries call in the parent op()
maybe_resolve_dummy uses the 'this' proxy belonging to the child op(),
which leads to a context mismatch ICE during gimplification of the
proxy.
An earlier version of this patch fixed this by making instantiate_body
save/restore LAMBDA_EXPR_THIS_CAPTURE during a lambda op() instantiation.
But it seems cleaner to avoid overwriting LAMBDA_EXPR_THIS_CAPTURE in the
first place by making it point to the non-specialized capture proxy, and
instead call retrieve_local_specialization as needed, which is what this
patch implements. It's natural then to not clear LAMBDA_EXPR_THIS_CAPTURE
after parsing/regenerating a lambda.
PR c++/116756
gcc/cp/ChangeLog:
* lambda.cc (lambda_expr_this_capture): Call
retrieve_local_specialization on the result of
LAMBDA_EXPR_THIS_CAPTURE for a generic lambda.
* parser.cc (cp_parser_lambda_expression): Don't clear
LAMBDA_EXPR_THIS_CAPTURE.
* pt.cc (tsubst_stmt) <case DECL_EXPR>: Don't overwrite
LAMBDA_EXPR_THIS_CAPTURE with the specialized capture.
(tsubst_lambda_expr): Don't clear LAMBDA_EXPR_THIS_CAPTURE
afterward.