git.ipfire.org Git - thirdparty/gcc.git/log

Richard Sandiford [Mon, 11 Nov 2024 12:32:13 +0000 (12:32 +0000)]

Add push/pop_function_decl

For the aarch64 simd clones patches, it would be useful to be able to
push a function declaration onto the cfun stack, even though it has no
function body associated with it.  That is, we want cfun to be null,
current_function_decl to be the decl itself, and the target and
optimisation flags to reflect the declaration.

This patch adds a push/pop_function_decl pair to do that.

I think the more direct way of doing what I want to do under the
existing interface would have been:

  push_cfun (nullptr);
  invoke_set_current_function_hook (fndecl);
  pop_cfun ();

where invoke_set_current_function_hook would need to become public.
But it seemed safer to use the higher-level routines, since it makes
sure that the target/optimisation changes are synchronised with the
function changes.  In particular, if cfun was null before the
sequence above, the pop_cfun would leave the flags unchanged,
rather than restore them to the state before the push_cfun.
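The save/restore discipline argued for above can be modelled in a few lines of C. This is a hypothetical sketch and not GCC's function.cc. The names `push_decl_model` and `pop_decl_model` are assumptions, and so is the single `opt_level` field, which stands in for the target/optimisation flags. What the sketch shows is that a push records the flags in effect at push time. As a result, the matching pop restores them even when no function (cfun == NULL) was active before the push.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: a declaration carries the flags it wants,
   and a function body (if any) points at its declaration.  */
struct decl_model { const char *name; int opt_level; };
struct func_model { struct decl_model *decl; };

/* Global state mirroring cfun, current_function_decl and the active flags.  */
static struct func_model *cfun_model;          /* may be NULL */
static struct decl_model *current_decl_model;
static int active_opt_level = 2;

/* One saved frame per push.  */
struct frame_model { struct func_model *fn; struct decl_model *decl; int opt_level; };
static struct frame_model stack_model[16];
static int depth_model;

/* Push a bare declaration: there is no body, so the function pointer
   becomes NULL, but the current decl and the flags follow the decl.  */
static void push_decl_model(struct decl_model *d) {
  assert(depth_model < 16);
  stack_model[depth_model++] =
      (struct frame_model){ cfun_model, current_decl_model, active_opt_level };
  cfun_model = NULL;
  current_decl_model = d;
  active_opt_level = d->opt_level;
}

/* Pop restores everything saved at push time, flags included,
   whether or not a function was active before the push.  */
static void pop_decl_model(void) {
  assert(depth_model > 0);
  struct frame_model f = stack_model[--depth_model];
  cfun_model = f.fn;
  current_decl_model = f.decl;
  active_opt_level = f.opt_level;
}
```

Compare this with the failure mode described above. A pop that only re-applied flags when a function had been active would leave `active_opt_level` at the pushed declaration's value here.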

gcc/
* function.h (push_function_decl, pop_function_decl): Declare.
* function.cc (set_function_decl): New function, extracted from...
(set_cfun): ...here.
(push_function_decl): New function, extracted from...
(push_cfun): ...here.
(pop_cfun_1): New function, extracted from...
(pop_cfun): ...here.
(pop_function_decl): New function.

Paul Thomas [Mon, 11 Nov 2024 12:21:57 +0000 (12:21 +0000)]

Fortran: Fix elemental array refs in SELECT TYPE [PR109345]

2024-11-10 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/109345
* trans-array.cc (gfc_get_array_span): Unlimited polymorphic
expressions are now treated separately since the span need not
be the same as the element size.

gcc/testsuite/
PR fortran/109345
* gfortran.dg/character_workout_1.f90: Cut trailing whitespace.
* gfortran.dg/pr109345.f90: New test.
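The gfc_get_array_span entry above distinguishes the span of an array from its element size. A C analogue makes the difference visible: take a view of one component across an array of structures. Its elements are `sizeof(int)` bytes wide, but they sit `sizeof(struct point)` bytes apart. This is only an illustrative sketch. `struct point` and `get_tag` are hypothetical, and gfortran's array descriptors (and the unlimited polymorphic `CLASS(*)` case in particular) involve more than this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* An element type whose size differs from the component being viewed.  */
struct point { double x; int tag; };

/* Read element k of a strided view starting at `base`.  `span` is the
   byte distance between consecutive elements; it is deliberately not
   assumed to equal the element size (sizeof(int) here).  */
static int get_tag(const char *base, ptrdiff_t span, ptrdiff_t k) {
  int v;
  memcpy(&v, base + k * span, sizeof v);
  return v;
}
```

If the element size (`sizeof(int)`) were used as the stride instead of the span, the reads would land in the middle of `pts[0]`. That is the kind of wrong result a span/element-size mix-up produces.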

Richard Biener [Mon, 11 Nov 2024 08:40:20 +0000 (09:40 +0100)]

tree-optimization/117510 - fix guard hoisting validity check

For the loop in the testcase we currently fail to hoist the guard
check of the inner loop (m > 0) out of the outer loop because
find_loop_guard checks all blocks of the outer loop for side-effects,
including those that are skipped by the guard.  This is usually
harmless, as the guard does not skip any blocks in the outer loop,
but in this case store motion was applied to the inner loop and thus
there is now a skipped store in the outer loop.

The following properly skips blocks that are dominated by the
entry to the skipped region.
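At source level, the transformation this enables looks roughly like the C sketch below. The functions `before` and `after` are hypothetical; they are not the PR's testcase. The store to `*out` stands in for the store that store motion placed in the outer loop. It sits in the region the guard skips, so it should not block hoisting. Only the blocks that still run when the guard is false (here just the `i++` latch) need to be free of side effects.

```c
#include <assert.h>

/* Before: the loop-invariant guard m > 0 is tested on every outer
   iteration.  The store to *out models the store that store motion
   sank out of the inner loop; it only runs when the guard holds.  */
static void before(int *out, const int *a, int n, int m) {
  for (int i = 0; i < n; i++) {
    if (m > 0) {
      int acc = *out;
      for (int j = 0; j < m; j++)
        acc += a[j] * i;
      *out = acc;
    }
  }
}

/* After: the guard is tested once.  When m <= 0 the outer loop would
   only have executed side-effect-free blocks, so skipping it entirely
   is equivalent.  */
static void after(int *out, const int *a, int n, int m) {
  if (m > 0)
    for (int i = 0; i < n; i++) {
      int acc = *out;
      for (int j = 0; j < m; j++)
        acc += a[j] * i;
      *out = acc;
    }
}
```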

PR tree-optimization/117510
* tree-ssa-loop-unswitch.cc (find_loop_guard): Only check
not skipped blocks for side-effects.

* gcc.dg/vect/vect-outer-pr117510.c: New testcase.

Gaius Mulley [Mon, 11 Nov 2024 11:43:06 +0000 (11:43 +0000)]

modula2: Reimplement parameter declaration and checking.

This patch improves the parameter declaration by saving all parameter
kinds: proper procedure, definition module procedure and forward
procedures.  This allows error messages to reference any parameter
in the three kinds of procedures.  Variables and their declaration
are also stored.  The expression, assignment and parameter checking
has been improved to highlight any variable or parameter and
its declaration causing a conflict.
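Checking one procedure heading against another comes down to comparing parameter lists position by position. The C sketch below is hypothetical and is not the gm2 SymbolTable API. It compares the VAR-ness and type name of each formal parameter in two headings, for example the definition-module heading and the proper heading. It then reports the first conflicting position, which is what a diagnostic needs in order to point at both declarations.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of one formal parameter.  */
struct param { const char *name; const char *type; bool is_var; };

/* The three places a Modula-2 procedure heading may appear.  */
enum proc_kind { DEF_PROCEDURE, FORWARD_PROCEDURE, PROPER_PROCEDURE };

struct heading { enum proc_kind kind; int nparams; struct param params[8]; };

/* Return the index of the first position where the two headings disagree
   on VAR-ness or type.  If all common positions agree, return -1 when the
   lengths match, or the shorter length when they do not.  */
static int first_mismatch(const struct heading *a, const struct heading *b) {
  int n = a->nparams < b->nparams ? a->nparams : b->nparams;
  for (int i = 0; i < n; i++) {
    const struct param *p = &a->params[i], *q = &b->params[i];
    if (p->is_var != q->is_var || strcmp(p->type, q->type) != 0)
      return i;
  }
  return a->nparams == b->nparams ? -1 : n;
}
```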

gcc/m2/ChangeLog:

* gm2-compiler/M2Base.def (MixTypes): Rename parameters.
(MixTypesDecl): New procedure function.
* gm2-compiler/M2Base.mod (BuildOrdFunctions): Add
DefProcedure parameter to PutFunction.
(BuildTruncFunctions): Ditto.
(BuildFloatFunctions): Ditto.
(BuildIntFunctions): Ditto.
(InitBaseFunctions): Ditto.
(MixTypesDecl): New procedure function.
(MixTypes): Reimplement.
* gm2-compiler/M2Check.mod (checkProcType): Replace
NoOfParam with NoOfParamAny.
Replace IsVarParam with IsVarParamAny.
(checkProcedureProcType): Ditto.
* gm2-compiler/M2Error.def: Remove unnecessary export qualified list.
* gm2-compiler/M2GCCDeclare.mod: Replace NoOfParam with NoOfParamAny.
Replace IsVarParam with IsVarParamAny.
(DeclareProcedureToGccWholeProgram): Rename son to
Variable.
(DeclareProcedureToGccSeparateProgram): Ditto.
(PrintKind): New procedure.
(PrintProcedureParameters): Ditto.
(PrintProcedureReturnType): Ditto.
(PrintProcedure): Reimplement.
(PrintProcTypeParameters): New procedure.
(PrintProcType): Ditto.
(DeclareProcType): Rename Son to Parameter.
* gm2-compiler/M2GenGCC.mod: Replace NoOfParam with NoOfParamAny.
Replace IsVarParam with IsVarParamAny.
(ErrorMessageDecl): New procedure.
(checkIncorrectMeta): Replace call to MetaErrorT2 with
ErrorMessageDecl.
(ComparisonMixTypes): Add varleft and varright parameters.
Adjust all callers of ComparisonMixTypes.
* gm2-compiler/M2MetaError.def (MetaErrorDecl): New procedure.
* gm2-compiler/M2MetaError.mod (MetaErrorDecl): New procedure.
* gm2-compiler/M2Options.def (SetXCode): Add -fd flag description
to comment.
* gm2-compiler/M2Options.mod (SetXCode): Add -fd flag description
to comment.
* gm2-compiler/M2Quads.mod (CheckBreak): New procedure.
Replace NoOfParam with NoOfParamAny.
Replace IsVarParam with IsVarParamAny.
(FailParameter): Reimplement using GetVarDeclFullTok.
Generate message for formal parameter, actual parameter and
declaration of actual parameter.
(WarnParameter): Ditto.
(CheckBuildFunction): Reimplement error message using MetaErrorT1.
* gm2-compiler/M2Range.mod: Replace NoOfParam with NoOfParamAny.
Replace IsVarParam with IsVarParamAny.
* gm2-compiler/M2Scaffold.mod (DeclareScaffoldFunctions): Call
PutProcedureDefined after every procedure declaration.
(DeclareArgEnvParams): Add ProperProcedure parameter to PutParam.
* gm2-compiler/M2Size.mod (MakeSize): Add DefProcedure parameter
to PutFunction.
* gm2-compiler/M2Swig.mod: Replace NoOfParam with NoOfParamAny.
Replace IsVarParam with IsVarParamAny.
* gm2-compiler/M2SymInit.mod: Ditto.
* gm2-compiler/M2System.mod (InitSystem): Add DefProcedure
parameter to PutFunction.
* gm2-compiler/P1SymBuild.mod (StartBuildProcedure): Reimplement.
(EndBuildProcedure): Ditto.
(EndBuildForward): Ditto.
* gm2-compiler/P2Build.bnf (BuildProcedureDefinedByForward):
Remove.
(BuildProcedureDefinedByProper): Ditto.
(ForwardDeclaration): Remove BuildProcedureDefinedByForward.
(BuildNoReturnAttribute): Remove parameter.
* gm2-compiler/P2SymBuild.def (BuildNoReturnAttribute): Remove
parameter.
(BuildProcedureDefinedByForward): Remove.
(BuildProcedureDefinedByProper): Ditto.
* gm2-compiler/P2SymBuild.mod (Import): Remove
AreParametersDefinedInDefinition,
AreParametersDefinedInImplementation,
AreProcedureParametersDefined,
ParametersDefinedInDefinition,
ParametersDefinedInImplementation,
GetProcedureDeclaredDefinition,
GetProcedureDeclaredForward,
GetProcedureDeclaredProper,
GetParametersDefinedByForward,
GetParametersDefinedByProper and
PutProcedureNoReturn.
Add PutProcedureParametersDefined,
GetProcedureParametersDefined,
GetProcedureKindDesc,
GetProcedureDeclaredTok,
GetProcedureKind,
GetReturnTypeTok,
SetReturnOptional,
IsReturnOptional,
PutProcedureNoReturn and
PutProcedureDefined.
(Debug): New procedure.
(P2StartBuildDefModule): Space formatting.
(BuildVariable): Reimplement to record full declaration.
(StartBuildProcedure): Reimplement using token to determine
the kind of procedure.
(BuildProcedureHeading): Ditto.
(BuildFPSection): Ditto.
(BuildVarArgs): Ditto.
(BuildOptArg): Ditto.
(BuildProcedureDefinedByForward): Remove.
(BuildProcedureDefinedByProper): Ditto.
(BuildFormalParameterSection): Reimplement so that the
quad stack is unchanged.
(CheckFormalParameterSection): Ditto.
(RemoveFPParameters): New procedure.
(ParameterError): Reimplement.
(StartBuildFormalParameters): Add annotation.
(ParameterMismatch): Reimplement.
(EndBuildFormalParameters): Reimplement to check against
all procedure kinds.
(GetSourceDesc): Remove.
(GetCurSrcDesc): Ditto.
(GetDeclared): Ditto.
(ReturnTypeMismatch): Reimplement.
(BuildFunction): Ditto.
(BuildOptFunction): Ditto.
(CheckOptFunction): New procedure.
(BuildNoReturnAttribute): Remove parameter and obtain
procedure symbol from quad stack.
(CheckProcedureReturn): New procedure.
* gm2-compiler/P3SymBuild.mod (BuildOptArgInitializer):
Preserve ProcSym tok on the quad stack.
Add Assert.
* gm2-compiler/PCSymBuild.mod (fixupProcedureType): Replace
NoOfParam with NoOfParamAny.
* gm2-compiler/SymbolTable.def (GetNthParam): Add ProcedureKind
parameter.
(PutFunction): Ditto.
(PutOptFunction): Ditto.
(IsReturnOptional): Ditto.
(PutParam): Ditto.
(PutVarParam): Ditto.
(PutParamName): Ditto.
(PutProcedureNoReturn): Ditto.
(IsProcedureNoReturn): Ditto.
(IsVarParam): Ditto.
(IsUnboundedParam): Ditto.
(NoOfParam): Ditto.
(ForeachLocalSymDo): Ditto.
(GetProcedureKind): Ditto.
(GetProcedureDeclaredTok): Ditto.
(PutProcedureDeclaredTok): Ditto.
(GetReturnTypeTok): Ditto.
(PutReturnTypeTok): Ditto.
(PutParametersDefinedByForward): New procedure.
(PutProcedureParametersDefined): Ditto.
(PutProcedureDefined): Ditto.
(GetParametersDefinedByProper): Ditto.
(GetProcedureDeclaredForward): Ditto.
(GetProcedureDeclaredProper): Ditto.
(PutProcedureDeclaredProper): Ditto.
(GetProcedureDeclaredDefinition): Ditto.
(PutProcedureDeclaredDefinition): Ditto.
(GetProcedureDefined): Ditto.
(PutUseOptArg): Ditto.
(UsesOptArg): Ditto.
(PutOptArgInit): Ditto.
(SetReturnOptional): Ditto.
(UsesOptArgAny): Ditto.
(GetProcedureKindDesc): Ditto.
(IsReturnOptionalAny): New procedure function.
(GetNthParamAny): Ditto.
(NoOfParamAny): Ditto.
(IsProcedureAnyNoReturn): Ditto.
(AreParametersDefinedInImplementation): Remove.
(ParametersDefinedInImplementation): Ditto.
(AreParametersDefinedInDefinition): Ditto.
(AreProcedureParametersDefined): Ditto.
(ParametersDefinedInDefinition): Ditto.
(ProcedureParametersDefined): Ditto.
(PutParametersDefinedByProper): Ditto.
(PutProcedureDeclaredForward): Ditto.
(GetParametersDefinedByForward): Ditto.
(GetProcedureParametersDefined): Ditto.
(PushOffset): Ditto.
(PopSize): Ditto.
(PushParamSize): Ditto.
(PushSumOfLocalVarSize): Ditto.
(PushSumOfParamSize): Ditto.
(PopOffset): Ditto.
(PopSumOfParamSize): Ditto.
* gm2-compiler/SymbolTable.mod (MakeProcedure): Reimplement.
(PutProcedureNoReturn): Add ProcedureKind parameter.
(GetNthParam): Ditto.
(PutFunction): Ditto.
(PutOptFunction): Ditto.
(IsReturnOptional): Ditto.
(MakeVariableForParam): Ditto.
(PutParam): Ditto.
(PutVarParam): Ditto.
(PutParamName): Ditto.
(AddParameter): Ditto.
(IsVarParam): Ditto.
(IsVarParamAny): Ditto.
(NoOfParam): Ditto.
(HasVarParameters): Ditto.
(IsUnboundedParam): Ditto.
(PutUseVarArgs): Ditto.
(UsesVarArgs): Ditto.
(PutUseOptArg): Ditto.
(UsesOptArg): Ditto.
(UsesOptArgAny): Ditto.
(PutOptArgInit): Ditto.
(IsProcedure): Ditto.
(IsPointer): Ditto.
(IsRecord): Ditto.
(IsArray): Ditto.
(IsEnumeration): Ditto.
(IsUnbounded): Ditto.
(IsSet): Ditto.
(IsSetPacked): Ditto.
(CheckUnbounded): Ditto.
(IsOAFamily): Ditto.
(IsModuleWithinProcedure): Ditto.
(GetDeclaredDef): Ditto.
(GetDeclaredMod): Ditto.
(GetDeclaredFor): Ditto.
(GetProcedureDeclaredForward): Ditto.
(GetProcedureKind): Ditto.
(PutProcedureDeclaredForward): Ditto.
(GetProcedureDeclaredTok): Ditto.
(GetProcedureDeclaredProper): Ditto.
(PutProcedureDeclaredTok): Ditto.
(PutProcedureDeclaredProper): Ditto.
(GetReturnTypeTok): Ditto.
(GetProcedureDeclaredDefinition): Ditto.
(PutReturnTypeTok): Ditto.
(PutProcedureDeclaredDefinition): Ditto.
(GetProcedureKindDesc): Ditto.
(IsProcedureVariable): Ditto.
(IsAModula2Type): Ditto.
(GetParam): Ditto.
(ProcedureParametersDefined): Ditto.
(AreParametersDefinedInImplementation): Remove.
(AreParametersDefinedInDefinition): Ditto.
(AreProcedureParametersDefined): Ditto.
(IsSizeSolved): Ditto.
(IsOffsetSolved): Ditto.
(IsValueSolved): Ditto.
(IsSumOfParamSizeSolved): Ditto.
(PushSize): Ditto.
(PushOffset): Ditto.
(PopSize): Ditto.
(PushValue): Ditto.
(PushParamSize): Ditto.
(PushSumOfLocalVarSize): Ditto.
(PushSumOfParamSize): Ditto.
(PushVarSize): Ditto.
(PopValue): Ditto.
(PopSize): Ditto.
(PopOffset): Ditto.
(PopSumOfParamSize): Ditto.
(PutParametersDefinedByForward): New procedure.
(PutProcedureParametersDefined): Ditto.
(PutProcedureDefined): Ditto.
(GetParametersDefinedByProper): Ditto.
(GetProcedureDeclaredForward): Ditto.
(GetProcedureDeclaredProper): Ditto.
(PutProcedureDeclaredProper): Ditto.
(GetProcedureDeclaredDefinition): Ditto.
(PutProcedureDeclaredDefinition): Ditto.
(GetProcedureDefined): Ditto.
(PutUseOptArg): Ditto.
(UsesOptArg): Ditto.
(PutOptArgInit): Ditto.
(SetReturnOptional): Ditto.
(UsesOptArgAny): Ditto.
(GetProcedureKindDesc): Ditto.
(PutParametersDefinedByProper): Ditto.
(GetParametersDefinedByProper): Ditto.
(IsReturnOptionalAny): New procedure function.
(IsProcedureAnyDefaultBoolean): Ditto.
(IsProcedureAnyBoolean): Ditto.
(IsProcedureAnyNoReturn): Ditto.
(GetNthParamAny): Ditto.
(NoOfParamAny): Ditto.
(IsProcedureAnyNoReturn): Ditto.
(GetProcedureKind): Ditto.
(IsVarParamAny): Ditto.
(IsUnboundedParamAny): Ditto.
(ForeachParamSymDo): New comment.
* gm2-libs-coroutines/SYSTEM.mod: Reformat.

gcc/testsuite/ChangeLog:

* gm2/iso/fail/badexpression3.mod: New test.
* gm2/iso/fail/badparam4.def: New test.
* gm2/iso/fail/badparam4.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

Tobias Burnus [Mon, 11 Nov 2024 11:17:42 +0000 (12:17 +0100)]

libgomp/plugin/plugin-gcn.c: Show device number in ISA error message

libgomp/ChangeLog:

* plugin/plugin-gcn.c (isa_matches_agent): Mention the device number
and ROCR_VISIBLE_DEVICES when reporting an ISA mismatch error.
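A minimal sketch of such a message follows, assuming the environment value is passed in by the caller rather than read with `getenv`. `format_isa_error` is hypothetical and its wording is illustrative, not libgomp's actual text. Mentioning ROCR_VISIBLE_DEVICES presumably helps because device numbers are counted among the GPUs that this variable leaves visible.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: format an ISA-mismatch message that names the
   device number and, when set, the ROCR_VISIBLE_DEVICES value.  The
   value is a parameter so the function has no hidden dependencies.  */
static int format_isa_error(char *buf, size_t len, int device,
                            const char *image_isa, const char *agent_isa,
                            const char *visible_devices) {
  if (visible_devices != NULL)
    return snprintf(buf, len,
                    "code object ISA '%s' does not match ISA '%s' of device %d"
                    " (ROCR_VISIBLE_DEVICES=%s)",
                    image_isa, agent_isa, device, visible_devices);
  return snprintf(buf, len,
                  "code object ISA '%s' does not match ISA '%s' of device %d",
                  image_isa, agent_isa, device);
}
```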

Pan Li [Mon, 11 Nov 2024 07:39:40 +0000 (15:39 +0800)]

RISC-V: Fix one nit indent issue of ustrunc pattern [NFC]

I just noticed that the indentation of the ustrunc pattern in the md
files is not quite right, so this fixes it.  The change is obvious,
and I will commit it after 48 hours if there are no further comments.

gcc/ChangeLog:

* config/riscv/autovec.md: Fix indent format issue.

Signed-off-by: Pan Li <pan2.li@intel.com>

Paul Thomas [Mon, 11 Nov 2024 09:01:11 +0000 (09:01 +0000)]

Fortran: Suppress invalid finalization of artificial variable [PR116388]

2024-11-11 Tomas Trnka <trnka@scm.com>
Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/116388
* class.cc (finalize_component): Add a leading underscore to the name
of 'byte_stride' to suppress invalid finalization.

gcc/testsuite/
PR fortran/116388
* gfortran.dg/finalize_58.f90: New test.

Thomas Koenig [Sat, 9 Nov 2024 18:24:43 +0000 (19:24 +0100)]

Reject UNSIGNED for Complex, some documentation fixes.

gcc/fortran/ChangeLog:

* check.cc (gfc_check_complex): Reject UNSIGNED.
* gfortran.texi: Update example program. Note that
CMPLX, INT and REAL also take unsigned arguments.
* intrinsic.texi (CMPLX): Document UNSIGNED.
(INT): Likewise.
(REAL): Likewise.

gcc/testsuite/ChangeLog:

* gfortran.dg/unsigned_41.f90: New test.

Sam James [Thu, 31 Oct 2024 21:09:32 +0000 (21:09 +0000)]

doc: install: document UBSAN_OPTIONS

Explain that 'bootstrap-ubsan' won't abort on errors by default and how
to override that by setting UBSAN_OPTIONS.

gcc/ChangeLog:
PR other/116948

* doc/install.texi (Building a native compiler): Document UBSAN_OPTIONS.

Sam James [Thu, 31 Oct 2024 21:06:13 +0000 (21:06 +0000)]

doc: install: document bootstrap-ubsan

gcc/ChangeLog:
PR other/116948

* doc/install.texi (Building a native compiler): Mention bootstrap-ubsan.

Takayuki 'January June' Suwa [Sun, 10 Nov 2024 06:39:22 +0000 (15:39 +0900)]

xtensa: Fix the issue in "*extzvsi-1bit_addsubx"

The second source register of insn "*extzvsi-1bit_addsubx" cannot be the
same as the destination register, because that register will be overwritten
with an intermediate value after insn splitting.

     /* example #1 */
     int test1(int b, int a) {
       return ((a & 1024) ? 4 : 0) + b;
     }

     ;; result #1 (incorrect)
     test1:
      extui a2, a3, 10, 1 ;; overwrites A2 before used
      addx4 a2, a2, a2
      ret.n

This patch fixes that.

     ;; result #1 (correct)
     test1:
      extui a3, a3, 10, 1 ;; uses A3 and then overwrites
      addx4 a2, a3, a2
      ret.n

However, it should be noted that the first source register can be the same
as the destination without any problems.

     /* example #2 */
     int test2(int a, int b) {
       return ((a & 1024) ? 4 : 0) + b;
     }

     ;; result (correct)
     test2:
      extui a2, a2, 10, 1 ;; uses A2 and then overwrites
      addx4 a2, a2, a3
      ret.n
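The hazard that the earlyclobber constraint encodes can be modelled in C. The model assumes that the split writes the extracted bit into the destination register before `addx4` reads the second source, which is what the incorrect `test1` output does. Sharing the destination with the first source is harmless, because `extui` reads that source before overwriting it. Sharing it with the second source loses the addend. `regs`, `extui1`, `addx4` and `split_extzv_addx4` are hypothetical names. The instruction semantics used are ADDX4 ar, as, at: ar = (as << 2) + at, and EXTUI with a 1-bit mask.

```c
#include <assert.h>
#include <stdint.h>

/* Register file model: regs[n] holds the value of register an.  */
static uint32_t regs[16];

/* extui ar, at, shift, 1: extract the single bit at `shift`.  */
static void extui1(int ar, int at, int shift) {
  regs[ar] = (regs[at] >> shift) & 1u;
}

/* addx4 ar, as, at: ar = (as << 2) + at.  */
static void addx4(int ar, int as, int at) {
  regs[ar] = (regs[as] << 2) + regs[at];
}

/* Split form of ((x & 1024) ? 4 : 0) + y that uses the destination
   register as scratch for the extracted bit.  */
static uint32_t split_extzv_addx4(int dst, int x, int y) {
  extui1(dst, x, 10);   /* writes dst before y has been read ... */
  addx4(dst, dst, y);   /* ... so the result is wrong if y lives in dst */
  return regs[dst];
}
```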

gcc/ChangeLog:

* config/xtensa/xtensa.md (*extzvsi-1bit_addsubx):
Add '&' to the destination register constraint to indicate that
it is 'earlyclobber', append '0' to the first source register
constraint to indicate that it can be the same as the destination
register, and change the split condition from 1 to reload_completed
so that the insn will be split only after RA in order to obtain
allocated registers that satisfy the above constraints.

Haochen Jiang [Mon, 11 Nov 2024 02:48:16 +0000 (10:48 +0800)]

Initial Diamond Rapids Support

gcc/ChangeLog:

* common/config/i386/cpuinfo.h
(get_intel_cpu): Handle Diamond Rapids.
* common/config/i386/i386-common.cc (processor_name):
Add Diamond Rapids.
(processor_alias_table): Ditto.
* common/config/i386/i386-cpuinfo.h (enum processor_types):
Add INTEL_COREI7_DIAMONDRAPIDS.
* config.gcc: Add -march=diamondrapids.
* config/i386/driver-i386.cc (host_detect_local_cpu): Handle
diamondrapids.
* config/i386/i386-c.cc (ix86_target_macros_internal): Ditto.
* config/i386/i386-options.cc (processor_cost_table): Ditto.
(m_DIAMONDRAPIDS): New.
(m_CORE_AVX512): Add diamondrapids.
* config/i386/i386.h (enum processor_type): Ditto.
* doc/extend.texi: Ditto.
* doc/invoke.texi: Ditto.

gcc/testsuite/ChangeLog:

* g++.target/i386/mv16.C: Handle new march.
* gcc.target/i386/funcspec-56.inc: Ditto.

Haochen Jiang [Mon, 11 Nov 2024 02:48:14 +0000 (10:48 +0800)]

i386: Add new model number for Arrow Lake

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_intel_cpu): Add new model
number for Arrow Lake.

liuhongt [Thu, 7 Nov 2024 02:15:42 +0000 (18:15 -0800)]

Guard truncate from vector float to vector __bf16 with !flag_rounding_math && HONOR_NANS (BFmode).

The hardware instruction doesn't raise exceptions, quietly turns sNaN into
qNaN, and always rounds to nearest (even).  Output denormals are always
flushed to zero and input denormals are always treated as zero.  MXCSR
is neither consulted nor updated.
Without native instructions, flag_unsafe_math_optimizations is needed
for the permutation instructions.
Similarly, guard the extension from vector __bf16 to vector float with
!HONOR_NANS (BFmode).
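The conversion semantics described above can be written out as a small software model. This is an illustrative sketch, not GCC's expansion. `float_to_bf16_bits` and `float_from_bits` are hypothetical helpers. The model rounds to nearest-even regardless of the current rounding mode, quietly converts signalling NaNs instead of trapping, and flushes zero/denormal inputs to a signed zero. Because the rounding and NaN behaviour are fixed in this way, these expansions are only valid under the conditions the patch adds.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build a float from its IEEE-754 bit pattern (memcpy avoids aliasing issues).  */
static float float_from_bits(uint32_t bits) {
  float f;
  memcpy(&f, &bits, sizeof f);
  return f;
}

/* Convert a float to the bit pattern of a bf16 value.  */
static uint16_t float_to_bf16_bits(float f) {
  uint32_t bits;
  memcpy(&bits, &f, sizeof bits);

  /* NaN: keep the sign and top payload bits, force the quiet bit.  */
  if ((bits & 0x7fffffffu) > 0x7f800000u)
    return (uint16_t)((bits >> 16) | 0x0040u);

  /* Zero or denormal input: treated as zero, keeping only the sign.  */
  if ((bits & 0x7f800000u) == 0)
    return (uint16_t)((bits >> 16) & 0x8000u);

  /* Round to nearest, ties to even, on the 16 discarded bits.
     Finite values near FLT_MAX correctly round up to infinity.  */
  uint32_t lsb = (bits >> 16) & 1u;
  bits += 0x7fffu + lsb;
  return (uint16_t)(bits >> 16);
}
```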

gcc/ChangeLog:

* config/i386/i386.md (truncsf2bf2): Add !flag_rounding_math
to the condition, require flag_unsafe_math_optimizations when
native instruction is not available.
* config/i386/mmx.md (truncv2sfv2bf2): Ditto.
(extendv2bfv2sf2): Add !HONOR_NANS (BFmode) to the condition.
* config/i386/sse.md (truncv4sfv4bf2): Add
!flag_rounding_math to the condition, require
flag_unsafe_math_optimizations when native instruction is not
available.
(truncv8sfv8bf2): Ditto.
(truncv16sfv16bf2): Ditto.
(extendv4bfv4sf2): Add !HONOR_NANS (BFmode) to the condition.
(extendv8bfv8sf2): Ditto.
(extendv16bfv16sf2): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512bf16-truncsfbf.c: Add -ffast-math.
* gcc.target/i386/avx512bw-extendbf2sf.c: Ditto.
* gcc.target/i386/avx512bw-truncsfbf.c: Ditto.
* gcc.target/i386/sse2-extendbf2sf.c: Ditto.
* gcc.target/i386/ssse3-truncsfbf.c: Ditto.

GCC Administrator [Mon, 11 Nov 2024 00:16:44 +0000 (00:16 +0000)]

Daily bump.

Richard Biener [Fri, 8 Nov 2024 12:25:13 +0000 (13:25 +0100)]

Do not cost the permute nodes that are part of SLP load-lanes

There are some SVE testsuite failures when forcing SLP because costing
prevents VLA vectors from being used, as we add permute cost for
the VEC_PERM nodes that are part of an SLP load-lanes node.  These
permutes only exist for representational reasons and pessimize SLP
vs non-SLP, so the following makes sure to cost them as zero.

* tree-vect-slp.cc (vectorizable_slp_permutation_1): Return
zero for the permute nodes that are part of load-lanes.

Thomas Schwinge [Sat, 9 Nov 2024 12:37:53 +0000 (13:37 +0100)]

Adjust 'libgomp.c/max_vf-*.c'

For configurations where both GCN and nvptx offloading are enabled, we get:

    PASS: libgomp.c/max_vf-1.c (test for excess errors)
    PASS: libgomp.c/max_vf-1.c scan-tree-dump-times ompexp "GOMP_MAX_VF" 2
    PASS: libgomp.c/max_vf-1.c scan-tree-dump-times ompexp "__builtin_GOMP_parallel_loop_nonmonotonic_dynamic \\(.*, D\\.[0-9]*, 0\\);" 1
    PASS: libgomp.c/max_vf-1.c scan-amdgcn-amdhsa-offload-tree-dump-times optimized "__builtin_GOMP_parallel_loop_nonmonotonic_dynamic \\(.*, 64, 0\\);" 1
    FAIL: libgomp.c/max_vf-1.c scan-nvptx-none-offload-tree-dump-times optimized "__builtin_GOMP_parallel_loop_nonmonotonic_dynamic \\(.*, 64, 0\\);" 1
    FAIL: libgomp.c/max_vf-1.c scan-amdgcn-amdhsa-offload-tree-dump-times optimized "__builtin_GOMP_parallel_loop_nonmonotonic_dynamic \\(.*, 7, 0\\);" 1
    PASS: libgomp.c/max_vf-1.c scan-nvptx-none-offload-tree-dump-times optimized "__builtin_GOMP_parallel_loop_nonmonotonic_dynamic \\(.*, 7, 0\\);" 1

Avoid these FAILs via 'only_for_offload_target [...]'.  Also, for consistency
with other libgomp test cases, use effective-target specifiers of the libgomp
test suite.  Fix-up for recent commit d334f729e53867b838e867375b3f475ba793d96e
"openmp: Add testcases for omp_max_vf".

libgomp/
* testsuite/libgomp.c/max_vf-1.c: Adjust.
* testsuite/libgomp.c/max_vf-2.c: Likewise.

Lewis Hyatt [Sun, 3 Nov 2024 01:59:24 +0000 (21:59 -0400)]

c++: Fix tree_contains_struct for TRAIT_EXPR

CODE_CONTAINS_STRUCT () currently reports that a TRAIT_EXPR contains a
TS_EXP struct, but it does not actually start with a TS_EXP as an initial
sequence. In modules.cc, when we stream out a tree, we explicitly check for the
TS_EXP case and call note_location(t->exp.locus) if so. Currently, this
actually queries the tree_common::chain field of a tree_trait_expr, which
seems not to be used, returning 0, which is interpreted as UNKNOWN_LOCATION
and does no harm.

If location_t changes to be 64 bits, as is under discussion, then on
32-bit platforms (or rather those, such as sparc, on which uint64_t has a
stricter alignment requirement than a pointer), reading t->exp.locus will
end up reading a different field (tree_trait_expr::type1) due to padding
offsets.  That field is not generally 0, and the resulting bogus location_t
is problematic enough to cause an ICE in the line_map code.  Pretty
much any modules testcase displays the issue, such as partial-2_a.C.

Resolve by initializing tree_contains_struct with the correct value for
TRAIT_EXPR, namely TS_TYPED.

gcc/cp/ChangeLog:

* cp-objcp-common.cc (cp_common_init_ts): Change TRAIT_EXPR from
TS_EXP to TS_TYPED.

GCC Administrator [Sun, 10 Nov 2024 00:17:04 +0000 (00:17 +0000)]

Daily bump.

Iain Sandoe [Thu, 7 Nov 2024 17:17:46 +0000 (17:17 +0000)]

Darwin: Support '-ObjC{,++}' as shorthand for -xobjective-c{,++} [PR117478].

This improves compatibility with clang, and is used by some projects.

PR target/117478

gcc/ChangeLog:

* config/darwin-driver.cc (darwin_driver_init): Handle ObjC/ObjC++.
* config/darwin.opt: Add ObjC/ObjC++ as driver-only options.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

Andrew Pinski [Fri, 8 Nov 2024 22:46:18 +0000 (14:46 -0800)]

fold: Remove (rrotate (rrotate A CST) CST) folding [PR117492]

This removes a (broken) simplification from fold which is already handled in match.
It was broken because it used wi::to_wide on the RHS of the two rotates, which
could have 2 different types even though the LHS had the same type.
Since it is already handled in match (by the patterns for
`Turn (a OP c1) OP c2 into a OP (c1+c2).`), it can be removed without losing any optimizations.

Bootstrapped and tested on x86_64-linux-gnu.
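The identity behind the match patterns mentioned above can be checked in C. Rotating right by c1 and then by c2 is a single rotate by (c1 + c2) modulo the precision. In this sketch both counts are plain `unsigned`, so adding them is unproblematic. In GCC's trees the two counts can have different types, which is where the removed fold went wrong. `rotr32` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by n, taken modulo 32.  Masking the left-shift count
   avoids the undefined shift by 32 when n == 0.  */
static uint32_t rotr32(uint32_t x, unsigned n) {
  n &= 31;
  return (x >> n) | (x << ((32 - n) & 31));
}
```

For example, `rotr32(rotr32(x, 20), 12)` gives back `x`, because 20 + 12 is a multiple of 32.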

PR middle-end/117492

gcc/ChangeLog:

* fold-const.cc (fold_binary_loc): Remove `Two consecutive rotates adding up
to the some integer` simplification.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr117492-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Andrew Pinski [Fri, 8 Nov 2024 21:39:05 +0000 (13:39 -0800)]

VN: Don't recurse for the same value of `a | b` [PR117496]

After adding vn_valueize to handle the `a | b ==/!= 0` case
of insert_predicates_for_cond, it could go into an infinite loop,
as the value number for either a or b could be the same as that
of the whole expression.  This avoids that recursion, so there is
no infinite loop here.

Bootstrapped and tested on x86_64-linux.
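The termination guard can be sketched in C on a toy expression table. Everything here is hypothetical (`record_zero`, the table layout, the `vn` array); the sketch only shows the idea. When recording that `a | b == 0` implies `a == 0` and `b == 0`, an operand whose value number equals that of the whole expression is skipped. Recursing on it would revisit the same value forever.

```c
#include <assert.h>

/* Hypothetical expression table: each entry is a leaf or an IOR of two
   other entries; vn[] maps an entry to its value number (an entry index).  */
enum kind { LEAF, IOR };
struct expr { enum kind kind; int op0, op1; };

#define MAX_EXPRS 8
static struct expr exprs[MAX_EXPRS];
static int vn[MAX_EXPRS];
static int recorded[64];   /* entries for which "value == 0" was recorded */
static int nrecorded;

/* Record that entry `id` is zero; for a | b == 0 also record that each
   operand is zero, skipping operands that valueize to the expression
   itself so the recursion always terminates.  */
static void record_zero(int id) {
  assert(nrecorded < 64);
  recorded[nrecorded++] = id;
  if (exprs[id].kind != IOR)
    return;
  int ops[2] = { exprs[id].op0, exprs[id].op1 };
  for (int i = 0; i < 2; i++) {
    int v = vn[ops[i]];
    if (v == vn[id])
      continue;   /* same value as a | b itself: don't recurse */
    record_zero(v);
  }
}
```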

PR tree-optimization/117496

gcc/ChangeLog:

* tree-ssa-sccvn.cc (insert_predicates_for_cond): If the
valueization for the new lhs is the same as the old one,
don't recurse.

gcc/testsuite/ChangeLog:

* gcc.dg/torture/pr117496-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Andrew Pinski [Thu, 7 Nov 2024 17:40:15 +0000 (09:40 -0800)]

VN: Canonicalize compares before calling vn_nary_op_lookup_pieces

This is the followup as mentioned in
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/667987.html .
We need to canonicalize the compares using tree_swap_operands_p instead
of checking CONSTANT_CLASS_P.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-ssa-sccvn.cc (visit_phi): Swap the operands
before calling vn_nary_op_lookup_pieces if
tree_swap_operands_p returns true.
(insert_predicates_for_cond): Use tree_swap_operands_p
instead of checking for CONSTANT_CLASS_P.
(process_bb): Swap the comparison and operands
if tree_swap_operands_p returns true.
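Canonicalizing before a lookup means that `5 < x_3` and `x_3 > 5` reach the same table entry. Presumably, a check that only moves constants to the second position misses the ordering between two SSA names. The C sketch below uses a hypothetical ordering rule in the spirit of tree_swap_operands_p: constants go second, and of two names the lower version goes first. When the operands are swapped, the comparison code is mirrored. None of these types or functions are GCC's.

```c
#include <assert.h>
#include <stdbool.h>

enum cmp_code { CMP_LT, CMP_LE, CMP_GT, CMP_GE, CMP_EQ, CMP_NE };
enum operand_kind { OP_CONST, OP_NAME };
struct operand { enum operand_kind kind; int value; };  /* constant or SSA version */
struct comparison { enum cmp_code code; struct operand lhs, rhs; };

/* Code to use when the operands are exchanged: a < b is b > a.  */
static enum cmp_code swap_code(enum cmp_code c) {
  switch (c) {
  case CMP_LT: return CMP_GT;
  case CMP_LE: return CMP_GE;
  case CMP_GT: return CMP_LT;
  case CMP_GE: return CMP_LE;
  default:     return c;   /* EQ and NE are symmetric */
  }
}

/* Hypothetical ordering rule: constants second, lower name version first.  */
static bool should_swap(struct operand a, struct operand b) {
  if (a.kind == OP_CONST && b.kind != OP_CONST)
    return true;
  return a.kind == OP_NAME && b.kind == OP_NAME && a.value > b.value;
}

/* Put the comparison into canonical operand order.  */
static struct comparison canonicalize(struct comparison c) {
  if (should_swap(c.lhs, c.rhs)) {
    struct operand t = c.lhs;
    c.lhs = c.rhs;
    c.rhs = t;
    c.code = swap_code(c.code);
  }
  return c;
}
```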

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

Jakub Jelinek [Sat, 9 Nov 2024 16:11:34 +0000 (17:11 +0100)]

ChangeLog: Manually add entries for r15-4998 and r15-5004

These commits used the *.c rather than the *.cc suffix and miraculously got
through the pre-commit hook, but broke ChangeLog generation.

GCC Administrator [Sat, 9 Nov 2024 16:03:14 +0000 (16:03 +0000)]

Daily bump.

Jakub Jelinek [Sat, 9 Nov 2024 15:57:26 +0000 (16:57 +0100)]

contrib: Add 2 further ignored commits

r15-4998 and r15-5004 had wrong commit messages; add those to the
ignored commits.  Their ChangeLog entries will need to be added manually.

2024-11-09 Jakub Jelinek <jakub@redhat.com>

* gcc-changelog/git_update_version.py (ignored_commits): Add 2
further commits.

Jakub Jelinek [Sat, 9 Nov 2024 15:45:44 +0000 (16:45 +0100)]

m2: Fix up dependencies some more

Every now and then my x86_64-linux bootstrap fails due to missing
dependencies somewhere in m2, usually during stage3.  I'm using
make -j32 and run 2 bootstraps concurrently (x86_64-linux and i686-linux)
on the same box.

Last night the same happened to me again,
with the first error being:
In file included from ./tm.h:29,
                 from ../../gcc/backend.h:28,
                 from ../../gcc/m2/gm2-gcc/gcc-consolidation.h:27,
                 from m2/gm2-compiler-boot/M2AsmUtil.c:26:
../../gcc/config/i386/i386.h:2484:10: fatal error: insn-attr-common.h: No such file or directory
2484 | #include "insn-attr-common.h"
      |          ^~~~~~~~~~~~~~~~~~~~
compilation terminated.
make[3]: *** [../../gcc/m2/Make-lang.in:1576: m2/gm2-compiler-boot/M2AsmUtil.o] Error 1
make[3]: *** Waiting for unfinished jobs....

Now, gcc/Makefile.in has a general rule:
# In order for parallel make to really start compiling the expensive
# objects from $(OBJS) as early as possible, build all their
# prerequisites strictly before all objects.
$(ALL_HOST_OBJS) : | $(generated_files)
which ensures that everything that might depend on the generated files
waits for those to be generated.
The above error clearly shows that such waiting didn't happen for
m2/gm2-compiler-boot/M2AsmUtil.o and some others.
ALL_HOST_OBJS includes $(ALL_HOST_FRONTEND_OBJS),
where the latter is
ALL_HOST_FRONTEND_OBJS = $(foreach v,$(CONFIG_LANGUAGES),$($(v)_OBJS))
m2_OBJS already includes various *.o files, and for all of those we wait
until the generated files are generated.  However, it seems that
cc1gm2 depends on m2/stage1/cc1gm2 (which is just copied there),
and that depends on m2/gm2-compiler-boot/m2flex.o,
$(GM2_C_OBJS) and m2/gm2-gcc/rtegraph.o, which are already included
in m2_OBJS, but also on
$(GM2_LIBS_BOOT) $(MC_LIBS)
where
MC_LIBS=m2/mc-boot-ch/Glibc.o m2/mc-boot-ch/Gmcrts.o
GM2_LIBS_BOOT     = m2/gm2-compiler-boot/gm2.a \
                    m2/gm2-libs-boot/libgm2.a \
                    $(GM2-BOOT-O)
GM2-BOOT-O isn't defined, and the 2 libraries depend on
$(BUILD-LIBS-BOOT) $(BUILD-COMPILER-BOOT)

So, the following patch adds those to m2_OBJS.

I'm not sure whether something further is needed, such as some objects
used to build the helper programs (mc and whatever else is needed);
I guess it depends on whether they use, or can use, say tm.h or similar
headers which depend on the generated headers.

2024-11-09  Jakub Jelinek  <jakub@redhat.com>

gcc/m2/
* Make-lang.in (m2_OBJS): Add $(BUILD-LIBS-BOOT),
$(BUILD-COMPILER-BOOT) and $(MC_LIBS).

Torbjörn SVENSSON [Fri, 1 Nov 2024 16:47:48 +0000 (17:47 +0100)]

arm: Fix ICE on arm_mve.h pragma without MVE types [PR117408]

Starting with r14-435-g00d97bf3b5a, doing `#pragma arm "arm_mve.h"
false` or `#pragma arm "arm_mve.h" true` without first doing
`#pragma arm "arm_mve_types.h"` causes GCC to ICE.

gcc/ChangeLog:

PR target/117408
* config/arm/arm-mve-builtins.cc (handle_arm_mve_h): Detect if the MVE
types are missing and, if so, return an error.

gcc/testsuite/ChangeLog:

PR target/117408
* gcc.target/arm/mve/pr117408-1.c: New test.
* gcc.target/arm/mve/pr117408-2.c: Likewise.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

Jakub Jelinek [Sat, 9 Nov 2024 10:42:30 +0000 (11:42 +0100)]

trans-mem: Fix ICE caused by expand_assign_tm

My https://gcc.gnu.org/pipermail/gcc-patches/2024-November/668065.html
patch regressed
+FAIL: g++.dg/tm/pr45940-3.C  -std=gnu++11 (internal compiler error: in create_tmp_var, at gimple-expr.cc:484)
+FAIL: g++.dg/tm/pr45940-3.C  -std=gnu++11 (test for excess errors)
+FAIL: g++.dg/tm/pr45940-3.C  -std=gnu++14 (internal compiler error: in create_tmp_var, at gimple-expr.cc:484)
+FAIL: g++.dg/tm/pr45940-3.C  -std=gnu++14 (test for excess errors)
...
+FAIL: g++.dg/tm/pr45940-4.C  -std=gnu++26 (internal compiler error: in create_tmp_var, at gimple-expr.cc:484)
+FAIL: g++.dg/tm/pr45940-4.C  -std=gnu++26 (test for excess errors)
+FAIL: g++.dg/tm/pr45940-4.C  -std=gnu++98 (internal compiler error: in create_tmp_var, at gimple-expr.cc:484)
+FAIL: g++.dg/tm/pr45940-4.C  -std=gnu++98 (test for excess errors)
tests, but it turns out it is a preexisting bug.
If I modify the pr45940-3.C testcase
--- gcc/testsuite/g++.dg/tm/pr45940-3.C 2020-01-12 11:54:37.258400660 +0100
+++ gcc/testsuite/g++.dg/tm/pr45940-3.C 2024-11-08 10:35:11.918390743 +0100
@@ -16,6 +16,7 @@ class sp_counted_base
{
protected:
     int use_count_;        // #shared
+    int a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z, aa, ab, ac, ad, ae, af;
public:
     __attribute__((transaction_safe))
     virtual void dispose() = 0; // nothrow
then it ICEs already on vanilla trunk.

The problem is that expand_assign_tm simply forces the rhs into a TM
memcpy argument: if is_gimple_reg (reg), it creates a temporary, stores
the value there and takes the temporary's address; otherwise it takes
the address of rhs.  That doesn't work if rhs is an empty CONSTRUCTOR
with a C++ non-POD type (TREE_ADDRESSABLE type): we ICE trying to create
the temporary, because we shouldn't be creating a temporary at all.
Before my patch, with the CONSTRUCTOR having only a vtable pointer
(64-bit) and a 32-bit field, we gimplified the zero initialization
as just storing 0s to the 2 fields; but as we are supposed to also
clear padding bits, we now gimplify it as MEM[...] = {}; to make sure
even the padding bits are cleared.  With the adjusted testcase, we
gimplified it as MEM[...] = {}; even before, because it was simply
too large and clearing everything looked beneficial.

The following patch fixes this ICE by using TM memset instead.  Forcing
a zero constructor into a temporary just to TM memcpy it into the lhs is
wasteful, and in C++ cases like this one it is invalid.
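As a plain C++ illustration (not the TM runtime or GCC's gimplifier), the sketch below shows why a zero-initializing store maps naturally onto a memset: one call clears every byte of the destination, padding included, without first materializing a zeroed temporary and copying it. The struct Padded and both helper functions are hypothetical names made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical type: on common ABIs there are padding bytes between 'c' and 'i'.
struct Padded {
  char c;
  int i;
};

// Clear the whole object, padding bytes included, in a single call.
void clear_object(Padded* p) { std::memset(p, 0, sizeof *p); }

// True if every byte of the object representation is zero.
bool all_bytes_zero(const Padded* p) {
  const unsigned char* bytes = reinterpret_cast<const unsigned char*>(p);
  for (std::size_t k = 0; k < sizeof *p; ++k)
    if (bytes[k] != 0)
      return false;
  return true;
}
```

Copying from a zeroed temporary would additionally require an addressable temporary, which is exactly what the message says must not be created for a TREE_ADDRESSABLE type.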

2024-11-09  Jakub Jelinek  <jakub@redhat.com>

* trans-mem.cc (expand_assign_tm): Don't take address
of empty CONSTRUCTOR, instead use BUILT_IN_TM_MEMSET
to clear lhs in that case.  Formatting fixes.

commit | commitdiff | tree

Martin Uecker [Fri, 1 Nov 2024 09:15:44 +0000 (10:15 +0100)]

c: minor fixes related to arrays of unspecified size

The patch for PR117145 and PR117245 also fixed PR100420 and PR116284 which
are bugs related to arrays of unspecified size. Those are now represented
as variable size arrays with size (0, 0). There are still some loose ends,
which are resolved here by

1. adding a testcase for PR116284,
2. moving code related to the creation and detection of arrays of unspecified
sizes into their own functions,
3. preferring a specified size over an unspecified size when forming
a composite type as required by C99 (PR117391),
4. removing useless code in comptypes_internal and composite_type_internal.

PR c/116284
PR c/117391

gcc/c/ChangeLog:
* c-tree.h (c_type_unspecified_p): New inline function.
* c-typeck.cc (c_build_array_type_unspecified): New function.
(comptypes_internal): Remove useless code.
(composite_type_internal): Update.
* c-decl.cc (grokdeclarator): Revise.

gcc/testsuite/ChangeLog:
* gcc.dg/pr116284.c: New test.
* gcc.dg/pr117391.c: New test.

commit | commitdiff | tree

Andi Kleen [Thu, 31 Oct 2024 23:31:02 +0000 (16:31 -0700)]

Update gcc-auto-profile / gen_autofdo_event.py

- Fix warnings with newer Python versions about bad escapes by
making all the Python strings raw.
- Add a fallback for using the builtin perf event list if the
CPU model number is unknown.
- Regenerate the shipped gcc-auto-profile with the changes.

contrib/ChangeLog:

* gen_autofdo_event.py: Convert strings to raw.
Add fallback to using builtin perf event list.

gcc/ChangeLog:

* config/i386/gcc-auto-profile: Regenerate.

commit | commitdiff | tree

Marek Polacek [Thu, 31 Oct 2024 13:28:15 +0000 (09:28 -0400)]

c: Implement C2y N3356, if declarations [PR117019]

This patch implements C2y N3356, if declarations as described at
<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3356.htm>.

This feature is cognate with C++17 Selection statements with initializer
<https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0305r1.html>,
but they are not the same yet.  For example, C++17 allows

  if (lock (); int i = getval ())

whereas C2y does not.
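For comparison, here is the C++17 side of that example, compiled as C++ (it is not C2y code and does not exercise the new C parser paths). getval and lock are hypothetical stand-ins for the calls in the text. The first function uses a declaration followed by a separate condition, which, as we read N3356, both languages share; the second uses an expression statement as the initializer, which the message says C2y does not accept.

```cpp
#include <cassert>

// Hypothetical stand-ins for the getval () and lock () calls in the text.
static int lock_calls = 0;
int getval() { return 42; }
void lock() { ++lock_calls; }

// Declaration followed by a separate condition.
int decl_then_condition() {
  if (int i = getval(); i > 40)
    return i;
  return -1;
}

// Expression statement as the initializer, then a declaration as the
// condition: valid C++17, rejected by C2y according to the commit message.
int expr_then_decl() {
  if (lock(); int i = getval())
    return i;
  return 0;
}
```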

The proposal adds new grammar productions.  selection-header is handled
in c_parser_selection_header which is the gist of the patch.
simple-declaration is handled by c_parser_declaration_or_fndef, which
gets a new parameter.

PR c/117019

gcc/c/ChangeLog:

* c-parser.cc (c_parser_declaration_or_fndef): Adjust declaration.
(c_parser_external_declaration): Adjust a call to
c_parser_declaration_or_fndef.
(c_parser_declaration_or_fndef): New bool parameter.  Return a tree
instead of void.  Adjust for N3356.  Adjust a call to
c_parser_declaration_or_fndef.
(c_parser_compound_statement_nostart): Adjust calls to
c_parser_declaration_or_fndef.
(c_parser_selection_header): New.
(c_parser_paren_selection_header): New.
(c_parser_if_statement): Call c_parser_paren_selection_header
instead of c_parser_paren_condition.
(c_parser_switch_statement): Call c_parser_selection_header instead of
c_parser_expression.
(c_parser_for_statement): Adjust calls to c_parser_declaration_or_fndef.
(c_parser_objc_methodprotolist): Likewise.
(c_parser_oacc_routine): Likewise.
(c_parser_omp_loop_nest): Likewise.
(c_parser_omp_declare_simd): Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/c23-if-decls-1.c: New test.
* gcc.dg/c23-if-decls-2.c: New test.
* gcc.dg/c2y-if-decls-1.c: New test.
* gcc.dg/c2y-if-decls-2.c: New test.
* gcc.dg/c2y-if-decls-3.c: New test.
* gcc.dg/c2y-if-decls-4.c: New test.
* gcc.dg/c2y-if-decls-5.c: New test.
* gcc.dg/c2y-if-decls-6.c: New test.
* gcc.dg/c2y-if-decls-7.c: New test.
* gcc.dg/c2y-if-decls-8.c: New test.
* gcc.dg/c2y-if-decls-9.c: New test.
* gcc.dg/c2y-if-decls-10.c: New test.
* gcc.dg/c2y-if-decls-11.c: New test.
* gcc.dg/gnu2y-if-decls-1.c: New test.
* gcc.dg/gnu99-if-decls-1.c: New test.
* gcc.dg/gnu99-if-decls-2.c: New test.

commit | commitdiff | tree

John David Anglin [Fri, 8 Nov 2024 21:58:49 +0000 (16:58 -0500)]

hppa: Don't allow mode size 32 in hard registers

LRA has problems handling spills for OI mode. There are issues with
SUBREG support as well.

2024-11-08 John David Anglin <danglin@gcc.gnu.org>

gcc/ChangeLog:

PR target/117238
* config/pa/pa64-regs.h (PA_HARD_REGNO_MODE_OK): Don't allow
mode size 32.

commit | commitdiff | tree

John David Anglin [Fri, 8 Nov 2024 21:54:48 +0000 (16:54 -0500)]

hppa: Don't use '%' operator in base14_operand

Division is slow on hppa, and mode sizes are powers of 2, so we
can use the '&' operator to check displacement alignment.
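A minimal C++ sketch of the equivalence this relies on (illustrative helpers, not the predicates.md code): when the size is a power of 2, masking off the low bits gives the same answer as a remainder test, including for negative displacements on two's-complement targets, but without a division.

```cpp
#include <cassert>

// Alignment check with the remainder operator: needs a division.
bool aligned_mod(long disp, long size) { return disp % size == 0; }

// Same check with a mask; only valid when size is a power of 2.
bool aligned_mask(long disp, long size) { return (disp & (size - 1)) == 0; }
```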

2024-11-08 John David Anglin <danglin@gcc.gnu.org>

gcc/ChangeLog:

* config/pa/predicates.md (base14_operand): Use '&' operator
instead of '%' to check displacement alignment.

commit | commitdiff | tree

John David Anglin [Fri, 8 Nov 2024 21:49:34 +0000 (16:49 -0500)]

hppa: Don't allow large modes in hard registers

LRA has problems handling spills for OI and TI modes.  There are
issues with SUBREG support as well.

This change fixes gcc.c-torture/compile/pr92618.c with LRA.

2024-11-08  John David Anglin  <danglin@gcc.gnu.org>

gcc/ChangeLog:

PR target/117238
* config/pa/pa32-regs.h (PA_HARD_REGNO_MODE_OK): Don't allow
mode size 32.  Limit mode size 16 in general registers to
complex modes.

commit | commitdiff | tree

John David Anglin [Fri, 8 Nov 2024 21:34:41 +0000 (16:34 -0500)]

hppa: Fix handling of secondary reloads involving a SUBREG

This is fairly subtle.

When handling spills for SUBREG arguments in pa_emit_move_sequence,
alter_subreg may be called.  It in turn calls adjust_address_1 and
change_address_1.  change_address_1 calls pa_legitimate_address_p
to validate the new spill address.  change_address_1 generates an
internal compiler error if the address is not valid.  We need to
allow 14-bit displacements for all modes when reload_in_progress
is true and strict is false to prevent the internal compiler error.

SUBREGs are only used with the general registers, so the spill
should result in an integer access.  14-bit displacements are okay
for integer loads and stores but not for floating-point loads and
stores.

Potentially, the change could break the handling of spills for the
floating-point registers, but I believe these are handled separately
in pa_emit_move_sequence.
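To make "any 14-bit displacement" concrete, here is a hypothetical C++ helper (not the PA back end's code), assuming the displacement field is a signed 14-bit immediate, so the accepted range is -8192 to 8191. The second form shows the common single-comparison idiom: bias the value into [0, 0x4000) and compare as unsigned.

```cpp
#include <cassert>

// True if disp fits a signed 14-bit field: [-8192, 8191].
bool fits_signed_14(long disp) {
  return disp >= -(1L << 13) && disp < (1L << 13);
}

// Equivalent single-comparison form using unsigned wraparound.
bool fits_signed_14_biased(long disp) {
  return static_cast<unsigned long>(disp) + 0x2000UL < 0x4000UL;
}
```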

This change fixes the build of symmetrica-3.0.1+ds.

2024-11-08  John David Anglin  <danglin@gcc.gnu.org>

gcc/ChangeLog:

PR target/117443
* config/pa/pa.cc (pa_legitimate_address_p): Allow any
14-bit displacement when reload is in progress and strict
is false.

commit | commitdiff | tree

Jakub Jelinek [Fri, 8 Nov 2024 21:07:33 +0000 (22:07 +0100)]

libstdc++: Add some further attributes to ::operator new in <new>

I've noticed that the alloc_align attribute is missing on the non-vector
::operator new with std::align_val_t and const std::nothrow_t&
arguments; this patch adds it.  The last hunk is just
an attempt to make the line shorter.
The first hunk originally also added the __alloc_size__ (1) attribute,
but that seems to regress
FAIL: g++.dg/tm/pr46270.C  -std=gnu++98 (test for excess errors)
with
Excess errors:
.../libstdc++-v3/libsupc++/new:137:26: warning: new declaration 'void* operator new(std::size_t)' ambiguates built-in declaration 'void* operator new(long unsigned int)
+transaction_safe' [-Wbuiltin-declaration-mismatch]
.../libstdc++-v3/libsupc++/new:140:26: warning: new declaration 'void* operator new [](std::size_t)' ambiguates built-in declaration 'void* operator new [](long unsigned int)
+transaction_safe' [-Wbuiltin-declaration-mismatch]
I must say I have no clue why that happens only in C++98 (C++11 and
above are quiet) and only with -fgnu-tm; I tried to debug it but got
lost.  It is some conflict with the predeclared ::operator new, but
those clearly do have the externally_visible attribute, and alloc_size (1)
attributes:
     extvisattr = build_tree_list (get_identifier ("externally_visible"),
                                   NULL_TREE);
     newattrs = tree_cons (get_identifier ("alloc_size"),
                           build_tree_list (NULL_TREE, integer_one_node),
                           extvisattr);
     newtype = cp_build_type_attribute_variant (ptr_ftype_sizetype, newattrs);
     newtype = build_exception_variant (newtype, new_eh_spec);
...
    tree opnew = push_cp_library_fn (NEW_EXPR, newtype, 0);
    DECL_IS_MALLOC (opnew) = 1;
    DECL_SET_IS_OPERATOR_NEW (opnew, true);
    DECL_IS_REPLACEABLE_OPERATOR (opnew) = 1;
and in C++98 mode I think libstdc++ doesn't add the transaction_safe attribute:
// Conditionally enable annotations for the Transactional Memory TS on C++11.
// Most of the following conditions are due to limitations in the current
// implementation.
#if __cplusplus >= 201103L && _GLIBCXX_USE_CXX11_ABI                    \
   && _GLIBCXX_USE_DUAL_ABI && __cpp_transactional_memory >= 201500L     \
   &&  !_GLIBCXX_FULLY_DYNAMIC_STRING && _GLIBCXX_USE_WEAK_REF           \
   && _GLIBCXX_USE_ALLOCATOR_NEW
#define _GLIBCXX_TXN_SAFE transaction_safe
#define _GLIBCXX_TXN_SAFE_DYN transaction_safe_dynamic
#else
#define _GLIBCXX_TXN_SAFE
#define _GLIBCXX_TXN_SAFE_DYN
#endif
push_cp_library_fn adds transaction_safe attribute whenever -fgnu-tm
is used, regardless of the other conditionals:
   if (flag_tm)
     apply_tm_attr (fn, get_identifier ("transaction_safe"));

Anyway, omitting alloc_size (1) fixes that test and given that the
predeclared operator new already has alloc_size (1) attribute, I think it
can be safely left out.

2024-11-08  Jakub Jelinek  <jakub@redhat.com>

* libsupc++/new (::operator new, ::operator new[]): Add malloc
attribute where missing.  Add alloc_align attribute when
std::align_val_t is present and where it was missing.  Formatting fix.
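A small usage sketch of the overload that gained alloc_align, ::operator new (std::size_t, std::align_val_t, const std::nothrow_t&), assuming C++17. The wrappers aligned_new and aligned_delete are hypothetical; their GNU attributes (malloc, alloc_size, alloc_align, with 1-based parameter positions) illustrate the kind of information the <new> declarations give the compiler about the returned pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical wrapper: alloc_size(1) marks parameter 1 as the size,
// alloc_align(2) marks parameter 2 as the guaranteed alignment.
[[gnu::malloc, gnu::alloc_size(1), gnu::alloc_align(2)]]
void* aligned_new(std::size_t n, std::size_t align) {
  // The nothrow form returns nullptr instead of throwing on failure.
  return ::operator new(n, std::align_val_t(align), std::nothrow);
}

// Matching deallocation with the same alignment.
void aligned_delete(void* p, std::size_t align) {
  ::operator delete(p, std::align_val_t(align));
}
```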

commit | commitdiff | tree

Jonathan Wakely [Fri, 1 Nov 2024 14:26:38 +0000 (14:26 +0000)]

libstdc++: Make some _Hashtable members inline

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h (_Hashtable): Add 'inline' to some
one-line constructors.

Reviewed-by: François Dumont <fdumont@gcc.gnu.org>

commit | commitdiff | tree

Jonathan Wakely [Fri, 8 Nov 2024 13:58:23 +0000 (13:58 +0000)]

libstdc++: Do not define _Insert_base::try_emplace before C++17

This is not a reserved name in C++11 and C++14, so it must not be defined.

Also use the appropriate feature test macros for the try_emplace members
of the Debug Mode maps.
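User code can follow the same rule: try_emplace is a C++17 member, so portable code can select it with the standard feature test macro __cpp_lib_unordered_map_try_emplace. The helper insert_if_absent below is a hypothetical example, not libstdc++ code.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Insert only if the key is absent; report whether an insertion happened.
bool insert_if_absent(std::unordered_map<int, std::string>& m, int key,
                      const char* value) {
#ifdef __cpp_lib_unordered_map_try_emplace
  // C++17 and later: the mapped value is not constructed if the key exists.
  return m.try_emplace(key, value).second;
#else
  // C++11/14 fallback: emplace may build a node even if the key exists.
  return m.emplace(key, value).second;
#endif
}
```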

libstdc++-v3/ChangeLog:

* include/bits/hashtable_policy.h (_Insert_base::try_emplace):
Do not define for C++11 and C++14.
* include/debug/map.h (try_emplace): Use feature test macro.
* include/debug/unordered_map (try_emplace): Likewise.
* testsuite/17_intro/names.cc: Define try_emplace before C++17.

commit | commitdiff | tree

Richard Biener [Fri, 8 Nov 2024 14:11:34 +0000 (15:11 +0100)]

Fix gcc.dg/vect/bb-slp-77.c for x86

x86 doesn't have .REDUC_PLUS for V2SImode, and there's no effective
target for that, so add x86 to the list of targets not expecting the
BB vectorization.

* gcc.dg/vect/bb-slp-77.c: Add x86_64-*-* and i?86-*-* to
the list of expected failing targets.

commit | commitdiff | tree

Andre Simoes Dias Vieira [Fri, 8 Nov 2024 13:34:57 +0000 (13:34 +0000)]

arm: Improvements to arm_noce_conversion_profitable_p call [PR 116444]

When not dealing with the special armv8.1-m.main conditional instructions
case, make sure it uses the default_noce_conversion_profitable_p call to
determine whether the sequence is cost effective.

Also make sure arm_noce_conversion_profitable_p accepts vsel<cond> patterns for
Armv8.1-M Mainline targets.

gcc/ChangeLog:

PR target/116444
* config/arm/arm.cc (arm_noce_conversion_profitable_p): Call
default_noce_conversion_profitable_p when not dealing with the
armv8.1-m.main special case.
(arm_is_vsel_fp_insn): New function.

commit | commitdiff | tree

Jakub Jelinek [Fri, 8 Nov 2024 12:36:05 +0000 (13:36 +0100)]

c++: Fix ICE on constexpr virtual function [PR117317]

Since C++20, virtual methods can be constexpr, and if they are
constexpr evaluated, we choose tentative_decl_linkage for them,
defer their output and decide again at at_eof time.
On the following testcases we ICE, though, because if
expand_or_defer_fn_1 decides to use tentative_decl_linkage, it
returns true and the caller in that case calls emit_associated_thunks,
where use_thunk, which it calls, asserts DECL_INTERFACE_KNOWN on the
thunk destination, which isn't the case for tentative_decl_linkage.

The following patch fixes the ICE by not emitting the thunks
for the DECL_DEFER_OUTPUT fns just yet but waiting until at_eof
time when we return to those.
Note that the second testcase has ICEd since r0-110035 with -std=c++0x,
before GCC gets a chance to diagnose the constexpr virtual method.

2024-11-08 Jakub Jelinek <jakub@redhat.com>

PR c++/117317
* semantics.cc (emit_associated_thunks): Do nothing for
!DECL_INTERFACE_KNOWN && DECL_DEFER_OUTPUT fns.

* g++.dg/cpp2a/pr117317-1.C: New test.
* g++.dg/cpp2a/pr117317-2.C: New test.

commit | commitdiff | tree

Torbjörn SVENSSON [Thu, 7 Nov 2024 19:09:48 +0000 (20:09 +0100)]

testsuite: arm: Use check-function-bodies in epilog-1.c test

Update test case for armv8.1-m.main that supports conditional
arithmetic.

armv7-m:
        push    {r4, lr}
        ldr     r4, .L6
        ldr     r4, [r4]
        lsls    r4, r4, #29
        it      mi
        addmi   r2, r2, #1
        bl      bar
        movs    r0, #0
        pop     {r4, pc}

armv8.1-m.main:
        push    {r3, r4, r5, lr}
        ldr     r4, .L5
        ldr     r5, [r4]
        tst     r5, #4
        csinc   r2, r2, r2, eq
        bl      bar
        movs    r0, #0
        pop     {r3, r4, r5, pc}

gcc/testsuite/ChangeLog:

* gcc.target/arm/epilog-1.c: Use check-function-bodies.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

commit | commitdiff | tree

Torbjörn SVENSSON [Wed, 6 Nov 2024 06:12:14 +0000 (07:12 +0100)]

testsuite: arm: Use effective-target arm_libc_fp_abi for pr68620.c test

This fixes the regression reported at
https://linaro.atlassian.net/browse/GNU-1407.

gcc/testsuite/ChangeLog:

* gcc.target/arm/pr68620.c: Use effective-target
arm_libc_fp_abi.
* lib/target-supports.exp: Define effective-target
arm_libc_fp_abi.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
Co-authored-by: Richard Earnshaw <rearnsha@arm.com>

commit | commitdiff | tree

Torbjörn SVENSSON [Thu, 7 Nov 2024 17:05:19 +0000 (18:05 +0100)]

testsuite: arm: Allow vst1.32 instruction in pr40457-2.c

When building the test case with neon, the 'vst1.32' instruction is used
instead of 'strd'. Allow both variants to make the test pass.

gcc/testsuite/ChangeLog:

* gcc.target/arm/pr40457-2.c: Add vst1.32 as an allowed
instruction.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

commit | commitdiff | tree

Torbjörn SVENSSON [Wed, 6 Nov 2024 09:28:34 +0000 (10:28 +0100)]

testsuite: arm: Use effective-target for pr84556.cc test

Using "dg-do run" with a selector overrides the default selector set by
vect.exp that picks between "dg-do run" and "dg-do compile" based on the
target's support for simd operations for Arm targets.
The actual selection of default operation is performed in
check_vect_support_and_set_flags.

gcc/testsuite/ChangeLog:

* g++.dg/vect/pr84556.cc: Change from "dg-do run" with selector
to instead use dg-require-effective-target with the same
selector.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

commit | commitdiff | tree

Richard Biener [Fri, 8 Nov 2024 11:44:47 +0000 (12:44 +0100)]

Enable gcc.dg/vect/vect-early-break_21.c on x86_64

The following also enables the testcase on x86 as it now has the
required cbranch.

* gcc.dg/vect/vect-early-break_21.c: Remove disabling of
x86_64 and i?86.

commit | commitdiff | tree

Jonathan Wakely [Fri, 1 Nov 2024 12:38:29 +0000 (12:38 +0000)]

libstdc++: Simplify __detail::__distance_fw using 'if constexpr'

This uses 'if constexpr' instead of tag dispatching, removing the need
for a second call using that tag, and simplifying the overload set that
needs to be resolved for calls to __distance_fw.
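A hypothetical sketch of the same pattern: instead of a helper overloaded on the iterator category tag, one function template branches with 'if constexpr'. The name distance_fw_sketch and the single-pass fallback (reporting only whether the range is non-empty, so the input is not consumed) are assumptions for illustration; hashtable_policy.h has the real __detail::__distance_fw.

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <type_traits>
#include <vector>

template <typename Iter>
typename std::iterator_traits<Iter>::difference_type
distance_fw_sketch(Iter first, Iter last) {
  using Cat = typename std::iterator_traits<Iter>::iterator_category;
  if constexpr (std::is_base_of_v<std::forward_iterator_tag, Cat>)
    // Multi-pass iterators can be walked without losing elements.
    return std::distance(first, last);
  else
    // Single-pass iterators: don't consume the input, just report non-empty.
    return first != last ? 1 : 0;
}
```

Only the selected branch is instantiated for a given iterator type, so the overload set that callers must resolve shrinks to a single template.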

libstdc++-v3/ChangeLog:

* include/bits/hashtable_policy.h (__distance_fw): Replace tag
dispatching with 'if constexpr'.

commit | commitdiff | tree

Victor Do Nascimento [Fri, 8 Nov 2024 11:09:54 +0000 (11:09 +0000)]

aarch64: Extend support for the AE family of Cortex CPUs

Implement -mcpu options for:

  - Cortex-A520AE
  - Cortex-A720AE
  - Cortex-R82AE

These all implement the same feature sets as their non-AE
counterparts, using the same scheduler and costs and differing only in
their respective part numbers.

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def (cortex-a520ae,
cortex-a720ae, cortex-r82ae): Define new entries.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi: Document A520AE, A720AE and R82AE CPUs.

commit | commitdiff | tree

Torbjörn SVENSSON [Thu, 31 Oct 2024 18:11:57 +0000 (19:11 +0100)]

testsuite: arm: Use effective-target for nomve_fp_1 test

Test uses MVE, so add effective-target arm_fp requirement.

gcc/testsuite/ChangeLog:

* g++.target/arm/mve/general-c++/nomve_fp_1.c: Use
effective-target arm_fp.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

commit | commitdiff | tree

xuli [Wed, 6 Nov 2024 06:10:09 +0000 (06:10 +0000)]

RISC-V: Add testcases for unsigned imm vec SAT_SUB form1

form1:
void __attribute__((noinline))             \
vec_sat_u_sub_imm##IMM##_##T##_fmt_1 (T *out, T *in, unsigned limit)  \
{                                                   \
  unsigned i;                                       \
  for (i = 0; i < limit; i++)                       \
    out[i] = (T)IMM >= in[i] ? (T)IMM - in[i] : 0;  \
}
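For reference, a scalar C++ sketch of what form1 computes, written as a template so that T and IMM play the roles of the macro parameters (the template and its name are hypothetical, not taken from vec_sat_arith.h): each element becomes IMM - in[i], saturated at 0 when in[i] exceeds IMM.

```cpp
#include <cassert>
#include <cstdint>

template <typename T, T IMM>
void vec_sat_u_sub_imm_fmt_1_sketch(T* out, const T* in, unsigned limit) {
  for (unsigned i = 0; i < limit; i++)
    // Unsigned saturating subtraction: clamp at 0 instead of wrapping.
    out[i] = IMM >= in[i] ? static_cast<T>(IMM - in[i]) : T(0);
}
```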

Passed the rv64gcv full regression test.

Signed-off-by: Li Xu <xuli1@eswincomputing.com>
gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_data.h: add data for vec sat_sub.
* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: add unsigned imm vec sat_sub form1.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_imm-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_imm-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_imm-3.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_imm-4.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_imm-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_imm-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_imm-run-3.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_imm-run-4.c: New test.

commit | commitdiff | tree

Jonathan Wakely [Thu, 7 Nov 2024 16:51:58 +0000 (16:51 +0000)]

libstdc++: Improve comment for _Hashtable::_M_insert_unique_node

Clarify the effects if rehashing is needed. Document the __n_elt
parameter.

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h (_M_insert_unique_node): Improve
comment.

commit | commitdiff | tree

Jonathan Wakely [Tue, 5 Nov 2024 17:19:06 +0000 (17:19 +0000)]

libstdc++: Fix conversions to key/value types for hash table insertion [PR115285]

The conversions to key_type and value_type that are performed when
inserting into _Hashtable need to be fixed to do any required
conversions explicitly. The current code assumes that conversions from
the parameter to the key_type or value_type can be done implicitly,
which isn't necessarily true.

Remove the _S_forward_key function, which doesn't handle all cases, and
either forward the parameter if it already has type cv key_type, or
explicitly construct a temporary of type key_type.

Similarly, the _ConvertToValueType specialization for maps doesn't
handle all cases either: for std::pair arguments only some value
categories are handled. Remove _ConvertToValueType and, for the _M_insert
function for unique keys, either forward the argument unchanged or
explicitly construct a temporary of type value_type.

For the _M_insert overload for non-unique keys we don't need any
conversion at all, we can just forward the argument directly to where we
construct a node.
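A minimal sketch of the key conversion strategy described above, outside of _Hashtable: forward the argument unchanged when it already is (cv) key_type, otherwise construct a key_type explicitly, with the target type chosen by std::conditional_t and applied with static_cast. The names Key and to_key are hypothetical; Key's explicit constructor is exactly the case an implicit conversion would reject.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// A key type that can only be constructed explicitly from int.
struct Key {
  explicit Key(int v) : value(v) {}
  int value;
};

template <typename KeyT, typename Arg>
decltype(auto) to_key(Arg&& arg) {
  using Bare = std::remove_cv_t<std::remove_reference_t<Arg>>;
  // Already a key: keep the reference and its value category.
  // Anything else: build a KeyT temporary with an explicit conversion.
  using Target = std::conditional_t<std::is_same_v<Bare, KeyT>, Arg&&, KeyT>;
  return static_cast<Target>(std::forward<Arg>(arg));
}

static_assert(!std::is_convertible_v<int, Key>, "no implicit int -> Key");
```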

libstdc++-v3/ChangeLog:

PR libstdc++/115285
* include/bits/hashtable.h (_Hashtable::_S_forward_key): Remove.
(_Hashtable::_M_insert_unique_aux): Replace _S_forward_key with
a static_cast to a type defined using conditional_t.
(_Hashtable::_M_insert): Replace _ConvertToValueType with a
static_cast to a type defined using conditional_t.
* include/bits/hashtable_policy.h (_ConvertToValueType): Remove.
* testsuite/23_containers/unordered_map/insert/115285.cc: New test.
* testsuite/23_containers/unordered_set/insert/115285.cc: New test.
* testsuite/23_containers/unordered_set/96088.cc: Adjust
expected number of allocations.

commit | commitdiff | tree

Jonathan Wakely [Fri, 1 Nov 2024 10:09:55 +0000 (10:09 +0000)]

libstdc++: Define __is_pair variable template for C++11

libstdc++-v3/ChangeLog:

* include/bits/stl_pair.h (__is_pair): Define for C++11 and
C++14 as well.

commit | commitdiff | tree

Jonathan Wakely [Thu, 7 Nov 2024 21:57:52 +0000 (21:57 +0000)]

libstdc++: Fix grammar in comment, again

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h (_Hashtable): Fix comment grammar.

commit | commitdiff | tree

Richard Sandiford [Thu, 7 Nov 2024 20:34:50 +0000 (20:34 +0000)]

aarch64: Fix gcc.target/aarch64/sme2/acle-asm/bfmlslb_f32.c

I missed a search-and-replace on this test, meaning that it was
duplicating bfmlalb_f32.c.

gcc/testsuite/
* gcc.target/aarch64/sme2/acle-asm/bfmlslb_f32.c: Replace bfmla*
with bfmls*

commit | commitdiff | tree

Richard Sandiford [Thu, 7 Nov 2024 20:34:49 +0000 (20:34 +0000)]

aarch64: Make PSEL dependent on SME rather than SME2

The svpsel_lane intrinsics were wrongly classified as SME2+ only,
rather than as base SME intrinsics. They should always be available
in streaming mode.

gcc/
* config/aarch64/aarch64-sve2.md (@aarch64_sve_psel<BHSD_BITS>)
(*aarch64_sve_psel<BHSD_BITS>_plus): Require TARGET_STREAMING
rather than TARGET_STREAMING_SME2.

gcc/testsuite/
* gcc.target/aarch64/sme2/acle-asm/psel_lane_b16.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_b16.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_b32.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_b32.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_b64.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_b64.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_b8.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_b8.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_c16.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_c16.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_c32.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_c32.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_c64.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_c64.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_c8.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_c8.c: ...here.

commit | commitdiff | tree

Richard Sandiford [Thu, 7 Nov 2024 20:34:48 +0000 (20:34 +0000)]

aarch64: Restrict FCLAMP to SME2

There are two sets of patterns for FCLAMP: one set for single registers
and one set for multiple registers. The multiple-register set was
correctly gated on SME2, but the single-register set only required SME.
This doesn't matter for ACLE usage, since the intrinsic definitions
are correctly gated. But it does matter for automatic generation of
FCLAMP from separate minimum and maximum operations (either ACLE
intrinsics or autovectorised code).

gcc/
* config/aarch64/aarch64-sve2.md (@aarch64_sve_fclamp<mode>)
(*aarch64_sve_fclamp<mode>_x): Require TARGET_STREAMING_SME2
rather than TARGET_STREAMING_SME.

gcc/testsuite/
* gcc.target/aarch64/sme/clamp_3.c: Force sme2
* gcc.target/aarch64/sme/clamp_4.c: Likewise.
* gcc.target/aarch64/sme/clamp_5.c: New test.

commit | commitdiff | tree

David Faust [Thu, 7 Nov 2024 17:27:07 +0000 (09:27 -0800)]

bpf: avoid possible null deref in btf_ext_output [PR target/117447]

The BPF-specific .BTF.ext section is always generated for BPF programs
if -gbtf is specified, and generating it requires BTF information and
assumes that the BTF info has already been generated.

Compiling non-C languages to BPF is not supported, nor is generating
CTF/BTF for non-C. But, compiling another language like C++ to BPF
with -gbtf specified meant that we would try to generate the .BTF.ext
section anyway, and then ICE because no BTF information was available.

Add a check to bail out of btf_ext_output if the TU CTFC does not exist,
meaning no BTF info is available.

gcc/
PR target/117447
* config/bpf/btfext-out.cc (btf_ext_output): Bail if TU CTFC is null.

commit | commitdiff | tree

David Faust [Thu, 7 Nov 2024 17:19:51 +0000 (09:19 -0800)]

btf: check hash maps are non-null before emptying

These maps will always be non-null in btf_finalize under normal
circumstances, but be safe and verify that before trying to empty them.

gcc/
* btfout.cc (btf_finalize): Check that hash maps are non-null before
emptying them.

commit | commitdiff | tree

Andrew Pinski [Mon, 28 Oct 2024 23:40:34 +0000 (16:40 -0700)]

ifcombine: For short circuit case, allow 2 convert defining statements [PR85605]

r0-126134-g5d2a9da9a7f7c1 added support for short-circuiting and combining the ifs
into either AND or OR. But it only allowed the inner condition basic block
to contain the conditional alone. This changes that to allow up to 2 defining
statements, as long as they are just integer-to-integer conversions for
either the lhs or rhs of the conditional.

This should allow ccmp to be used on aarch64 and x86_64 (APX) slightly more often than before.
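An illustrative source shape for this case (hypothetical functions, not the new testcases; whether GCC actually combines them or emits ccmp depends on the target and options): the inner condition compares a converted value, so its basic block holds a conversion statement in addition to the comparison. Because the conversion has no side effects, evaluating both comparisons and joining them with a bitwise AND computes the same result as the nested form.

```cpp
#include <cassert>

// Nested ifs: the inner block first converts 'b' to short, then compares.
int nested(int a, long b) {
  if (a > 0) {
    if (static_cast<short>(b) == 3)
      return 1;
  }
  return 0;
}

// Combined form: both comparisons evaluated, joined with a bitwise AND.
int combined(int a, long b) {
  bool outer = a > 0;
  bool inner = static_cast<short>(b) == 3;
  return (outer & inner) ? 1 : 0;
}
```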

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/85605

gcc/ChangeLog:

* tree-ssa-ifcombine.cc (can_combine_bbs_with_short_circuit): New function.
(ifcombine_ifandif): Use can_combine_bbs_with_short_circuit
instead of checking if iterator is one before the last statement.

gcc/testsuite/ChangeLog:

* g++.dg/tree-ssa/ifcombine-ccmp-1.C: New test.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-7.c: New test.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-8.c: New test.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-9.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

commit | commitdiff | tree

Andrew Pinski [Sat, 2 Nov 2024 06:20:22 +0000 (23:20 -0700)]

VN: Lookup `val != 0` if we got back val when looking up the predicate for GIMPLE_COND [PR117414]

Sometimes we get back a full SSA name rather than a predicate when looking up
the comparison of the GIMPLE_COND. We then want to look up `val != 0` for the predicate.

Note this might happen with other boolean assignments and COND_EXPR but I am not sure
if it is as important; I have not found a testcase yet.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/117414

gcc/ChangeLog:

* tree-ssa-sccvn.cc (process_bb): Look up
`val != 0` if we got back an SSA name when looking up the comparison.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/fre-predicated-4.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

commit | commitdiff | tree

Andrew Pinski [Sat, 2 Nov 2024 06:12:52 +0000 (23:12 -0700)]

VN: Handle `(A CMP B) !=/== 0` for predicates [PR117414]

After the last patch, we also want to record `(A CMP B) != 0`
as `(A CMP B)` and `(A CMP B) == 0` as `(A CMP B)` with the
true/false edges swapped.

This shows up more due to the new handling of
`(A | B) ==/!= 0` in insert_predicates_for_cond
as now we can notice these comparisons which were not seen before.

This is enough to fix the original issue in `gcc.dg/tree-ssa/pr111456-1.c`
and make sure we don't regress it when enhancing ifcombine.

This adds that predicate and allows us to optimize f
in fre-predicated-3.c.

Changes since v1:
* v2: Use vn_valueize.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/117414

gcc/ChangeLog:

* tree-ssa-sccvn.cc (insert_predicates_for_cond): Handle `(A CMP B) !=/== 0`.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/fre-predicated-3.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

commit | commitdiff | tree

Andrew Pinski [Sat, 2 Nov 2024 03:06:30 +0000 (20:06 -0700)]

VN: Handle `(a | b) !=/== 0` for predicates [PR117414]

For `(a | b) == 0`, we can "assert" on the true edge that
both `a == 0` and `b == 0` but nothing on the false edge.
For `(a | b) != 0`, we can "assert" on the false edge that
both `a == 0` and `b == 0` but nothing on the true edge.
This adds that predicate and allows us to optimize f0, f1,
and f2 in fre-predicated-[12].c.
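The recorded fact is plain integer arithmetic: (a | b) == 0 holds exactly when a == 0 and b == 0, so only the edge on which the OR is zero yields usable equalities. The hypothetical C++ below checks that equivalence over a small range and shows a function of the kind such a predicate lets FRE simplify (it is not the actual fre-predicated testcase).

```cpp
#include <cassert>

// On the true edge of (a | b) == 0, both a == 0 and b == 0 are known,
// so the inner test is redundant and could be folded away.
int on_true_edge(int a, int b) {
  if ((a | b) == 0) {
    if (a == 0 && b == 0)
      return 1;
    return 2;  // unreachable
  }
  return 0;
}

// Exhaustive check over [lo, hi] that the equivalence holds both ways.
bool or_zero_equivalence_holds(int lo, int hi) {
  for (int a = lo; a <= hi; ++a)
    for (int b = lo; b <= hi; ++b)
      if (((a | b) == 0) != (a == 0 && b == 0))
        return false;
  return true;
}
```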

Changes since v1:
* v2: Use vn_valueize. Also canonicalize the comparison
      at the beginning of insert_predicates_for_cond for
      constants to be on the rhs. Return early for
      non-ssa names on the lhs (after canonicalization).

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/117414

gcc/ChangeLog:

* tree-ssa-sccvn.cc (insert_predicates_for_cond): Canonicalize the comparison.
Don't insert anything if lhs is not a SSA_NAME. Handle `(a | b) !=/== 0`.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/fre-predicated-1.c: New test.
* gcc.dg/tree-ssa/fre-predicated-2.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

commit | commitdiff | tree

Andrew Pinski [Sat, 2 Nov 2024 02:28:19 +0000 (19:28 -0700)]

VN: Factor out inserting predicates for conditional

To make it easier to add more predicates in some cases,
factor out the code. Plus it makes the code slightly more
readable since it is not indented as much.

Bootstrapped and tested on x86_64.

gcc/ChangeLog:

* tree-ssa-sccvn.cc (insert_predicates_for_cond): New function, factored out from ...
(process_bb): Here.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

commit | commitdiff | tree

Jonathan Wakely [Thu, 7 Nov 2024 11:14:19 +0000 (11:14 +0000)]

libstdc++: Tweak comments on includes in hashtable headers

std::is_permutation is only used in <bits/hashtable.h> not in
<bits/hashtable_policy.h>, so move the comment referring to it.

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h: Add is_permutation to comment.
* include/bits/hashtable_policy.h: Remove it from comment.

commit | commitdiff | tree

Jonathan Wakely [Tue, 5 Nov 2024 23:55:08 +0000 (23:55 +0000)]

libstdc++: Fix typo in comment in hashtable.h

And tweak grammar in a couple of comments.

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h: Fix spelling in comment.

commit | commitdiff | tree

Tobias Burnus [Thu, 7 Nov 2024 15:13:06 +0000 (16:13 +0100)]

libgomp.texi: Document OpenMP's Interoperability Routines

libgomp/ChangeLog:

* libgomp.texi (OpenMP Technical Report 13): Remove 'iterator'
in 'map' clause of 'declare mapper' as it is already in the list above.
(Interoperability Routines): Add.
(omp_target_memcpy_async, omp_target_memcpy_rect_async):
Document that depobj_list may be omitted in C++ and Fortran.

Paul Iannetta [Wed, 30 Oct 2024 10:21:09 +0000 (11:21 +0100)]

Unify registered_pp_pragmas and registered_pragmas

Until now, the structures that keep pragma information were different
when in preprocessing only mode and in normal mode. This change unifies
both so that the space and name of a pragma are always registered and
can be queried easily at a later time.

gcc/c-family/ChangeLog:

* c-pragma.cc (struct pragma_pp_data): Use struct internal_pragma_handler.
(c_register_pragma_1): Always register name and space for all pragmas.
(c_invoke_pragma_handler): Adapt.
(c_invoke_early_pragma_handler): Likewise.
(c_pp_invoke_early_pragma_handler): Likewise.

Richard Biener [Tue, 5 Nov 2024 13:58:59 +0000 (14:58 +0100)]

Disable gather/scatter for non-first vectorized epilogue

We currently make vect_check_gather_scatter happy by replacing SSA
name references in DR_REF for gather/scatter DRs.  However, the
replacement process only works once: for the second epilogue, DR_REF
contains SSA names from the first epilogue, but since we copied from
the original loop, the SSA mapping does not apply to them.

The following simply punts for non-first epilogues; gather/scatter
accesses already recognized by patterns and turned into IFNs have
already been analyzed and should work fine.

* tree-vect-data-refs.cc (vect_check_gather_scatter): Refuse
to analyze DR_REF if from an epilogue that's not first.
* tree-vect-loop.cc (update_epilogue_loop_vinfo): Add comment
on how the substitution in DR_REF is broken.

Richard Biener [Mon, 4 Nov 2024 11:58:41 +0000 (12:58 +0100)]

Add LOOP_VINFO_MAIN_LOOP_INFO

The following introduces LOOP_VINFO_MAIN_LOOP_INFO alongside
LOOP_VINFO_ORIG_LOOP_INFO so one can access both the main
vectorized loop info and the preceding vectorized epilogue.
This is critical for correctness, as we need to disallow never
executed epilogues by costing in vect_analyze_loop_costing, since
we assume those do not exist when deciding to add a skip-vector
edge during peeling.  The patch also changes how multiple vector
epilogues are handled: instead of the epilogue_vinfos array in
the main loop info, we now record the single epilogue_vinfo there
and further epilogues in the epilogue_vinfo member of the
epilogue info.  This simplifies the code.

* tree-vectorizer.h (_loop_vec_info::main_loop_info): New.
(LOOP_VINFO_MAIN_LOOP_INFO): Likewise.
(_loop_vec_info::epilogue_vinfo): Change from the epilogue_vinfos
array to a single element.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
main_loop_info and epilogue_vinfo.  Remove epilogue_vinfos
allocation.
(_loop_vec_info::~_loop_vec_info): Do not release epilogue_vinfos.
(vect_create_loop_vinfo): Rename parameter, set
LOOP_VINFO_MAIN_LOOP_INFO.
(vect_analyze_loop_1): Rename parameter.
(vect_analyze_loop_costing): Properly distinguish between
the main vector loop and the preceding epilogue.
(vect_analyze_loop): Change for epilogue_vinfos no longer
being a vector.
* tree-vect-loop-manip.cc (vect_do_peeling): Simplify and
thereby handle a vector epilogue of a vector epilogue.

Richard Biener [Mon, 4 Nov 2024 12:09:21 +0000 (13:09 +0100)]

Add LOOP_VINFO_DRS_ADVANCED_BY

The following remembers how we advanced DRs when vectorizing an
epilogue. When we want to vectorize the epilogue of such an epilogue,
we have to retain that advancement and add the advancement for this
vectorized epilogue. Due to the way we copy and re-associate
stmt_vec_infos and DRs, recording this advancement and re-applying
it for the next epilogue is simplest.

* tree-vectorizer.h (_loop_vec_info::drs_advanced_by): New.
(LOOP_VINFO_DRS_ADVANCED_BY): Likewise.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
drs_advanced_by.
(update_epilogue_loop_vinfo): Remember the DR advancement made.
(vect_transform_loop): Accumulate past advancements.
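Here is a toy model of why the advancement has to be accumulated. Each stage below (main loop, then two vector epilogues, then a scalar tail) starts where all previous stages together stopped, not where the main loop alone stopped. It is a hypothetical C sketch of the index arithmetic only. The vectorizer does this with DR offsets, not with an explicit counter like advanced_by here, and run_stage and the chosen factors are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one vectorized loop (main loop or epilogue):
   it consumes whole blocks of VF elements starting at START and returns
   the index where it stopped.  */
static size_t run_stage (const int *a, size_t n, size_t start, size_t vf,
                         long *sum)
{
  size_t i = start;
  for (; i + vf <= n; i += vf)
    for (size_t j = 0; j < vf; j++)
      *sum += a[i + j];
  return i;
}

/* Main loop with VF 8, two vector epilogues (VF 4 and 2), then a scalar
   tail.  Each stage starts from the advancement accumulated over all
   previous stages.  */
static long sum_with_epilogues (const int *a, size_t n)
{
  long sum = 0;
  size_t advanced_by = 0;
  advanced_by = run_stage (a, n, advanced_by, 8, &sum);
  advanced_by = run_stage (a, n, advanced_by, 4, &sum);
  advanced_by = run_stage (a, n, advanced_by, 2, &sum);
  advanced_by = run_stage (a, n, advanced_by, 1, &sum);
  assert (advanced_by == n);  /* every element visited exactly once */
  return sum;
}
```

If the second epilogue restarted from the main loop's advancement instead, it would revisit elements the first epilogue had already processed.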

Richard Biener [Mon, 4 Nov 2024 12:03:33 +0000 (13:03 +0100)]

Check LOOP_VINFO_PEELING_FOR_GAPS on epilog is supported

We need to check that an epilogue doesn't require LOOP_VINFO_PEELING_FOR_GAPS
in case the main loop didn't (the other way around is OK); otherwise the
computation of whether the epilogue is executed or not gets out of sync.

* tree-vect-loop.cc (vect_analyze_loop_2): Move
vect_analyze_loop_costing after check whether we can do
peeling. Add check on LOOP_VINFO_PEELING_FOR_GAPS for
epilogues.

Jakub Jelinek [Thu, 7 Nov 2024 12:20:20 +0000 (13:20 +0100)]

testsuite: Fix up pr116725.c test [PR116725]

On Fri, Oct 18, 2024 at 02:05:59PM -0400, Antoni Boucher wrote:
>             PR target/116725
>             * gcc.target/i386/pr116725.c: Add test using those AVX builtins.

This test FAILs for me, as I don't have the latest gas around and the test
is dg-do assemble, so it needs not just a fixed compiler but also an
assembler that supports those instructions.

The following patch adds effective target directives to ensure the assembler
supports those too.

2024-11-07  Jakub Jelinek  <jakub@redhat.com>

PR target/116725
* gcc.target/i386/pr116725.c: Add dg-require-effective-target
avx512{dq,fp16,vl}.

Andrew Stubbs [Thu, 7 Nov 2024 11:23:41 +0000 (11:23 +0000)]

openmp: Fix max_vf testcases with -march=cascadelake

Apparently we need to explicitly disable AVX, not just enable SSE, to
guarantee the 16-lane vectors we need for the pattern match.

libgomp/ChangeLog:

* testsuite/libgomp.c/max_vf-1.c: Add -mno-avx.

gcc/testsuite/ChangeLog:

* gcc.dg/gomp/max_vf-1.c: Add -mno-avx.

Pan Li [Tue, 29 Oct 2024 14:37:07 +0000 (22:37 +0800)]

Doc: Add doc for standard name mask_len_strided_load{store}m

This patch adds documentation for the two standard names below.

1. strided load: v = mask_len_strided_load (ptr, stride, mask, len, bias)
2. strided store: mask_len_strided_store (ptr, stride, v, mask, len, bias)

gcc/ChangeLog:

* doc/md.texi: Add doc for mask_len_strided_load{store}.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored-By: Juzhe-Zhong <juzhe.zhong@rivai.ai>
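Below is a scalar reference model of the two patterns, written as a sketch under stated assumptions. Lane i is accessed at ptr + i * stride only when mask[i] is set and i < len + bias. The stride is counted in elements here for simplicity, which is an assumption; check md.texi for the documented unit. Masked-off lanes are simply left unchanged in the load model. NLANES and the model_* names are hypothetical.

```c
#include <assert.h>   /* for the usage checks */
#include <stdbool.h>
#include <stddef.h>

#define NLANES 8      /* hypothetical vector length for the model */

/* Model of v = mask_len_strided_load (ptr, stride, mask, len, bias).
   Assumption: STRIDE counts elements; inactive lanes keep their value.  */
static void model_strided_load (int v[NLANES], const int *ptr,
                                ptrdiff_t stride, const bool mask[NLANES],
                                int len, int bias)
{
  for (int i = 0; i < NLANES && i < len + bias; i++)
    if (mask[i])
      v[i] = ptr[i * stride];
}

/* Model of mask_len_strided_store (ptr, stride, v, mask, len, bias),
   under the same assumptions.  */
static void model_strided_store (int *ptr, ptrdiff_t stride,
                                 const int v[NLANES], const bool mask[NLANES],
                                 int len, int bias)
{
  for (int i = 0; i < NLANES && i < len + bias; i++)
    if (mask[i])
      ptr[i * stride] = v[i];
}
```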

Richard Biener [Thu, 7 Nov 2024 08:23:03 +0000 (09:23 +0100)]

rtl-optimization/117467 - 33% compile-time in rest of compilation

ext-dce uses TV_NONE, which is not OK for a pass taking 33% of compile time.
The following adds a timevar to it for proper blaming.

PR rtl-optimization/117467
* timevar.def (TV_EXT_DCE): New.
* ext-dce.cc (pass_data_ext_dce): Use TV_EXT_DCE.

Hongyu Wang [Tue, 5 Nov 2024 09:19:34 +0000 (17:19 +0800)]

i386: Support cstorebf4 with native bf16 comi

We recently added support for cbranchbf4 with AVX10_2 native bf16 comi
instructions, so do the same for cstorebf4.

gcc/ChangeLog:

* config/i386/i386.md (cstorebf4): Use vcomsbf16 under
TARGET_AVX10_2_256 and -fno-trapping-math.
(cbranchbf4): Adjust formatting.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_2-comibf-3.c: New test.
* gcc.target/i386/avx10_2-comibf-4.c: Likewise.

Hu, Lin1 [Thu, 7 Nov 2024 02:13:15 +0000 (10:13 +0800)]

i386: Modify regexp of pr117304-1.c

Since the test doesn't care whether the hint is correct,
modify the regexp of the hint part so that future
changes to the hint don't cause the test to fail.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr117304-1.c: Modify regexp.

Alexandre Oliva [Thu, 7 Nov 2024 05:47:50 +0000 (02:47 -0300)]

limit ifcombine stmt moving and adjust flow info

It became apparent that conditions with deep SSA dependency trees could
be combined, which might require moving lots of statements.
Set a hard upper bound for now, hopefully to be replaced by a
dynamically computed bound, based on probabilities and costs.

Also reset flow sensitive info and avoid introducing undefined
behavior when moving stmts from under guarding conditions.

Finally, rework the preexisting reset of flow sensitive info and
avoidance of undefined behavior to be done when needed on all affected
inner blocks: reset flow info whenever enclosing conditions change,
and avoid undefined behavior whenever enclosing conditions become
laxer.

for  gcc/ChangeLog

* tree-ssa-ifcombine.cc
(ifcombine_rewrite_to_defined_overflow): New.
(ifcombine_replace_cond): Reject conds that would require
moving too many stmts.  Reset flow sensitive info and avoid
undefined behavior in moved stmts.  Reset flow sensitive info
in all inner blocks when the outer condition changes, and
avoid undefined behavior whenever the outer condition becomes
laxer, adapted and moved from...
(pass_tree_ifcombine::execute): ... here.

Alexandre Oliva [Thu, 7 Nov 2024 05:47:46 +0000 (02:47 -0300)]

handle TRUTH_ANDIF cond exprs in ifcombine_replace_cond

The upcoming move of fold_truth_andor to ifcombine brings with it the
possibility of TRUTH_ANDIF cond exprs. Handle them by splitting the
cond so as to best use both BB insertion points, but only if they're
contiguous.

for gcc/ChangeLog

* tree-ssa-ifcombine.cc (ifcombine_replace_cond): Support
TRUTH_ANDIF cond exprs.

Alexandre Oliva [Thu, 7 Nov 2024 05:47:42 +0000 (02:47 -0300)]

ifcombine across noncontiguous blocks

Rework ifcombine to support merging conditions from noncontiguous
blocks.  This depends on earlier preparation changes.

The function that attempted to ifcombine a block with its immediate
predecessor, tree_ssa_ifcombine_bb, now loops over dominating blocks
eligible for ifcombine, attempting to combine with them.

The function that actually drives the combination of a pair of blocks,
tree_ssa_ifcombine_bb_1, now takes an additional parameter: the
successor of outer that leads to inner.

The function that recognizes if_then_else patterns is modified to
enable testing without distinguishing between then and else, or,
when distinguishing them, to require nondegenerate conditions, since
degenerate ones aren't worth combining with.

for  gcc/ChangeLog

* tree-ssa-ifcombine.cc (recognize_if_then_else): Support
relaxed then/else testing; require nondegenerate condition
otherwise.
(tree_ssa_ifcombine_bb_1): Add outer_succ_bb parm, use it
instead of inner_cond_bb.  Adjust callers.
(tree_ssa_ifcombine_bb): Loop over dominating outer blocks
eligible for ifcombine.
(pass_tree_ifcombine::execute): Noted potential need for
changes to the post-combine logic.

Alexandre Oliva [Thu, 7 Nov 2024 05:47:38 +0000 (02:47 -0300)]

extend ifcombine_replace_cond to handle noncontiguous ifcombine

Prepare to handle noncontiguous ifcombine, introducing logic to modify
the outer condition when needed.  There are two cases worth
mentioning:

- when blocks are noncontiguous, we have to place the combined
  condition in the outer block to avoid pessimizing carefully crafted
  short-circuited tests;

- even when blocks are contiguous, we prepare for situations in which
  the combined condition has two tests, one to be placed in outer and
  the other in inner.  This circumstance will not come up when
  noncontiguous ifcombine is first enabled, but it will when
  an improved fold_truth_andor is integrated with ifcombine.

Combining the condition from inner into outer may require moving SSA
DEFs used in the inner condition, and the changes implement this as
well.

for  gcc/ChangeLog

* tree-ssa-ifcombine.cc: Include bitmap.h.
(ifcombine_mark_ssa_name): New.
(struct ifcombine_mark_ssa_name_t): New.
(ifcombine_mark_ssa_name_walk): New.
(ifcombine_replace_cond): Prepare to handle noncontiguous and
split-condition ifcombine.

Alexandre Oliva [Thu, 7 Nov 2024 05:47:34 +0000 (02:47 -0300)]

adjust update_profile_after_ifcombine for noncontiguous ifcombine

Prepare for ifcombining noncontiguous blocks, adding (still unused)
logic to the ifcombine profile updater to handle such cases.

for gcc/ChangeLog

* tree-ssa-ifcombine.cc (known_succ_p): New.
(update_profile_after_ifcombine): Handle noncontiguous blocks.

Alexandre Oliva [Thu, 7 Nov 2024 05:47:31 +0000 (02:47 -0300)]

introduce ifcombine_replace_cond

Refactor ifcombine_ifandif, moving the common code from the various
paths that apply the combined condition to a new function.
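For context, the classic combination ifcombine performs merges two nested single-bit tests of the same value into one masked comparison. The C below only shows that the two forms agree at source level. The pass itself works on GIMPLE, and the function names are hypothetical.

```c
#include <assert.h>

/* Two nested single-bit tests, as they might appear before ifcombine.  */
static int nested_tests (unsigned x)
{
  if (x & 1u)
    if (x & 4u)
      return 1;
  return 0;
}

/* The single combined condition: both bits must be set.  */
static int combined_test (unsigned x)
{
  return (x & (1u | 4u)) == (1u | 4u);
}

/* Check that both forms agree on a range covering all bit combinations.  */
static void check_ifcombine_equivalence (void)
{
  for (unsigned x = 0; x < 64; x++)
    assert (nested_tests (x) == combined_test (x));
}
```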

for gcc/ChangeLog

* tree-ssa-ifcombine.cc (ifcombine_replace_cond): Factor out
of...
(ifcombine_ifandif): ... this. Leave it for the above to
gimplify and invert the condition.

Alexandre Oliva [Thu, 7 Nov 2024 05:47:19 +0000 (02:47 -0300)]

drop redundant ifcombine_ifandif parm

In preparation for changes that may modify both inner and outer
conditions in ifcombine, drop the redundant parameter result_inv, which
is always identical to inner_inv.

for gcc/ChangeLog

* tree-ssa-ifcombine.cc (ifcombine_ifandif): Drop redundant
result_inv parm. Adjust all callers.

Alexandre Oliva [Thu, 7 Nov 2024 05:47:15 +0000 (02:47 -0300)]

allow vuses in ifcombine blocks

Disallowing vuses in blocks for ifcombine is too strict, and it
prevents usefully moving fold_truth_andor into ifcombine. That
tree-level folder has long ifcombined loads, absent other relevant
side effects.

for gcc/ChangeLog

* tree-ssa-ifcombine.cc (bb_no_side_effects_p): Allow vuses,
but not vdefs.
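To illustrate the kind of load combining meant here, two adjacent byte-field comparisons can be answered by a single 16-bit load compared against the expected bit pattern. This sketch assumes a two-byte struct without padding (checked with _Static_assert) and uses memcpy to stay endianness-neutral. It demonstrates the equivalence by hand and is not fold_truth_andor's output. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct two_bytes { uint8_t a, b; };
_Static_assert (sizeof (struct two_bytes) == 2, "expected no padding");

/* Two separate field loads and comparisons.  */
static int separate_compares (const struct two_bytes *p)
{
  return p->a == 1 && p->b == 2;
}

/* One 16-bit load compared against the expected bit pattern; memcpy keeps
   this independent of endianness and free of aliasing problems.  */
static int merged_compare (const struct two_bytes *p)
{
  const struct two_bytes want = { 1, 2 };
  uint16_t got, expected;
  memcpy (&got, p, sizeof got);
  memcpy (&expected, &want, sizeof expected);
  return got == expected;
}

/* Check the two forms agree for every possible pair of field values.  */
static void check_merged_compare (void)
{
  for (unsigned a = 0; a <= UINT8_MAX; a++)
    for (unsigned b = 0; b <= UINT8_MAX; b++)
      {
        struct two_bytes s = { (uint8_t) a, (uint8_t) b };
        assert (separate_compares (&s) == merged_compare (&s));
      }
}
```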

Alexandre Oliva [Thu, 7 Nov 2024 05:47:06 +0000 (02:47 -0300)]

[testsuite] disable PIE on ia32 on more tests

Multiple tests fail on ia32 with -fPIE enabled by default because of
different call sequences required by the call-saved PIC register
(no-callee-saved-*.c), uses of the constant pool instead of computing
constants (pr100865-*.c), and unexpected matches of esp in get_pc_thunk
(sse2-stv-1.c). Disable PIE on them, to match the expectations.

for gcc/testsuite/ChangeLog

* gcc.target/i386/no-callee-saved-13.c: Disable PIE on ia32.
* gcc.target/i386/no-callee-saved-14.c: Likewise.
* gcc.target/i386/no-callee-saved-15.c: Likewise.
* gcc.target/i386/no-callee-saved-17.c: Likewise.
* gcc.target/i386/pr100865-1.c: Likewise.
* gcc.target/i386/pr100865-7a.c: Likewise.
* gcc.target/i386/pr100865-7c.c: Likewise.
* gcc.target/i386/sse2-stv-1.c: Likewise.

Alexandre Oliva [Thu, 7 Nov 2024 05:46:57 +0000 (02:46 -0300)]

[testsuite] fix pr70321.c PIC expectations

When we select a non-bx get_pc_thunk, we get an extra mov to set up
the PIC register before the abort call. Expect that mov or a
get_pc_thunk.bx call.

for gcc/testsuite/ChangeLog

* gcc.target/i386/pr70321.c: Cope with non-bx get_pc_thunk.

xuli [Mon, 4 Nov 2024 10:00:45 +0000 (10:00 +0000)]

RISC-V: Add testcases for signed imm SAT_ADD form1

This patch adds testcases for form1, as shown below:

T __attribute__((noinline))                 \
sat_s_add_imm_##T##_fmt_1##_##INDEX (T x)   \
{                                           \
  T sum = (UT)x + (UT)IMM;                  \
  return (x ^ IMM) < 0                      \
    ? sum                                   \
    : (sum ^ x) >= 0                        \
      ? sum                                 \
      : x < 0 ? MIN : MAX;                  \
}

Passed the rv64gcv regression test.

Signed-off-by: Li Xu <xuli1@eswincomputing.com>
gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Support signed
imm SAT_ADD form1.
* gcc.target/riscv/sat_s_add_imm-1-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-2-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-2.c: New test.
* gcc.target/riscv/sat_s_add_imm-3-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-3.c: New test.
* gcc.target/riscv/sat_s_add_imm-4.c: New test.
* gcc.target/riscv/sat_s_add_imm-run-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-run-2.c: New test.
* gcc.target/riscv/sat_s_add_imm-run-3.c: New test.
* gcc.target/riscv/sat_s_add_imm-run-4.c: New test.
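Instantiating form1 by hand for T = int8_t, UT = uint8_t, IMM = -10, MIN = INT8_MIN and MAX = INT8_MAX gives a plain C function. That function can be checked against a widen-and-clamp reference over every int8_t input. This is a sketch, not the contents of sat_arith.h. It relies on GCC's documented modulo behaviour when converting an out-of-range value to a signed type.

```c
#include <assert.h>
#include <stdint.h>

/* Form1 expanded for T = int8_t, UT = uint8_t, IMM = -10.  The sum is
   computed in unsigned arithmetic and converted back to int8_t, which GCC
   reduces modulo 256.  */
static int8_t sat_s_add_imm_int8_t_fmt_1_0 (int8_t x)
{
  int8_t sum = (uint8_t) x + (uint8_t) -10;
  return (x ^ -10) < 0                /* operands differ in sign: no overflow */
    ? sum
    : (sum ^ x) >= 0                  /* result keeps x's sign: no overflow */
      ? sum
      : x < 0 ? INT8_MIN : INT8_MAX;  /* overflow: saturate */
}

/* Reference: add in int, then clamp to the int8_t range.  */
static int8_t sat_add_ref (int8_t x, int imm)
{
  int wide = x + imm;
  if (wide < INT8_MIN)
    return INT8_MIN;
  if (wide > INT8_MAX)
    return INT8_MAX;
  return (int8_t) wide;
}

/* Compare the two for every int8_t input.  */
static void check_form1 (void)
{
  for (int x = INT8_MIN; x <= INT8_MAX; x++)
    assert (sat_s_add_imm_int8_t_fmt_1_0 ((int8_t) x)
            == sat_add_ref ((int8_t) x, -10));
}
```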

xuli [Wed, 6 Nov 2024 01:56:09 +0000 (01:56 +0000)]

Match:Support signed imm SAT_ADD form1

This patch adds support for .SAT_ADD when one of the operands
is a signed IMM.

Form1:
T __attribute__((noinline))                 \
sat_s_add_imm_##T##_fmt_1##_##INDEX (T x)   \
{                                           \
  T sum = (UT)x + (UT)IMM;                  \
  return (x ^ IMM) < 0                      \
    ? sum                                   \
    : (sum ^ x) >= 0                        \
      ? sum                                 \
      : x < 0 ? MIN : MAX;                  \
}

Take the form1 instance below as an example:
DEF_SAT_S_ADD_IMM_FMT_1(0, int8_t, uint8_t, -10, INT8_MIN, INT8_MAX)

Before this patch:
__attribute__((noinline))
int8_t sat_s_add_imm_int8_t_fmt_1_0 (int8_t x)
{
  int8_t sum;
  unsigned char x.0_1;
  unsigned char _2;
  signed char _4;
  int8_t _5;
  _Bool _9;
  signed char _10;
  signed char _11;
  signed char _12;
  signed char _14;
  signed char _16;

  <bb 2> [local count: 1073741824]:
  x.0_1 = (unsigned char) x_6(D);
  _2 = x.0_1 + 246;
  sum_7 = (int8_t) _2;
  _4 = x_6(D) ^ sum_7;
  _16 = x_6(D) ^ 9;
  _14 = _4 & _16;
  if (_14 < 0)
    goto <bb 3>; [41.00%]
  else
    goto <bb 4>; [59.00%]

  <bb 3> [local count: 259738147]:
  _9 = x_6(D) < 0;
  _10 = (signed char) _9;
  _11 = -_10;
  _12 = _11 ^ 127;

  <bb 4> [local count: 1073741824]:
  # _5 = PHI <sum_7(2), _12(3)>
  return _5;

}

After this patch:
__attribute__((noinline))
int8_t sat_s_add_imm_int8_t_fmt_1_0 (int8_t x)
{
  int8_t _5;

  <bb 2> [local count: 1073741824]:
  _5 = .SAT_ADD (x_6(D), -10); [tail call]
  return _5;

}
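The constants in the "before" dump are the immediate in disguise: 246 is -10 reduced modulo 256, and 9 is ~-10 as a signed char. So `(x ^ sum) & (x ^ 9)` is negative exactly when the addition overflows, and `-(x < 0) ^ 127` picks INT8_MIN or INT8_MAX. Transcribing the dump into C and comparing it with a clamped wide addition is a quick way to confirm that the whole sequence is a saturating add of -10. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The "before" GIMPLE, transcribed statement by statement.  */
static int8_t before_dump (int8_t x)
{
  int8_t sum = (int8_t) (uint8_t) ((uint8_t) x + 246); /* _2, sum_7 */
  int8_t t4 = x ^ sum;                                  /* _4 */
  int8_t t16 = x ^ 9;                                   /* _16 = ~(x ^ -10) */
  if ((int8_t) (t4 & t16) < 0)                          /* _14 < 0: overflow */
    {
      int8_t neg = -(int8_t) (x < 0);                   /* _9, _10, _11 */
      return (int8_t) (neg ^ 127);                      /* _12: MIN or MAX */
    }
  return sum;                                           /* PHI <sum_7, _12> */
}

/* Clamped wide addition: the saturating result .SAT_ADD (x, -10) stands for.  */
static int8_t clamp_add (int8_t x, int imm)
{
  int wide = x + imm;
  return wide < INT8_MIN ? INT8_MIN : wide > INT8_MAX ? INT8_MAX : (int8_t) wide;
}

/* Compare the transcribed dump with the clamped addition for every input.  */
static void check_before_dump (void)
{
  for (int x = INT8_MIN; x <= INT8_MAX; x++)
    assert (before_dump ((int8_t) x) == clamp_add ((int8_t) x, -10));
}
```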

The following test suites pass with this patch:
1. The rv64gcv fully regression tests.
2. The x86 bootstrap tests.
3. The x86 fully regression tests.

Signed-off-by: Li Xu <xuli1@eswincomputing.com>
gcc/ChangeLog:

* match.pd: Add the form1 of signed imm .SAT_ADD matching.
* tree-ssa-math-opts.cc (match_saturation_add): Add fold
convert for const_int to the type of operand 0.

GCC Administrator [Thu, 7 Nov 2024 00:18:14 +0000 (00:18 +0000)]

Daily bump.

H.J. Lu [Wed, 6 Nov 2024 08:14:38 +0000 (16:14 +0800)]

avx10_2-comibf-2.c: Require AVX10.2 support

Since avx10_2-comibf-2.c is a run test, require AVX10.2 support.

* gcc.target/i386/avx10_2-comibf-2.c: Require avx10_2 target.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

Alexey Merzlyakov [Wed, 6 Nov 2024 21:39:30 +0000 (14:39 -0700)]

[PATCH v2] RISC-V: zero_extend(not) -> xor optimization [PR112398]

This patch adds optimization of the following patterns:

  (zero_extend:M (subreg:N (not:O==M (X:Q==M)))) ->
  (xor:M (zero_extend:M (subreg:N (X:M))), mask)
    ... where the mask is GET_MODE_MASK (N).

For the cases when X:M doesn't have any non-zero bits outside of mode N,
(zero_extend:M (subreg:N (X:M))) could be simplified to just (X:M)
and the whole optimization becomes:

  (zero_extend:M (subreg:N (not:M (X:M)))) ->
  (xor:M (X:M), mask)

The patch targets code patterns like:
  not   a0,a0
  andi  a0,a0,0xff
to be optimized to:
  xori  a0,a0,255
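In C terms, for a 64-bit X and N = 8 bits (the `andi a0,a0,0xff` case above), the two rewrites correspond to the identities below. This illustrates the arithmetic only, not simplify-rtx.cc, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* zero_extend (subreg:QI (not X)): complement, keep the low 8 bits,
   zero-extend back to 64 bits (the "not; andi 0xff" sequence).  */
static uint64_t zext_not_low8 (uint64_t x)
{
  return (uint8_t) ~x;
}

/* General rewrite: xor the zero-extended low part with the 8-bit mask.  */
static uint64_t xor_form (uint64_t x)
{
  return (uint64_t) (uint8_t) x ^ 0xffu;
}

/* When X has no bits set above the low 8, zero_extend (subreg X) is X
   itself, so a single xori suffices.  */
static uint64_t xori_form (uint64_t x)
{
  return x ^ 0xffu;
}

static void check_zext_not (void)
{
  for (uint64_t x = 0; x <= 0xff; x++)
    {
      assert (zext_not_low8 (x) == xor_form (x));
      assert (zext_not_low8 (x) == xori_form (x));
    }
  /* With high bits set, only the general form is valid.  */
  uint64_t wide = UINT64_C (0x123456789abcdef0);
  assert (zext_not_low8 (wide) == xor_form (wide));
}
```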

The change was tested locally for x86_64 and AArch64 (as the most common
targets) and for RV-64 and MIPS-32 (as targets affected by this
optimization), with no regressions in any case.

PR rtl-optimization/112398
gcc/ChangeLog:

* simplify-rtx.cc (simplify_context::simplify_unary_operation_1):
Simplify ZERO_EXTEND (SUBREG (NOT X)) to XOR (X, GET_MODE_MASK(SUBREG))
when X doesn't have any non-zero bits outside of SUBREG mode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr112398.c: New test.

Signed-off-by: Alexey Merzlyakov <alexey.merzlyakov@samsung.com>

Iain Sandoe [Wed, 6 Nov 2024 20:46:47 +0000 (20:46 +0000)]

Darwin: Fix a narrowing warning.

cdtor_record needs to have an unsigned entry for the position in order to
match with vec_safe_length.

gcc/ChangeLog:

* config/darwin.cc (cdtor_record): Make position unsigned.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

Andrew Stubbs [Wed, 6 Nov 2024 17:50:00 +0000 (17:50 +0000)]

openmp: Fix signed/unsigned warning

My previous patch broke things when building with -Werror.

gcc/ChangeLog:

* omp-general.cc (omp_max_vf): Cast the constant to poly_uint64.

Andrew Stubbs [Wed, 6 Nov 2024 12:26:08 +0000 (12:26 +0000)]

openmp: Add testcases for omp_max_vf

Ensure that GOMP_MAX_VF does the right thing for explicit schedules when
offloading is enabled ("target" directives are present), and is inactive
otherwise.

libgomp/ChangeLog:

* testsuite/libgomp.c/max_vf-1.c: New test.
* testsuite/libgomp.c/max_vf-2.c: New test.

gcc/testsuite/ChangeLog:

* gcc.dg/gomp/max_vf-1.c: New test.

Andrew Stubbs [Fri, 1 Nov 2024 15:00:25 +0000 (15:00 +0000)]

openmp: Add IFN_GOMP_MAX_VF

Delay omp_max_vf call until after the host and device compilers have diverged
so that the max_vf value can be tuned exactly right on both variants.

This change means that the ompdevlow pass must be enabled for functions that
use OpenMP directives with both "simd" and "schedule" enabled.

gcc/ChangeLog:

* internal-fn.cc (expand_GOMP_MAX_VF): New function.
* internal-fn.def (GOMP_MAX_VF): New internal function.
* omp-expand.cc (omp_adjust_chunk_size): Emit IFN_GOMP_MAX_VF when
called in offload context, otherwise assume host context.
* omp-offload.cc (execute_omp_device_lower): Expand IFN_GOMP_MAX_VF.

Andrew Stubbs [Fri, 1 Nov 2024 13:53:34 +0000 (13:53 +0000)]

openmp: use offload max_vf for chunk_size

The chunk size for SIMD loops should be right for the current device; too big
allocates too much memory, too small is inefficient. Getting it wrong doesn't
actually break anything though.

This patch attempts to choose the optimal setting based on the context. Both
host-fallback and device will get the same chunk size, but device performance
is the most important in this case.
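The trade-off is easier to see with numbers. This sketch makes two assumptions about omp_adjust_chunk_size that are not stated in this message: the chunk size is rounded up to a multiple of the maximum vectorization factor, and that factor is a power of two. Under those assumptions, a requested chunk of 10 becomes 16 with a factor of 8 but 64 with a factor of 64. Both factors are hypothetical, chosen only to contrast a narrow host vector with a wide device one. round_chunk_to_vf is likewise a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Round CHUNK up to a multiple of VF, assuming VF is a power of two.  */
static uint64_t round_chunk_to_vf (uint64_t chunk, uint64_t vf)
{
  assert (vf != 0 && (vf & (vf - 1)) == 0);  /* power of two */
  return (chunk + vf - 1) & ~(vf - 1);
}
```

A larger factor wastes more work per chunk when the requested size is small, while a smaller one leaves wide device vectors underused.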

gcc/ChangeLog:

* omp-expand.cc (is_in_offload_region): New function.
(omp_adjust_chunk_size): Add pass-through "offload" parameter.
(get_ws_args_for): Likewise.
(determine_parallel_type): Use is_in_offload_region to adjust call to
get_ws_args_for.
(expand_omp_for_generic): Likewise.
(expand_omp_for_static_chunk): Likewise.

Andrew Stubbs [Mon, 21 Oct 2024 12:29:54 +0000 (12:29 +0000)]

openmp: Tune omp_max_vf for offload targets

If requested, return the vectorization factor appropriate for the offload
device, if any.

This change gives a significant speedup in the BabelStream "dot" benchmark on
amdgcn.

The omp_adjust_chunk_size use case is set to "false" for now, but I intend to
change that in a follow-up patch.

Note that NVPTX SIMT offload does not use this code-path.

gcc/ChangeLog:

* gimple-loop-versioning.cc (loop_versioning::loop_versioning): Set
omp_max_vf to offload == false.
* omp-expand.cc (omp_adjust_chunk_size): Likewise.
* omp-general.cc (omp_max_vf): Add "offload" parameter, and detect
amdgcn offload devices.
* omp-general.h (omp_max_vf): Likewise.
* omp-low.cc (lower_rec_simd_input_clauses): Pass offload state to
omp_max_vf.

Mirror of https://gcc.gnu.org/git/gcc.git