Jakub Jelinek [Thu, 22 Feb 2024 18:32:02 +0000 (19:32 +0100)]
c: Handle scoped attributes in __has*attribute and scoped attribute parsing changes in -std=c11 etc. modes [PR114007]
We aren't able to parse __has_attribute (vendor::attr) (and __has_c_attribute
and __has_cpp_attribute) in strict C < C23 modes. While in -std=gnu* modes
or in -std=c23 there is CPP_SCOPE token, in -std=c* (except for -std=c23)
there is just a pair of CPP_COLON tokens.
The c-lex.cc hunk adds support for that.
That leads to a question if we should return 1 or 0 from
__has_attribute (gnu::unused) or not, because while
[[gnu::unused]] is parsed fine in -std=gnu*/-std=c23 modes (sure, with
pedwarn for < C23), we do not parse it at all in -std=c* (except for
-std=c23), we only parse [[__extension__ gnu::unused]] there. While
the __extension__ in there helps to avoid the pedwarn, I think it is
better to be consistent between GNU and strict C < C23 modes and
parse [[gnu::unused]] too; on the other side, I think parsing
[[__extension__ gnu : : unused]] is too weird and undesirable.
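As an illustration of the user-visible side, here is a minimal sketch of how
code might probe for a scoped attribute before using it. It is written as C++
so that it compiles regardless of this patch; the macro names HAVE_GNU_UNUSED
and MAYBE_UNUSED and the function helper_plus_one are hypothetical and not
part of GCC. The same preprocessor test in C is what this change makes work
in strict pre-C23 modes.

```cpp
#include <cassert>

// __has_attribute is a compiler extension, so guard its use.
// HAVE_GNU_UNUSED is a hypothetical macro name chosen for this sketch.
#if defined(__has_attribute)
#  if __has_attribute(gnu::unused)
#    define HAVE_GNU_UNUSED 1
#  endif
#endif
#ifndef HAVE_GNU_UNUSED
#  define HAVE_GNU_UNUSED 0
#endif

// Only spell the scoped attribute when the compiler reports support for it.
#if HAVE_GNU_UNUSED
#  define MAYBE_UNUSED [[gnu::unused]]
#else
#  define MAYBE_UNUSED
#endif

// Hypothetical function that may or may not be referenced by callers.
MAYBE_UNUSED static int helper_plus_one(int x) { return x + 1; }
```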
So, the following patch adds a flag during preprocessing at the point
where we normally create CPP_SCOPE tokens out of 2 consecutive colons
on the first CPP_COLON to mark the consecutive case (as we are tight
on the bits, I've reused the PURE_ZERO flag, which is used just by the
C++ FE and only ever set (both C and C++) on CPP_NUMBER tokens, this
new flag has the same value and is only ever used on CPP_COLON tokens)
and instead of checking loose_scope_p argument (i.e. whether it is
[[__extension__ ...]] or not), it just parses CPP_SCOPE or CPP_COLON
with COLON_SCOPE flag followed by another CPP_COLON the same.
The latter will never appear in >= C23 or -std=gnu* modes, though
guarding its use say with flag_iso && !flag_isoc23 doesn't really
work because the __extension__ case temporarily clears flag_iso flag.
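The flagging scheme described above can be sketched with a toy lexer. This is
not the libcpp or c-parser code; Tok, Token, lex and is_scoped_attribute are
hypothetical names, and the colon_scope member only stands in for the
COLON_SCOPE flag. The point is that adjacent colons are distinguishable from
separated ones even when no scope token is produced.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class Tok { Name, Colon, Scope };

struct Token {
  Tok kind;
  std::string text;
  bool colon_scope;  // stand-in for COLON_SCOPE: set if ':' is directly followed by ':'
};

// Lex names, spaces and colons.  With scope_tokens == false (modelling strict
// pre-C23 modes) "::" yields two Colon tokens and the first one is flagged.
std::vector<Token> lex(const std::string &s, bool scope_tokens) {
  std::vector<Token> out;
  for (std::size_t i = 0; i < s.size();) {
    if (s[i] == ' ') { ++i; continue; }
    if (s[i] == ':') {
      bool next_colon = i + 1 < s.size() && s[i + 1] == ':';
      if (next_colon && scope_tokens) {
        out.push_back(Token{Tok::Scope, "::", false});
        i += 2;
      } else {
        out.push_back(Token{Tok::Colon, ":", next_colon});
        ++i;
      }
      continue;
    }
    std::size_t j = i;
    while (j < s.size() && s[j] != ' ' && s[j] != ':') ++j;
    out.push_back(Token{Tok::Name, s.substr(i, j - i), false});
    i = j;
  }
  return out;
}

// Accept "vendor::attr" spelled either with a Scope token or with a flagged
// Colon immediately followed by another Colon; reject ": :".
bool is_scoped_attribute(const std::vector<Token> &t) {
  if (t.size() == 3)
    return t[0].kind == Tok::Name && t[1].kind == Tok::Scope
           && t[2].kind == Tok::Name;
  if (t.size() == 4)
    return t[0].kind == Tok::Name && t[1].kind == Tok::Colon
           && t[1].colon_scope && t[2].kind == Tok::Colon
           && t[3].kind == Tok::Name;
  return false;
}
```

With this model, "gnu::unused" is accepted in both modes while "gnu : : unused"
is rejected in both, which mirrors the behavior the patch aims for.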
This makes the -std=c11 etc. behavior more similar to -std=gnu11 or
-std=c23, the only difference I'm aware of are the
#define JOIN2(A, B) A##B
[[vendor JOIN2(:,:) attr]]
[[__extension__ vendor JOIN2(:,:) attr]]
cases, which are accepted in the latter modes, but result in an error
in -std=c11; but the error is during preprocessing that :: doesn't
form a valid preprocessing token, which is true, so just don't do that if
you try to have __STRICT_ANSI__ && __STDC_VERSION__ <= 201710L
compatibility.
2024-02-22 Jakub Jelinek <jakub@redhat.com>
PR c/114007
gcc/
* doc/extend.texi: (__extension__): Remove comments about scope
tokens vs. two colons.
gcc/c-family/
* c-lex.cc (c_common_has_attribute): Parse 2 CPP_COLONs with
the first one with COLON_SCOPE flag the same as CPP_SCOPE.
gcc/c/
* c-parser.cc (c_parser_std_attribute): Remove loose_scope_p argument.
Instead of checking it, parse 2 CPP_COLONs with the first one with
COLON_SCOPE flag the same as CPP_SCOPE.
(c_parser_std_attribute_list): Remove loose_scope_p argument, don't
pass it to c_parser_std_attribute.
(c_parser_std_attribute_specifier): Adjust c_parser_std_attribute_list
caller.
gcc/testsuite/
* gcc.dg/c23-attr-syntax-6.c: Adjust testcase for :: being valid
even in -std=c11 even without __extension__ and : : etc. not being
valid anymore even with __extension__.
* gcc.dg/c23-attr-syntax-7.c: Likewise.
* gcc.dg/c23-attr-syntax-8.c: New test.
libcpp/
* include/cpplib.h (COLON_SCOPE): Define to PURE_ZERO.
* lex.cc (_cpp_lex_direct): When lexing CPP_COLON with another
colon after it, if !CPP_OPTION (pfile, scope) set COLON_SCOPE
flag on the first CPP_COLON token.
Andrew Pinski [Thu, 22 Feb 2024 04:12:21 +0000 (20:12 -0800)]
warn-access: Fix handling of unnamed types [PR109804]
This looks like an oversight of handling DEMANGLE_COMPONENT_UNNAMED_TYPE.
DEMANGLE_COMPONENT_UNNAMED_TYPE only has the u.s_number.number set while
the code expected newc.u.s_binary.left would be valid.
So this treats DEMANGLE_COMPONENT_UNNAMED_TYPE like we treat function parameters
(DEMANGLE_COMPONENT_FUNCTION_PARAM) and template parameters (DEMANGLE_COMPONENT_TEMPLATE_PARAM).
Note the code in the demangler does this when it sets DEMANGLE_COMPONENT_UNNAMED_TYPE:
ret->type = DEMANGLE_COMPONENT_UNNAMED_TYPE;
ret->u.s_number.number = num;
Committed as obvious after bootstrap/test on x86_64-linux-gnu
This is because the non-Q variant for indices 0 and 1 are just shuffling values.
There is no perf difference between INS SIMD to SIMD and ZIP on Arm uArches but
preferring the INS alternative has a drawback on all uArches as ZIP being a three
operand instruction can be used to tie the result to the return register whereas
INS would require an fmov.
As such just update the test file for now.
gcc/testsuite/ChangeLog:
PR target/112375
* gcc.target/aarch64/vget_set_lane_1.c: Update test output.
Gaius Mulley [Thu, 22 Feb 2024 15:02:19 +0000 (15:02 +0000)]
PR modula2/114055 improve error message when checking the BY constant
The fix marks a constant created during the default BY clause of the
FOR loop as internal. The type checker will always return true if
checking against an internal const.
gcc/m2/ChangeLog:
PR modula2/114055
* gm2-compiler/M2Check.mod (Import): IsConstLitInternal and
IsConstLit.
(isInternal): New procedure function.
(doCheck): Test for isInternal in either operand and early
return true.
* gm2-compiler/M2Quads.mod (PushOne): Rewrite with extra
parameter internal.
(BuildPseudoBy): Add TRUE parameter to PushOne call.
(BuildIncProcedure): Add FALSE parameter to PushOne call.
(BuildDecProcedure): Add FALSE parameter to PushOne call.
* gm2-compiler/M2Range.mod (ForLoopBeginTypeCompatible):
Uncomment code and tidy up error string.
* gm2-compiler/SymbolTable.def (PutConstLitInternal):
New procedure.
(IsConstLitInternal): New procedure function.
* gm2-compiler/SymbolTable.mod (PutConstLitInternal):
New procedure.
(IsConstLitInternal): New procedure function.
(SymConstLit): New field IsInternal.
(CreateConstLit): Initialize IsInternal to FALSE.
gcc/testsuite/ChangeLog:
PR modula2/114055
* gm2/pim/fail/forloopby.mod: New test.
* gm2/pim/pass/forloopby2.mod: New test.
When we classify a conditional reduction chain as CONST_COND_REDUCTION
we fail to verify all involved conditionals have the same constant.
That's a quite unlikely situation so the following simply disables
such classification when there's more than one reduction statement.
PR tree-optimization/114027
* tree-vect-loop.cc (vectorizable_reduction): Use optimized
condition reduction classification only for single-element
chains.
Jakub Jelinek [Thu, 22 Feb 2024 12:07:25 +0000 (13:07 +0100)]
profile-count: Don't dump through a temporary buffer [PR111960]
The profile_count::dump (char *, struct function * = NULL) const;
method has a single caller, the
profile_count::dump (FILE *f, struct function *fun) const;
method and for that going through a temporary buffer is just slower
and opens doors for buffer overflows, which is exactly why this P1
was filed.
The buffer size is 64 bytes, the previous maximum
"%" PRId64 " (%s)"
would print up to 61 bytes in there (19 bytes for arbitrary uint64_t:61
bitfield printed as signed, "estimated locally, globally 0 adjusted"
i.e. 38 bytes longest %s and 4 other characters).
Now, after the r14-2389 changes, it can be
19 + 38 plus 11 other characters + %.4f, which is worst case
309 chars before decimal point, decimal point and 4 digits after it,
so total 382 bytes.
So, either we could bump the buffer[64] to buffer[400], or the following
patch just drops the indirection through buffer and prints it directly to
stream. After all, having APIs which fill in some buffer without passing
down the size of the buffer is just asking for buffer overflows over time.
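The size arithmetic above can be checked with a short sketch. The function
names are hypothetical, and the 19 and 11 byte figures are simply taken from
the text; only the "%.4f" length and the length of the longest quality string
are computed.

```cpp
#include <cassert>
#include <cfloat>
#include <cstddef>
#include <cstdio>
#include <string>

// Length of the longest positive "%.4f" rendering of a double: DBL_MAX has
// 309 digits before the decimal point, then the point and 4 more digits.
int worst_case_f4_length() {
  return std::snprintf(nullptr, 0, "%.4f", DBL_MAX);
}

// Worst-case output size using the figures quoted in the commit message.
std::size_t worst_case_dump_length() {
  const std::size_t count_digits = 19;  // from the text above
  const std::size_t quality =
      std::string("estimated locally, globally 0 adjusted").size();
  const std::size_t other_chars = 11;   // from the text above
  return count_digits + quality + other_chars
         + static_cast<std::size_t>(worst_case_f4_length());
}
```

The result is far beyond the 64-byte buffer, which is why writing directly to
the stream is the simpler fix.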
2024-02-22 Jakub Jelinek <jakub@redhat.com>
PR ipa/111960
* profile-count.h (profile_count::dump): Remove overload with
char * first argument.
* profile-count.cc (profile_count::dump): Change overload with char *
first argument which uses sprintf into the overload with FILE *
first argument and use fprintf instead. Remove overload which wrapped
it.
Jakub Jelinek [Thu, 22 Feb 2024 09:19:15 +0000 (10:19 +0100)]
call-cdce: Add missing BUILT_IN_*F{32,64}X handling and improve BUILT_IN_*L [PR113993]
The following testcase ICEs, because can_test_argument_range
returns true for BUILT_IN_{COSH,SINH,EXP{,M1,2}}{F32X,F64X}
among many other builtins, but get_no_error_domain doesn't handle
those.
float32x_type_node when supported in GCC always has DFmode, so that
case is easy (and call-cdce assumes that SFmode is IEEE float and DFmode
is IEEE double). So *F32X is simply handled by adding those cases
next to *F64.
float64x_type_node when supported in GCC by definition has a mode
with larger precision and exponent range than DFmode, so it can be XFmode,
TFmode or KFmode. I went through all the l/f128 suffixed builtins and
verified that the float128x_type_node no error domain range is actually
identical to the Intel extended long double no error domain range; it isn't
that surprising, both IEEE quad and Intel/Motorola extended have the same
exponent range [-16381, 16384] (well, Motorola -16382 probably because of
different behavior for denormals, but that has nothing to do with
get_no_error_domain which is about large inputs overflowing into +-Inf
or triggering NaN, denormals could in theory do something solely for sqrt
and even that is fine). In theory some target could have different larger
type, so for *F64X the code verifies that
REAL_MODE_FORMAT (TYPE_MODE (float64x_type_node))->emax == 16384
and if so, uses the *F128 domains, otherwise falls back to the non-suffixed
ones (aka *F64), that is certainly the conservative minimum.
While at it, the patch also changes the *L suffixed cases to do pretty much
the same, the comment said that the function just assumes for *L
the *F64 ranges, but that is unnecessarily conservative.
All we currently have for long double is:
1) IEEE quad (emax 16384, *F128 ranges)
2) XFmode Intel/Motorola extended (emax 16384, same as *F128 ranges)
3) IBM extended (double double, emax 1024, the extra precision doesn't
really help and the domains are the same as for *F64)
4) same as double (*F64 again)
So, the patch uses also for *L
REAL_MODE_FORMAT (TYPE_MODE (long_double_type_node))->emax == 16384
checks and either tail recurses into the *F128 case for that or to
non-suffixed (aka *F64) case otherwise.
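The emax-based selection can be illustrated outside of GCC with
std::numeric_limits, whose max_exponent plays the role of emax here. This is
only an analogy, not the tree-call-cdce.cc code: the helper name
exp10_no_error_upper_bound is hypothetical, and the fallback simply uses the
decimal exponent limit of double as the conservative choice.

```cpp
#include <cassert>
#include <limits>

// Pick an upper bound for the exp10 argument below which the result stays
// finite.  A type with a maximum binary exponent of 16384 (IEEE quad or x87
// extended) gets the wide bound; anything else falls back to the bound that
// is safe for double.
template <typename T>
int exp10_no_error_upper_bound() {
  if (std::numeric_limits<T>::max_exponent == 16384)
    return 4932;
  return std::numeric_limits<double>::max_exponent10;
}
```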
BUILT_IN_*F128X not handled because no target has those and it doesn't
seem something is on the horizon and who knows what would be used for that.
Thus, all we get this wrong for are probably VAX floats or something
similar, no intent from me to look at that, that is preexisting issue.
BTW, I'm surprised we don't have BUILT_IN_EXP10F{16,32,64,128,32X,64X,128X}
builtins, seems glibc has those (sure, I think except *16 and *128x).
2024-02-22 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113993
* tree-call-cdce.cc (get_no_error_domain): Handle
BUILT_IN_{COSH,SINH,EXP{,M1,2}}{F32X,F64X}. Handle
BUILT_IN_{COSH,SINH,EXP{,M1,2}}L for
REAL_MODE_FORMAT (TYPE_MODE (long_double_type_node))->emax == 16384
the same as the F128 suffixed cases, otherwise as non-suffixed ones.
Handle BUILT_IN_{EXP,POW}10L for
REAL_MODE_FORMAT (TYPE_MODE (long_double_type_node))->emax == 16384
as (-inf, 4932).
Currently, bitint_large_huge::lower_mul_overflow uses cnt 1 only if
startlimb == endlimb and in that case doesn't use a loop and handles
everything in a special if:
unsigned cnt;
bool use_loop = false;
if (startlimb == endlimb)
cnt = 1;
else if (startlimb + 1 == endlimb)
cnt = 2;
else if ((end % limb_prec) == 0)
{
cnt = 2;
use_loop = true;
}
else
{
cnt = 3;
use_loop = startlimb + 2 < endlimb;
}
if (cnt == 1)
{
...
}
else
The loop handling for the loop exit condition wants to compare if the
incremented index is equal to endlimb, but that is correct only if
end is not divisible by limb_prec and there will be a straight line
check after the loop as well for the most significant limb. The code
used endlimb + (cnt == 1) for that, but cnt == 1 is never true here,
because cnt is either 2 or 3, so the right check is (cnt == 2).
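A standalone sketch of the classification and the corrected exit bound, under
the assumption that startlimb, endlimb and end are already known (how GCC
derives them is not shown in this message). LimbPlan, classify and
loop_exit_index are hypothetical names used only for illustration.

```cpp
#include <cassert>

struct LimbPlan {
  unsigned cnt;
  bool use_loop;
};

// Mirrors the if/else chain quoted above.
LimbPlan classify(unsigned startlimb, unsigned endlimb, unsigned end,
                  unsigned limb_prec) {
  if (startlimb == endlimb)
    return {1, false};
  if (startlimb + 1 == endlimb)
    return {2, false};
  if (end % limb_prec == 0)
    return {2, true};
  return {3, startlimb + 2 < endlimb};
}

// Index the incremented loop counter is compared against.  When cnt == 2
// there is no straight-line code for the most significant limb after the
// loop, so the loop itself has to cover endlimb and the bound is one higher.
unsigned loop_exit_index(unsigned endlimb, const LimbPlan &plan) {
  return endlimb + (plan.cnt == 2);
}
```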
2024-02-22 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/114038
* gimple-lower-bitint.cc (bitint_large_huge::lower_mul_overflow): Fix
loop exit condition if end is divisible by limb_prec.
YunQiang Su [Thu, 22 Feb 2024 05:05:06 +0000 (13:05 +0800)]
invoke.texi: Fix some skipping UrlSuffix problem for MIPS
The problem is that there are these lines in mips.opt.urls:
; skipping UrlSuffix for 'mabi=' due to finding no URLs
; skipping UrlSuffix for 'mno-flush-func' due to finding no URLs
; skipping UrlSuffix for 'mexplicit-relocs' due to finding no URLs
These lines are not fixed by this patch because we don't
document these options:
; skipping UrlSuffix for 'mlra' due to finding no URLs
; skipping UrlSuffix for 'mdebug' due to finding no URLs
; skipping UrlSuffix for 'meb' due to finding no URLs
; skipping UrlSuffix for 'mel' due to finding no URLs
gcc
* doc/invoke.texi(MIPS Options): Fix skipping UrlSuffix
problem of mabi=, mno-flush-func, mexplicit-relocs;
add missing leading - of mbranch-cost option.
* config/mips/mips.opt.urls: Regenerate.
As PR109987 and its duplicated bugs show, -mno-power8-vector
(and -mno-power9-vector) cause some problems and as Segher
pointed out in [1] they are workaround options, so this patch
is to remove -m{no-,}power{8,9}-vector. Like what we did
for option -mdirect-move before, this patch still keeps the
corresponding internal flags and they are automatically set
based on -mcpu. The test suite update takes some effort;
it consists of the following aspects:
- effective target powerpc_p{8,9}vector_ok are removed
and replaced with powerpc_vsx_ok.
- Some cases having -mpower{8,9}-vector are updated with
-mvsx, some of them already have -mdejagnu-cpu. For
those that don't have -mdejagnu-cpu, if -mdejagnu-cpu
is needed for the test point, then it's appended;
otherwise, add additional-options -mdejagnu-cpu=power{8,9}
if has_arch_pwr{8,9} isn't satisfied.
- Some test cases are updated with explicit -mvsx.
- Some test cases with those two options mixed are adjusted
to keep the test points, like -mpower8-vector
-mno-power9-vector are updated with -mdejagnu-cpu=power8
-mvsx etc.
- Some test cases with -mno-power{8,9}-vector are updated
by replacing -mno-power{8,9}-vector with -mno-vsx, or
just removing it.
- For some cases, we don't always specify -mdejagnu-cpu, to
avoid restricting the testing coverage; instead it checks
has_arch_pwr{8,9} and appends the option as needed.
- For vect test cases run, it doesn't specify -mcpu=power9
for power10 and up.
Bootstrapped and regtested on:
- powerpc64-linux-gnu P7/P8/P9 {-m32,-m64}
- powerpc64le-linux-gnu P8/P9/P10
Although it's stage4 now, as discussed in PR113115 we
are still eager to neuter these two options, so is it ok
for trunk?
* config/rs6000/constraints.md (we): Update internal doc without
referring to option -mpower9-vector.
* config/rs6000/driver-rs6000.cc (asm_names): Remove mpower9-vector
special handlings.
* config/rs6000/rs6000-cpus.def (OTHER_P9_VECTOR_MASKS,
OTHER_P8_VECTOR_MASKS): Merge to ...
(OTHER_VSX_VECTOR_MASKS): ... here.
* config/rs6000/rs6000.cc (rs6000_option_override_internal): Remove
some error message handlings and explicit option mask adjustments on
explicit option power{8,9}-vector conflicting with other options.
(rs6000_print_isa_options): Update comments.
(rs6000_disable_incompatible_switches): Remove power{8,9}-vector
related array items and handlings.
* config/rs6000/rs6000.h (ASM_CPU_SPEC): Remove mpower9-vector
special handlings.
* config/rs6000/rs6000.opt: Make option power{8,9}-vector as
WarnRemoved.
* doc/extend.texi: Remove documentation referring to option
-mpower8-vector.
* doc/invoke.texi: Remove documentation for option
-mpower{8,9}-vector and adjust some documentation referring to them.
* doc/md.texi: Update documentation for constraint we.
* doc/sourcebuild.texi: Remove documentation for powerpc_p8vector_ok.
libgcc/ChangeLog:
* config/rs6000/t-float128-hw: Replace options -mpower{8,9}-vector
with -mcpu=power9.
* configure.ac: Update use of option -mpower9-vector with
-mcpu=power9.
* configure: Regenerate.
Fangrui Song [Wed, 31 Jan 2024 04:41:12 +0000 (20:41 -0800)]
RISC-V: Add tests for constraints "i" and "s"
The constraints "i" and "s" can be used with a symbol that binds
externally, e.g.
```
namespace ns { extern int var, a[4]; }
void foo() {
asm(".pushsection .xxx,\"aw\"; .dc.a %0; .popsection" :: "s"(&ns::var));
asm(".reloc ., BFD_RELOC_NONE, %0" :: "s"(&ns::a[3]));
}
```
Edwin Lu [Wed, 14 Feb 2024 20:06:38 +0000 (12:06 -0800)]
RISC-V: Quick and simple fixes to testcases that break due to reordering
The following test cases are easily fixed with small updates to the expected
assembly order. Additionally, make calling-convention testcases more robust.
Edwin Lu [Wed, 14 Feb 2024 20:04:59 +0000 (12:04 -0800)]
RISC-V: Use default cost model for insn scheduling
Use default cost model scheduling on these test cases. All these tests
introduce scan dump failures with -mtune generic-ooo. Since the vector
cost models are the same across all three tunes, some of the tests
in PR113249 will be fixed with this patch series.
Edwin Lu [Wed, 14 Feb 2024 20:03:37 +0000 (12:03 -0800)]
RISC-V: Add vector related pipelines
Creates new generic vector pipeline file common to all cpu tunes.
Moves all vector related pipelines from generic-ooo to generic-vector-ooo.
Creates new vector crypto related insn reservations.
Edwin Lu [Wed, 14 Feb 2024 20:01:22 +0000 (12:01 -0800)]
RISC-V: Add non-vector types to dfa pipelines
This patch adds non-vector related insn reservations and updates/creates
new insn reservations so all non-vector typed instructions have a reservation.
David Faust [Tue, 20 Feb 2024 22:48:33 +0000 (14:48 -0800)]
bpf: add inline memmove and memcpy expansion
BPF programs are not typically linked, which means we cannot fall back
on library calls to implement __builtin_{memmove,memcpy} and should
always expand them inline if possible.
GCC already successfully expands these builtins inline in many cases,
but failed to do so for a few simple cases involving overlapping
memmove in the kernel BPF selftests and was instead emitting a libcall.
This patch implements a simple inline expansion of memcpy and memmove in
the BPF backend in a verifier-friendly way, with the caveat that the
size must be an integer constant, which is also required by clang.
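To illustrate what an inline expansion of a constant-size memmove has to get
right, here is a generic sketch. It is not the BPF backend's expansion; it
only shows one common technique, choosing the copy direction from the pointer
order so that overlapping ranges are handled. The names inline_memmove and
shift4 are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Copy N bytes without calling the library.  N is a compile-time constant,
// so a compiler is free to unroll either loop completely.
template <std::size_t N>
void inline_memmove(unsigned char *dst, const unsigned char *src) {
  if (reinterpret_cast<std::uintptr_t>(dst)
      <= reinterpret_cast<std::uintptr_t>(src)) {
    // Destination starts first: copy forward.
    for (std::size_t i = 0; i < N; ++i)
      dst[i] = src[i];
  } else {
    // Destination starts later: copy backward so that overlapping source
    // bytes are read before they are overwritten.
    for (std::size_t i = N; i-- > 0;)
      dst[i] = src[i];
  }
}

// Small driver: move 4 bytes inside one string.  The caller must keep both
// offsets plus 4 within the string.
std::string shift4(std::string s, std::size_t dst_off, std::size_t src_off) {
  unsigned char *p = reinterpret_cast<unsigned char *>(&s[0]);
  inline_memmove<4>(p + dst_off, p + src_off);
  return s;
}
```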
Gaius Mulley [Wed, 21 Feb 2024 16:21:05 +0000 (16:21 +0000)]
PR modula2/114026 Incorrect location during for loop type checking
If a for loop contains an incompatible type expression between the
designator and the second expression then the location
used when generating the error message is set to token 0.
The bug is fixed by extending the range checking
InitForLoopBeginRangeCheck. The range checking is processed after
all types, constants have been resolved (and converted into gcc
trees). The range check will check for assignment compatibility
between des and expr1, expression compatibility between des and expr2.
Separate token positions for des, exp1, expr2 and by are stored in the
Range record and used to create virtual tokens if they are on the same
source line.
gcc/m2/ChangeLog:
PR modula2/114026
* gm2-compiler/M2GenGCC.mod (Import): Remove DisplayQuadruples.
Remove DisplayQuadList.
(MixTypesBinary): Replace check with overflowCheck.
New variable typeChecking.
Use GenQuadOTypetok to retrieve typeChecking.
Use typeChecking to suppress error message.
* gm2-compiler/M2LexBuf.def (MakeVirtual2Tok): New procedure
function.
* gm2-compiler/M2LexBuf.mod (MakeVirtualTok): Improve comment.
(MakeVirtual2Tok): New procedure function.
* gm2-compiler/M2Quads.def (GetQuadOTypetok): New procedure.
* gm2-compiler/M2Quads.mod (QuadFrame): New field CheckType.
(PutQuadO): Rewrite using PutQuadOType.
(PutQuadOType): New procedure.
(GetQuadOTypetok): New procedure.
(BuildPseudoBy): Rewrite.
(BuildForToByDo): Remove type checking.
Add parameters e2, e2tok, BySym, bytok to
InitForLoopBeginRange.
Push the RangeId.
(BuildEndFor): Pop the RangeId.
Use GenQuadOTypetok to generate AddOp without type checking.
Call PutRangeForIncrement with the RangeId and IncQuad.
(GenQuadOtok): Rewrite using GenQuadOTypetok.
(GenQuadOTypetok): New procedure.
* gm2-compiler/M2Range.def (InitForLoopBeginRangeCheck):
Rename d as des, e as expr.
Add expr1, expr1tok, expr2, expr2tok, byconst, byconsttok
parameters.
(PutRangeForIncrement): New procedure.
* gm2-compiler/M2Range.mod (Import): MakeVirtual2Tok.
(Range): Add expr2, byconst, destok, exprtok, expr2tok,
incrementquad.
(InitRange): Initialize expr2 to NulSym.
Initialize byconst to NulSym.
Initialize tokenNo, destok, exprtok, expr2tok, byconst to
UnknownTokenNo.
Initialize incrementquad to 0.
(PutRangeForIncrement): New procedure.
(PutRangeDesExpr2): New procedure.
(InitForLoopBeginRangeCheck): Rewrite.
(ForLoopBeginTypeCompatible): New procedure function.
(CodeForLoopBegin): Call ForLoopBeginTypeCompatible and
only code the for loop assignment if all the type checks
succeed.
gcc/testsuite/ChangeLog:
PR modula2/114026
* gm2/extensions/run/pass/callingc10.mod: New test.
* gm2/extensions/run/pass/callingc11.mod: New test.
* gm2/extensions/run/pass/callingc9.mod: New test.
* gm2/extensions/run/pass/strconst.def: New test.
* gm2/pim/fail/forloop.mod: New test.
* gm2/pim/pass/forloop2.mod: New test.
Martin Jambor [Wed, 21 Feb 2024 14:43:13 +0000 (15:43 +0100)]
ipa: Convert lattices from pure array to vector (PR 113476)
In PR 113476 we have discovered that ipcp_param_lattices is no longer
a POD and should be destructed. In a follow-up discussion it
transpired that their initialization done by memsetting their backing
memory to zero is also invalid because now any write there before
construction can be considered dead. Plus that having them in an
array is a little bit old-school and does not get the extra checking
offered by vector along with automatic construction and destruction
when necessary.
So this patch converts the array to a vector. That however means that
ipcp_param_lattices cannot be just a forward declared type but must be
known to all code that deals with ipa_node_params and thus to all code
that includes ipa-prop.h. Therefore I have moved ipcp_param_lattices
and the type it depends on to a new header ipa-cp.h which now
ipa-prop.h depends on. Because we have the (IMHO not a very wise)
rule that headers don't include what they need themselves, I had to
add inclusions of ipa-cp.h and sreal.h (on which it depends) to very
many files, which made the patch rather ugly.
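The reason for the conversion can be shown with a small sketch. The types
below are hypothetical stand-ins, and std::vector is used in place of GCC's
own vec template: once the element type has a non-trivial member, it must be
constructed and destroyed properly, which a container does automatically and
zeroed raw memory does not.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Stand-in for a per-parameter lattice set that is no longer a POD.
struct ParamLattices {
  std::vector<int> values;  // non-trivial member: needs ctor and dtor
  bool bottom = false;
};

static_assert(!std::is_trivial<ParamLattices>::value,
              "zeroing raw memory is not a valid way to initialize this type");

// Stand-in for the per-node summary owning the lattices.
struct NodeParams {
  std::vector<ParamLattices> lattices;  // empty until allocated

  // Elements are value-initialized here and destroyed with the vector.
  void allocate(std::size_t param_count) { lattices.resize(param_count); }
};

// Convenience helper for the usage below.
NodeParams make_node(std::size_t param_count) {
  NodeParams n;
  n.allocate(param_count);
  return n;
}
```

An "is it allocated yet" test then becomes a check for an empty vector rather
than a NULL pointer comparison.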
gcc/lto/ChangeLog:
2024-02-16 Martin Jambor <mjambor@suse.cz>
PR ipa/113476
* lto-common.cc: Include sreal.h and ipa-cp.h.
* lto-partition.cc: Include ipa-cp.h, move inclusion of sreal higher.
* lto.cc: Include sreal.h and ipa-cp.h.
gcc/ChangeLog:
2024-02-16 Martin Jambor <mjambor@suse.cz>
PR ipa/113476
* ipa-prop.h (ipa_node_params): Convert lattices to a vector, adjust
initializers in the constructor.
(ipa_node_params::~ipa_node_params): Release lattices as a vector.
* ipa-cp.h: New file.
* ipa-cp.cc: Include sreal.h and ipa-cp.h.
(ipcp_value_source): Move to ipa-cp.h.
(ipcp_value_base): Likewise.
(ipcp_value): Likewise.
(ipcp_lattice): Likewise.
(ipcp_agg_lattice): Likewise.
(ipcp_bits_lattice): Likewise.
(ipcp_vr_lattice): Likewise.
(ipcp_param_lattices): Likewise.
(ipa_get_parm_lattices): Remove assert lattices is non-NULL.
(ipa_value_from_jfunc): Adjust a check for empty lattices.
(ipa_context_from_jfunc): Likewise.
(ipa_agg_value_from_jfunc): Likewise.
(merge_agg_lats_step): Do not memset new aggregate lattices to zero.
(ipcp_propagate_stage): Allocate lattices in a vector as opposed to
just in contiguous memory.
(ipcp_store_vr_results): Adjust a check for empty lattices.
* auto-profile.cc: Include sreal.h and ipa-cp.h.
* cgraph.cc: Likewise.
* cgraphclones.cc: Likewise.
* cgraphunit.cc: Likewise.
* config/aarch64/aarch64.cc: Likewise.
* config/i386/i386-builtins.cc: Likewise.
* config/i386/i386-expand.cc: Likewise.
* config/i386/i386-features.cc: Likewise.
* config/i386/i386-options.cc: Likewise.
* config/i386/i386.cc: Likewise.
* config/rs6000/rs6000.cc: Likewise.
* config/s390/s390.cc: Likewise.
* gengtype.cc (open_base_files): Added sreal.h and ipa-cp.h to the
files to be included in gtype-desc.cc.
* gimple-range-fold.cc: Include sreal.h and ipa-cp.h.
* ipa-devirt.cc: Likewise.
* ipa-fnsummary.cc: Likewise.
* ipa-icf.cc: Likewise.
* ipa-inline-analysis.cc: Likewise.
* ipa-inline-transform.cc: Likewise.
* ipa-inline.cc: Include ipa-cp.h, move inclusion of sreal.h higher.
* ipa-modref.cc: Include sreal.h and ipa-cp.h.
* ipa-param-manipulation.cc: Likewise.
* ipa-predicate.cc: Likewise.
* ipa-profile.cc: Likewise.
* ipa-prop.cc: Likewise.
(ipa_node_params_t::duplicate): Assert new lattices remain empty
instead of setting them to NULL.
* ipa-pure-const.cc: Include sreal.h and ipa-cp.h.
* ipa-split.cc: Likewise.
* ipa-sra.cc: Likewise.
* ipa-strub.cc: Likewise.
* ipa-utils.cc: Likewise.
* ipa.cc: Likewise.
* toplev.cc: Likewise.
* tree-ssa-ccp.cc: Likewise.
* tree-ssa-sccvn.cc: Likewise.
* tree-vrp.cc: Likewise.
Tamar Christina [Wed, 21 Feb 2024 11:42:53 +0000 (11:42 +0000)]
AArch64: remove ls64 from being mandatory on armv8.7-a
The Arm Architectural Reference Manual (Version J.a, section A2.9 on FEAT_LS64)
shows that ls64 is an optional extension and should not be enabled by default
for Armv8.7-a.
This drops it from the mandatory bits for the architecture and brings GCC in line
with LLVM and the architecture.
Note that we will not be changing binutils to preserve compatibility with older
released compilers.
gcc/ChangeLog:
* config/aarch64/aarch64-arches.def (AARCH64_ARCH): Remove LS64 from
Armv8.7-a.
The sequence to commit a lazy save includes a branch based on
whether TPIDR2_EL0 is zero. The code assumed that CBZ could
be used for this, but that instruction is forbidden when
-mtrack-speculation is being used.
gcc/
* config/aarch64/aarch64.cc (aarch64_mode_emit_local_sme_state):
Use aarch64_gen_compare_zero_and_branch rather than emitting
a CBZ directly.
gcc/testsuite/
* gcc.target/aarch64/sme/locally_streaming_1_ts.c: New test.
* gcc.target/aarch64/sme/sibcall_7_ts.c: Likewise.
foo cannot tail-call bar because foo needs to restore ZT0 after
the call. I'd forgotten to update the ok_for_sibcall rules
to handle this when adding SME2.
Thanks to Sander de Smalen for the spot.
gcc/
* config/aarch64/aarch64.cc (aarch64_function_ok_for_sibcall):
Check that each individual piece of state is shared in the same
way, rather than using an aggregate check for PSTATE.ZA.
gcc/testsuite/
* gcc.target/aarch64/sme/sibcall_9.c: New test.
aarch64: Ensure ZT0 is zeroed in a new-ZT0 function
ACLE guarantees that a function like:
__arm_new("zt0") foo() { ... }
will start with ZT0 equal to zero. I'd forgotten to enforce that
after committing a lazy save. After such a save, we should zero
ZA iff the function has ZA state and zero ZT0 iff the function
has ZT0 state.
gcc/
* config/aarch64/aarch64.cc (aarch64_mode_emit_local_sme_state):
In the code that commits a lazy save, only zero ZA if the function
has ZA state. Similarly zero ZT0 if the function has ZT0 state.
gcc/testsuite/
* gcc.target/aarch64/sme/zt0_state_5.c (test3): Expect ZT0 rather
than ZA to be zeroed.
(test5): Remove zeroing of ZA.
aarch64: Remove the aarch64_commit_lazy_save pattern
The main purpose of the aarch64_commit_lazy_save pattern
was to defer insertion of a half-diamond until splitting,
since splitting knew how to create the associated basic blocks.
However, the fix for PR113220 means that mode-switching also
knows how to do that. This patch therefore removes the pattern
and emits the subinstructions directly.
On its own, this is actually a slight regression, since it
means we keep an unnecessary zero { za }. But the cases
where that happens are wrong for a different reason, and this
patch is a prerequisite to fixing it.
aarch64: Stack-clash prologues and VG saves [PR113995]
This patch fixes an ICE for a combination of:
- -fstack-clash-protection
- a frame that has SVE save slots
- a frame that has no GPR save slots
- a frame that has a VG save slot
The allocation code was folding the SVE save slot allocation into
the initial frame allocation, so that we had one allocation of
size <size of SVE registers> + 16. But the VG save code itself
expected the allocations to remain separate, since it wants to
store at a constant offset from SP or FP.
The VG save isn't shrink-wrapped and so acts as a probe of the
initial allocations. It should therefore be safe to keep separate
allocations in this case.
The scans in locally_streaming_1.c expect no stack clash protection,
so the patch forces that and adds a separate compile-only test for
when protection is enabled.
gcc/
PR target/113995
* config/aarch64/aarch64.cc (aarch64_expand_prologue): Don't
fold the SVE allocation into the initial allocation if the
initial allocation includes a VG save.
Allow mode-switching to introduce internal loops [PR113220]
In this PR, the SME mode-switching code needs to insert a stack-probe
loop for an alloca. This patch allows the target to do that.
There are two parts to it: allowing loops for insertions in blocks,
and allowing them for insertions on edges. The former can be handled
entirely within mode-switching itself, by recording which blocks have
had new branches inserted. The latter requires an extension to
commit_one_edge_insertion.
I think the extension to commit_one_edge_insertion makes logical sense,
since it already explicitly allows internal loops during RTL expansion.
The single-block find_sub_basic_blocks is a relatively recent addition,
so wouldn't have been available when the code was originally written.
The patch also has a small and obvious fix to make the aarch64 emit
hook cope with labels.
I've added specific -fstack-clash-protection versions of all
aarch64-sme.exp tests that previously failed because of this bug.
I've also added -fno-stack-clash-protection to the original versions
of these tests if they contain scans that assume no protection.
gcc/
PR target/113220
* cfgrtl.cc (commit_one_edge_insertion): Handle sequences that
contain jumps even if called after initial RTL expansion.
* mode-switching.cc: Include cfgbuild.h.
(optimize_mode_switching): Allow the sequence returned by the
emit hook to contain internal jumps. Record which blocks
contain such jumps and split the blocks at the end.
* config/aarch64/aarch64.cc (aarch64_mode_emit): Check for
non-debug insns when scanning the sequence.
gcc/testsuite/
PR target/113220
* gcc.target/aarch64/sme/call_sm_switch_5.c: Add
-fno-stack-clash-protection.
* gcc.target/aarch64/sme/call_sm_switch_5_scp.c: New test.
* gcc.target/aarch64/sme/sibcall_6_scp.c: New test.
* gcc.target/aarch64/sme/za_state_4.c: Add
-fno-stack-clash-protection.
* gcc.target/aarch64/sme/za_state_4_scp.c: New test.
* gcc.target/aarch64/sme/za_state_5.c: Add
-fno-stack-clash-protection.
* gcc.target/aarch64/sme/za_state_5_scp.c: New test.
Tobias Burnus [Wed, 21 Feb 2024 10:31:43 +0000 (11:31 +0100)]
OpenMP/nvptx: support 'arch(nvptx64)' as context selector
The main 'arch' context selector for nvptx is, well, 'nvptx';
however, as 'nvptx64' is used by LLVM, it makes sense
to support it as well.
Note that LLVM has: "The triple architecture can be one of
``nvptx`` (32-bit PTX) or ``nvptx64`` (64-bit PTX)."
GCC effectively only supports the 64bit variant (at least for
offloading). Thus, GCC's 'nvptx' is not quite the same as LLVM's.
The device-compiler part (nvptx_omp_device_kind_arch_isa) uses
TARGET_ABI64 such that nvptx64 is only defined with -m64.
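For reference, a minimal sketch of how the selector might appear in user
code. The function names compute and compute_nvptx are hypothetical. Without
-fopenmp the pragma is ignored and the base function is always used; with
offloading, the variant is meant to be chosen only in an nvptx64 device
context.

```cpp
#include <cassert>

// Hypothetical device-specific variant; it must have the same type as the
// base function it substitutes.
int compute_nvptx(int x) { return x + 1; }

// Base function.  The context selector asks for the variant above when the
// code is compiled for a device whose 'arch' trait is nvptx64.
#pragma omp declare variant(compute_nvptx) match(device = {arch(nvptx64)})
int compute(int x) { return x + 1; }
```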
gcc/ChangeLog:
* config/nvptx/gen-omp-device-properties.sh: Add 'nvptx64' to arch.
* config/nvptx/nvptx.cc (nvptx_omp_device_kind_arch_isa): Likewise.
libgomp/ChangeLog:
* libgomp.texi (OpenMP Context Selectors): Add 'nvptx64' as additional
'arch' value for nvptx.
Ilya Leoshkevich [Mon, 19 Feb 2024 10:51:38 +0000 (11:51 +0100)]
IBM Z: Preserve exceptions in autovec-*-signaling-eq.c tests
DSE, DCE, and other passes are removing redundant signaling comparisons
from these tests, but the whole point is to check that GCC knows how to
emit them. Use -fno-delete-dead-exceptions to prevent that.
The plan to maintain PRU hardware-specific specs in newlib tree has been
abandoned in favour of a new distinct GIT project. Update the
documentation accordingly.
gcc/ChangeLog:
* doc/invoke.texi (-mmcu): Add information about MCU specs.
pru: Document that arguments are not passed to main with -minrt
The minimal runtime has been documented from the beginning to break some
standard features in order to reduce code size, while keeping
the features required by typical firmware programs. Document one more
imposed restriction - the main() function must take no arguments.
gcc/ChangeLog:
* doc/invoke.texi (-minrt): Clarify that main
must take no arguments.
Iain Sandoe [Sun, 18 Feb 2024 06:52:47 +0000 (06:52 +0000)]
libgcc, aarch64: Allow for BE platforms in heap trampolines.
This arranges that the byte order of the instruction sequences is
independent of the byte order of memory.
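A minimal sketch of the idea, not the libgcc code: A64 instruction words are always stored least significant byte first, so writing each word out byte by byte gives the same memory image on little-endian and big-endian data configurations. The helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store one 32-bit instruction word as four bytes, least significant
   byte first, regardless of the byte order used for data.  */
void
encode_insn (uint32_t insn, unsigned char out[4])
{
  assert (out != NULL);
  out[0] = (unsigned char) (insn & 0xff);
  out[1] = (unsigned char) ((insn >> 8) & 0xff);
  out[2] = (unsigned char) ((insn >> 16) & 0xff);
  out[3] = (unsigned char) ((insn >> 24) & 0xff);
}
```

For example, the A64 NOP encoding 0xd503201f becomes the byte sequence 1f 20 03 d5.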
libgcc/ChangeLog:
* config/aarch64/heap-trampoline.c
(aarch64_trampoline_insns): Arrange to encode instructions as a
byte array so that the order is independent of memory byte order.
(struct aarch64_trampoline): Likewise.
In _GLIBCXX_DEBUG mode the std::__niter_base can remove 2 layers, the
__gnu_debug::_Safe_iterator<> and the __gnu_cxx::__normal_iterator<>.
When std::__niter_wrap is called to build a __gnu_debug::_Safe_iterator<>
from a __gnu_cxx::__normal_iterator<> we then have a consistency issue
as the difference between the 2 iterators will be done on a __normal_iterator
on one side and a C pointer on the other. To avoid this problem call
std::__niter_base on both input iterators.
libstdc++-v3/ChangeLog:
* include/bits/stl_algobase.h (std::__niter_wrap): Add a call to
std::__niter_base on res iterator.
Peter Hill [Tue, 20 Feb 2024 19:42:53 +0000 (20:42 +0100)]
Fortran: fix passing array component ref to polymorphic procedures
PR fortran/105658
gcc/fortran/ChangeLog:
* trans-expr.cc (gfc_conv_intrinsic_to_class): When passing an
array component reference of intrinsic type to a procedure
with an unlimited polymorphic dummy argument, a temporary
should be created.
Georg-Johann Lay [Tue, 20 Feb 2024 13:54:44 +0000 (14:54 +0100)]
AVR: Use types of exact size and signedness in built-ins.
The AVR built-ins used types like "int" or "char" that don't
have exact signedness or type size; those depend on -mint8
and -f[no-][un-]signed-char etc. As the built-ins are modelling
machine instructions of given type sizes and signedness, use
the corresponding types in their prototypes.
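To illustrate why exact-width types are the better fit, here is a hedged, portable model of an 8-bit nibble-swap operation. The function is hypothetical and is not one of the AVR built-ins; it only shows a prototype whose meaning does not change with options that alter the size of "int" or the signedness of "char".

```c
#include <assert.h>
#include <stdint.h>

/* The operand width is fixed by the type, not by compiler options.  */
static_assert (UINT8_MAX == 255, "uint8_t is exactly 8 bits wide");

/* Hypothetical portable model of swapping the two nibbles of one
   8-bit register; not the actual AVR built-in.  */
uint8_t
model_swap_nibbles (uint8_t x)
{
  return (uint8_t) ((x << 4) | (x >> 4));
}
```

With "char" in place of "uint8_t", the right shift could operate on a negative value when plain char is signed, which is the kind of option-dependent behaviour the commit avoids.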
gcc/
* config/avr/builtins.def: Use function prototypes of given size
and signedness.
* config/avr/avr.cc (avr_init_builtins): Adjust types required
by builtins.def.
* doc/extend.texi (AVR Built-in Functions): Adjust accordingly.
aarch64: Fix streaming-compatible code with -mtrack-speculation [PR113805]
This patch makes -mtrack-speculation work on streaming-compatible
functions. There were two related issues. The first is that the
streaming-compatible code was using TB(N)Z unconditionally, whereas
those instructions are not allowed with speculation tracking.
That part can be fixed in a similar way to the recent eh_return
fix (PR112987).
The second issue was that the speculation-tracking pass runs
before some of the conditional branches are inserted. It isn't
safe to insert the branches any earlier, so the patch instead adds
a second speculation-tracking pass that runs afterwards. The new
pass is only used for streaming-compatible functions.
The testcase is adapted from call_sm_switch_1.c.
gcc/
PR target/113805
* config/aarch64/aarch64-passes.def (pass_late_track_speculation):
New pass.
* config/aarch64/aarch64-protos.h (make_pass_late_track_speculation):
Declare.
* config/aarch64/aarch64.md (is_call): New attribute.
(*and<mode>3nr_compare0): Rename to...
(@aarch64_and<mode>3nr_compare0): ...this.
* config/aarch64/aarch64-sme.md (aarch64_get_sme_state)
(aarch64_tpidr2_save, aarch64_tpidr2_restore): Add is_call attributes.
* config/aarch64/aarch64-speculation.cc: Update file comment to
describe the new late pass.
(aarch64_do_track_speculation): Handle is_call insns like other calls.
(pass_track_speculation): Add an is_late member variable.
(pass_track_speculation::gate): Run the late pass for streaming-
compatible functions and the early pass for other functions.
(make_pass_track_speculation): Update accordingly.
(make_pass_late_track_speculation): New function.
* config/aarch64/aarch64.cc (aarch64_gen_test_and_branch): New
function.
(aarch64_guard_switch_pstate_sm): Use it.
gcc/testsuite/
PR target/113805
* gcc.target/aarch64/sme/call_sm_switch_11.c: New test.
Jakub Jelinek [Tue, 20 Feb 2024 09:31:46 +0000 (10:31 +0100)]
testsuite: Fix up analyzer/torture/vector-extract-1.c test for i686 [PR113983]
The testcase fails on i686-linux with
.../gcc/testsuite/gcc.dg/analyzer/torture/vector-extract-1.c:11:1: warning: MMX vector return without MMX enabled changes the ABI [-Wpsabi]
Added -Wno-psabi to silence the warning.
2024-02-20 Jakub Jelinek <jakub@redhat.com>
PR analyzer/113983
* gcc.dg/analyzer/torture/vector-extract-1.c: Add -Wno-psabi as
dg-additional-options.
liuhongt [Mon, 19 Feb 2024 04:19:35 +0000 (12:19 +0800)]
Fix testcase for platform without gnu/stubs-x32.h
target maybe_x32 doesn't check if platform has gnu/stubs-x32.h, but
it's included by stdint.h in the testcase.
Adjust testcase: remove stdint.h, use 'typedef long long int64_t'
instead.
Andrew Pinski [Sun, 18 Feb 2024 22:14:23 +0000 (14:14 -0800)]
analyzer: Fix maybe_undo_optimize_bit_field_compare vs non-scalar types [PR113983]
After r14-6419-g4eaaf7f5a378e8, maybe_undo_optimize_bit_field_compare would ICE on
vector CST but this function really should be checking if we had integer types so
reject non-integral types early on (like it was doing for non-char type before r14-6419-g4eaaf7f5a378e8).
Committed as obvious after build and tested for aarch64-linux-gnu with no regressions.
PR analyzer/113983
gcc/analyzer/ChangeLog:
* region-model-manager.cc (maybe_undo_optimize_bit_field_compare): Reject
non-integral types.
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/torture/vector-extract-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Iain Sandoe [Thu, 8 Feb 2024 17:54:31 +0000 (17:54 +0000)]
libstdc++, Darwin: Handle a linker warning [PR112397].
Darwin's linker warns when we make a direct branch to code that is
in a weak definition (citing that if a different implementation of
the weak function is chosen by the dynamic linker this would be an
error).
As the analysis in the PR shows, this can happen when we have hot/
cold partitioning and there is an error path that is primarily cold
but makes use of epilogue code in the hot section. In this simple
case, we can easily deduce that the code is in fact safe; however
that is not something we can realistically implement in the linker.
Since the user-replaceable allocators are implemented using weak
definitions, this is a warning that is frequently flagged up in both
the testsuite and end-user code.
The chosen solution here is to suppress the hot/cold partitioning for
these cases (it is unlikely to impact performance much cf. the
actual allocation).
PR target/112397
libstdc++-v3/ChangeLog:
* configure: Regenerate.
* configure.ac: Detect if we are building for Darwin.
* libsupc++/Makefile.am: If we are building for Darwin, then
suppress hot/cold partitioning for the array allocators.
* libsupc++/Makefile.in: Regenerated.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk> Co-authored-by: Jonathan Wakely <jwakely@redhat.com>
Iain Sandoe [Fri, 16 Feb 2024 14:42:53 +0000 (14:42 +0000)]
libiberty: Fix error return value in pex_unix_exec_child [PR113957].
r14-5310-g879cf9ff45d940 introduced some new handling for spawning sub
processes. The return value from the generic exec_child is examined
and needs to be < 0 to signal an error. However, the unix flavour of
this routine is returning the PID value set from the posix_spawn{p}.
This latter value is undefined per the manual pages for both Darwin
and Linux, and it seems Darwin, at least, sets the value to some
usually positive number (presumably the PID that would have been used
if the fork had succeeded).
The fix proposed here is to set the pid = -1 in the relevant error
paths.
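The pattern can be sketched as follows. This is a hypothetical helper, not the libiberty routine: it calls posix_spawnp and, on failure, ignores whatever was stored through the pid pointer and returns -1 instead.

```c
#include <assert.h>
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>

extern char **environ;

/* Hypothetical helper: start PROG and return its PID, or -1 on failure
   with the posix_spawnp error code stored in *ERR.  */
pid_t
spawn_or_minus_one (const char *prog, char *const argv[], int *err)
{
  pid_t pid = -1;
  assert (prog != NULL && argv != NULL && err != NULL);
  *err = posix_spawnp (&pid, prog, NULL, NULL, argv, environ);
  if (*err != 0)
    /* The value stored through &pid is not defined on failure, so do
       not trust it; force the error sentinel.  */
    pid = -1;
  return pid;
}
```

A caller that tests the result with "< 0" then sees the failure regardless of what the platform left in the pid variable.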
PR other/113957
libiberty/ChangeLog:
* pex-unix.c (pex_unix_exec_child): Set pid = -1 in the error
paths, since that is used to signal an erroneous outcome for
the routine.
Iain Sandoe [Tue, 30 Jan 2024 11:04:59 +0000 (11:04 +0000)]
aarch64: Register rng builtins with uint64_t pointers.
Currently, these are registered as unsigned_intDI_type_node which is not
necessarily the same type definition as uint64_t. On platforms where these
differ, that causes failures when consuming the arm_acle.h header.
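As a hedged aside on why the distinction matters: two unsigned 64-bit types can have the same size and still be distinct types in C, so a pointer to one is not compatible with a pointer to the other. The sketch below (hypothetical function name) reports which standard type uint64_t is a typedef for on the current platform.

```c
#include <stdint.h>

/* Report which standard type uint64_t names on this platform:
   1 for unsigned long, 2 for unsigned long long, 0 for anything else.  */
int
uint64_underlying_type (void)
{
  return _Generic ((uint64_t) 0,
                   unsigned long: 1,
                   unsigned long long: 2,
                   default: 0);
}
```

A built-in declared with a pointer to one of these types will draw a diagnostic when the header passes a pointer to the other, even though both are 64 bits wide.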
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc (aarch64_init_rng_builtins):
Register these builtins with a pointer to uint64_t rather than unsigned
DI mode.
Thomas Schwinge [Fri, 16 Feb 2024 12:04:00 +0000 (13:04 +0100)]
GCN: Conditionalize 'define_expand "reduc_<fexpander>_scal_<mode>"' on '!TARGET_RDNA2_PLUS' [PR113615]
On top of commit c7ec7bd1c6590cf4eed267feab490288e0b8d691
"amdgcn: add -march=gfx1030 EXPERIMENTAL" conditionalizing
'define_expand "reduc_<reduc_op>_scal_<mode>"' on
'!TARGET_RDNA2' (later: '!TARGET_RDNA2_PLUS'), we then did similar in
commit 7cc2262ec9a410dc56d1c1c6b950c922e14f621d
"gcn/gcn-valu.md: Disable fold_left_plus for TARGET_RDNA2_PLUS [PR113615]"
to conditionalize 'define_expand "fold_left_plus_<mode>"' on
'!TARGET_RDNA2_PLUS', but I found we also need to conditionalize the related
'define_expand "reduc_<fexpander>_scal_<mode>"' on '!TARGET_RDNA2_PLUS', to
avoid ICEs like:
Similar for 'gcc.dg/vect/vect-fmax-2.c', 'gcc.dg/vect/vect-fmin-2.c', and
'UNSPEC_SMAX_DPP_SHR' for 'gcc.dg/vect/vect-fmax-1.c', and
'UNSPEC_SMIN_DPP_SHR' for 'gcc.dg/vect/vect-fmin-1.c', when running 'vect.exp'
for 'check-gcc-c'.
When partially substituting a requires-expr, we don't want to perform
any additional checks beyond the substitution itself so as to minimize
checking requirements out of order. So don't check the return-type-req
of a compound-requirement during partial substitution. And don't check
the noexcept condition either since we can't do that on templated trees.
PR c++/113966
gcc/cp/ChangeLog:
* constraint.cc (tsubst_compound_requirement): Don't check
the noexcept condition or the return-type-requirement when
partially substituting.
The following tries to address the PHI insertion compile-time hog in
RTL fwprop observed with the PR54052 testcase where the loop computing
the "unfiltered" set of variables possibly needing PHI nodes for each
block exhibits quadratic compile-time and memory-use.
It does so by pruning the local DEFs with LR_OUT of the block, removing
regs that can never be LR_IN (defined by this block) in the dominance
frontier.
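A toy model of the pruning step, assuming register sets fit in a 64-bit mask (GCC uses its own bitmap type; the function name here is hypothetical): only registers that are both defined in the block and live on exit from it remain candidates.

```c
#include <stdint.h>

/* Toy model: bit N set means "register N".  Keep only the locally
   defined registers that are still live on exit from the block.  */
uint64_t
prune_defs_by_live_out (uint64_t local_defs, uint64_t lr_out)
{
  return local_defs & lr_out;
}
```

For instance, with local definitions {0, 1, 3} (mask 0xB) and live-out registers {1, 2} (mask 0x6), only register 1 (mask 0x2) survives.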
PR rtl-optimization/54052
* rtl-ssa/blocks.cc (function_info::place_phis): Filter
local defs by LR_OUT.
Gaius Mulley [Mon, 19 Feb 2024 12:59:36 +0000 (12:59 +0000)]
PR modula2/113889 Incorrect constant string value if declared in a definition module
This patch fixes a bug exposed when a constant string is declared in a
definition module and imported by a program module. The bug fix
was to defer the string assignment and concatenation until quadruples
were generated. The conststring symbol has a known field which
must be checked prior to retrieving the string contents.
gcc/m2/ChangeLog:
PR modula2/113889
* gm2-compiler/M2ALU.mod (StringFitsArray): Add tokeno parameter
to GetStringLength.
(InitialiseArrayOfCharWithString): Add tokeno parameter to
GetStringLength.
(CheckGetCharFromString): Add tokeno parameter to GetStringLength.
* gm2-compiler/M2Const.mod (constResolveViaMeta): Replace
PutConstString with PutConstStringKnown.
* gm2-compiler/M2GCCDeclare.mod (DeclareCharConstant): Add tokenno
parameter and add assert. Use tokenno to generate location.
(DeclareStringConstant): Add tokenno and add asserts.
Add tokenno parameter to calls to GetStringLength.
(PromoteToString): Add assert and add tokenno parameter to
GetStringLength.
(PromoteToCString): Add assert and add tokenno parameter to
GetStringLength.
(DeclareConstString): New procedure function.
(TryDeclareConst): Remove size local variable.
Check IsConstStringKnown.
Call DeclareConstString.
(PrintString): New procedure.
(PrintVerboseFromList): Call PrintString.
(CheckResolveSubrange): Check IsConstStringKnown before creating
subrange for char or issuing an error.
* gm2-compiler/M2GenGCC.mod (ResolveConstantExpressions): Add
StringLengthOp, StringConvertM2nulOp, StringConvertCnulOp case
clauses.
(FindSize): Add assert IsConstStringKnown.
(StringToChar): New variable tokenno.
Add tokenno parameter to GetStringLength.
(FoldStringLength): New procedure.
(FoldStringConvertM2nul): New procedure.
(FoldStringConvertCnul): New procedure.
(CodeAddr): Add tokenno parameter.
Replace CurrentQuadToken with tokenno.
Add tokenno parameter to GetStringLength.
(PrepareCopyString): Rewrite.
(IsConstStrKnown): New procedure function.
(FoldAdd): Detect conststring op2 and op3 which are known and
concat. Place result into op1.
(FoldStandardFunction): Pass tokenno as a parameter to
GetStringLength.
(CodeXIndr): Rewrite comment.
Rename op1 to left, op3 to right.
Pass rightpos to GetStringLength.
* gm2-compiler/M2Quads.def (QuadrupleOp): Add
StringConvertCnulOp, StringConvertM2nulOp and StringLengthOp.
* gm2-compiler/M2Quads.mod (import): Remove MakeConstLitString.
Add CopyConstString and PutConstStringKnown.
(IsInitialisingConst): Add StringConvertCnulOp,
StringConvertM2nulOp and StringLengthOp.
(callRequestDependant): Replace MakeConstLitString with
MakeConstString.
(DeferMakeConstStringCnul): New procedure function.
(DeferMakeConstStringM2nul): New procedure function.
(CheckParameter): Add early return if the string const is unknown.
(DescribeType): Add token parameter to GetStringLength.
Check for IsConstStringKnown.
(ManipulateParameters): Use DeferMakeConstStringCnul and
DeferMakeConstStringM2nul.
(MakeLengthConst): Remove and replace with...
(DeferMakeLengthConst): ... this.
(doBuildBinaryOp): Create ConstString and set it to contents
unknown.
Check IsConstStringKnown before generating error message.
(WriteQuad): Add StringConvertCnulOp, StringConvertM2nulOp and
StringLengthOp.
(WriteOperator): Add StringConvertCnulOp, StringConvertM2nulOp and
StringLengthOp.
* gm2-compiler/M2SymInit.mod (CheckReadBeforeInitQuad): Add
StringConvertCnulOp, StringConvertM2nulOp and StringLengthOp.
* gm2-compiler/NameKey.mod (LengthKey): Allow NulName to return 0.
* gm2-compiler/P2SymBuild.mod (BuildString): Replace
MakeConstLitString with MakeConstString.
(DetermineType): Replace PutConstString with PutConstStringKnown.
* gm2-compiler/SymbolTable.def (MakeConstVar): Tidy up comment.
(MakeConstLitString): Remove.
(MakeConstString): New procedure function.
(MakeConstStringCnul): New procedure function.
(MakeConstStringM2nul): New procedure function.
(PutConstStringKnown): New procedure.
(CopyConstString): New procedure.
(IsConstStringKnown): New procedure function.
(IsConstStringM2): New procedure function.
(IsConstStringC): New procedure function.
(IsConstStringM2nul): New procedure function.
(IsConstStringCnul): New procedure function.
(GetStringLength): Add token parameter.
(PutConstString): Remove.
(GetConstStringM2): Remove.
(GetConstStringC): Remove.
(GetConstStringM2nul): Remove.
(GetConstStringCnul): Remove.
(MakeConstStringC): Remove.
* gm2-compiler/SymbolTable.mod (SymConstString): Remove
M2Variant, NulM2Variant, CVariant, NulCVariant.
Add Known.
(CheckAnonymous): Replace $$ with __anon.
(IsNameAnonymous): Replace $$ with __anon.
(MakeConstVar): Detect whether the name is nul and treat as
a temporary constant.
(MakeConstLitString): Remove.
(BackFillString): Remove.
(InitConstString): Rewrite.
(GetConstStringM2): Remove.
(GetConstStringC): Remove.
(GetConstStringContent): New procedure function.
(GetConstStringM2nul): Remove.
(GetConstStringCnul): Remove.
(MakeConstStringCnul): Rewrite.
(MakeConstStringM2nul): Rewrite.
(MakeConstStringC): Remove.
(MakeConstString): Rewrite.
(PutConstStringKnown): New procedure.
(CopyConstString): New procedure.
(PutConstString): Remove.
(IsConstStringKnown): New procedure function.
(IsConstStringM2): New procedure function.
(IsConstStringC): Rewrite.
(IsConstStringM2nul): Rewrite.
(IsConstStringCnul): Rewrite.
(GetConstStringKind): New procedure function.
(GetString): Check Known.
(GetStringLength): Add token parameter and check Known.
gcc/testsuite/ChangeLog:
PR modula2/113889
* gm2/pim/run/pass/pim-run-pass.exp: Add filter for
constdef.mod.
* gm2/extensions/run/pass/callingc2.mod: New test.
* gm2/extensions/run/pass/callingc3.mod: New test.
* gm2/extensions/run/pass/callingc4.mod: New test.
* gm2/extensions/run/pass/callingc5.mod: New test.
* gm2/extensions/run/pass/callingc6.mod: New test.
* gm2/extensions/run/pass/callingc7.mod: New test.
* gm2/extensions/run/pass/callingc8.mod: New test.
* gm2/extensions/run/pass/fixedarray.mod: New test.
* gm2/extensions/run/pass/fixedarray2.mod: New test.
* gm2/pim/run/pass/constdef.def: New test.
* gm2/pim/run/pass/constdef.mod: New test.
* gm2/pim/run/pass/testimportconst.mod: New test.
Iain Buclaw [Mon, 19 Feb 2024 10:33:16 +0000 (11:33 +0100)]
d: Add UTF BOM tests to gdc.dg testsuite
Some of these are part of the upstream DMD `gdc.test' testsuite, but
they had been omitted because they get mangled by the lib/gdc-utils.exp
helpers when parsing and staging the tests. Translate them over to the
gdc.dg testsuite instead.
gcc/testsuite/ChangeLog:
* gdc.dg/bom_UTF16BE.d: New test.
* gdc.dg/bom_UTF16LE.d: New test.
* gdc.dg/bom_UTF32BE.d: New test.
* gdc.dg/bom_UTF32LE.d: New test.
* gdc.dg/bom_UTF8.d: New test.
* gdc.dg/bom_characters.d: New test.
* gdc.dg/bom_error_UTF8.d: New test.
* gdc.dg/bom_infer_UTF16BE.d: New test.
* gdc.dg/bom_infer_UTF16LE.d: New test.
* gdc.dg/bom_infer_UTF32BE.d: New test.
* gdc.dg/bom_infer_UTF32LE.d: New test.
* gdc.dg/bom_infer_UTF8.d: New test.
Jakub Jelinek [Mon, 19 Feb 2024 08:42:22 +0000 (09:42 +0100)]
match.pd: Fix ICE on BIT_INSERT_EXPR of BIT_FIELD_REF folding [PR113967]
The following testcase ICEs, because BIT_FIELD_REF's position is not
a multiple of the vector element's bit size and the code uses exact_div
to divide those 2 values.
For BIT_INSERT_EXPR, the tree-cfg.cc verification verifies the position
is a multiple of the inserted bit size when inserting into vectors,
but for BIT_FIELD_REF the position can be arbitrary if within the range.
The following patch fixes that.
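The shape of such a guard can be sketched in plain C. This is a hypothetical helper, not the match.pd condition itself: it refuses to compute an element index unless the bit position falls exactly on an element boundary, which is the precondition an exact division needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: if BITPOS falls on an element boundary, store
   the element index in *INDEX and return true; otherwise return false
   so that the caller can skip the transformation.  */
bool
element_index_for_bitpos (unsigned long bitpos, unsigned long elem_bits,
                          unsigned long *index)
{
  assert (elem_bits != 0 && index != NULL);
  if (bitpos % elem_bits != 0)
    return false;
  *index = bitpos / elem_bits;
  return true;
}
```

With 32-bit elements, position 64 maps to element 2, while position 40 is rejected and leaves the index untouched.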
2024-02-19 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113967
* match.pd (bit_insert @0 (BIT_FIELD_REF @1 ..) ..): Require
in condition that @rpos is a multiple of vector element size.
Juzhe-Zhong [Thu, 1 Feb 2024 09:02:52 +0000 (17:02 +0800)]
RISC-V: Suppress the vsetvl fusion for conflict successors
Update in v2: Add dump information.
This patch fixes the following ineffective vsetvl insertion:
void f (int32_t * restrict in, int32_t * restrict out, size_t n, size_t cond, size_t cond2)
{
  for (size_t i = 0; i < n; i++)
    {
      if (i == cond) {
        vint8mf8_t v = *(vint8mf8_t*)(in + i + 100);
        *(vint8mf8_t*)(out + i + 100) = v;
      } else if (i == cond2) {
        vfloat32mf2_t v = *(vfloat32mf2_t*)(in + i + 200);
        *(vfloat32mf2_t*)(out + i + 200) = v;
      } else if (i == (cond2 - 1)) {
        vuint16mf2_t v = *(vuint16mf2_t*)(in + i + 300);
        *(vuint16mf2_t*)(out + i + 300) = v;
      } else {
        vint8mf4_t v = *(vint8mf4_t*)(in + i + 400);
        *(vint8mf4_t*)(out + i + 400) = v;
      }
    }
}
Dimitar Dimitrov [Thu, 15 Feb 2024 19:02:37 +0000 (21:02 +0200)]
testsuite: Mark non-optimized variants as expensive
When not optimized for speed, the test for PR112344 takes several
seconds to execute on native x86_64, and 15 minutes on PRU target
simulator. Thus mark those variants as expensive. The -O2 variant
which originally triggered the PR is not expensive, hence it is
still run by default.
PR middle-end/112344
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr112344.c: Run non-optimized variants only
if expensive tests are allowed.
Jerry DeLisle [Sat, 17 Feb 2024 17:24:58 +0000 (09:24 -0800)]
libgfortran: [PR105473] Fix checks for decimal='comma'.
PR libfortran/105473
libgfortran/ChangeLog:
* io/list_read.c (eat_separator): Reject comma as a
separator when it is being used as a decimal point.
(parse_real): Reject a '.' when it should be a comma.
(read_real): Likewise.
* io/read.c (read_f): Add more checks for ',' and '.'
conditions.
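The rule behind these checks can be sketched as two small predicates. These are hypothetical helpers, not the libgfortran functions, and the sketch is simplified: blanks and '/' are left out. It assumes the usual Fortran rule that ';' takes over as the item separator when the comma serves as the decimal symbol.

```c
#include <stdbool.h>

/* Hypothetical helpers, not the libgfortran functions.  */

/* The character accepted as the decimal symbol in a real number.  */
bool
is_decimal_symbol (char c, bool decimal_comma)
{
  return c == (decimal_comma ? ',' : '.');
}

/* The character that separates list items: ';' replaces ',' when the
   comma is already taken as the decimal symbol.  */
bool
is_item_separator (char c, bool decimal_comma)
{
  return c == (decimal_comma ? ';' : ',');
}
```

Under decimal='comma', "1,5;2,5" would therefore read as two reals, and a '.' inside a real is rejected.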
The r14-870 changes broke xtb package tests (reduced testcase is the first
one below) and caused ICEs on a test derived from that (the second one).
For the
x = T(u = trim (us(1)))
statement, before that change gfortran used to emit weird code with
2 trim calls:
_gfortran_string_trim (&len.2, (void * *) &pstr.1, 20, &us[0]);
if (len.2 > 0)
{
__builtin_free ((void *) pstr.1);
}
D.4275 = len.2;
t.0.u = (character(kind=1)[1:0] *) __builtin_malloc (MAX_EXPR <(sizetype) D.4275, 1>);
t.0._u_length = D.4275;
_gfortran_string_trim (&len.4, (void * *) &pstr.3, 20, &us[0]);
(void) __builtin_memcpy ((void *) t.0.u, (void *) pstr.3, (unsigned long) NON_LVALUE_EXPR <len.4>);
if (len.4 > 0)
{
__builtin_free ((void *) pstr.3);
}
That worked at runtime, though it is wasteful.
That commit changed it to:
slen.3 = len.2;
t.0.u = (character(kind=1)[1:0] *) __builtin_malloc (MAX_EXPR <(sizetype) slen.3, 1>);
t.0._u_length = slen.3;
_gfortran_string_trim (&len.2, (void * *) &pstr.1, 20, &us[0]);
(void) __builtin_memcpy ((void *) t.0.u, (void *) pstr.1, (unsigned long) NON_LVALUE_EXPR <len.2>);
if (len.2 > 0)
{
__builtin_free ((void *) pstr.1);
}
which results in -Wuninitialized warning later on and if one is unlucky and
the uninitialized len.2 variable is smaller than the trimmed length, it
results in heap overflow and often crashes later on.
The bug above is clear, len.2 is only initialized in the
_gfortran_string_trim (&len.2, (void * *) &pstr.1, 20, &us[0]);
call, but used before that. Now, the
slen.3 = len.2;
t.0.u = (character(kind=1)[1:0] *) __builtin_malloc (MAX_EXPR <(sizetype) slen.3, 1>);
t.0._u_length = slen.3;
statements come from the alloc_scalar_allocatable_subcomponent call,
while
_gfortran_string_trim (&len.2, (void * *) &pstr.1, 20, &us[0]);
from the gfc_conv_expr (&se, expr); call which is done before the
alloc_scalar_allocatable_subcomponent call, but is only appended later on
with gfc_add_block_to_block (&block, &se.pre);
Now, obviously the alloc_scalar_allocatable_subcomponent emitted statements
can depend on the se.pre sequence statements which can compute variables
used by alloc_scalar_allocatable_subcomponent like the length.
On the other side, I think the se.pre sequence really shouldn't depend
on the changes done by alloc_scalar_allocatable_subcomponent, that is
initializing the FIELD_DECLs of the destination allocatable subcomponent
only, the gfc_conv_expr statements are already created, so all they could
in theory depend on is t.0.u or t.0._u_length, but I believe if the
rhs depended on the lhs content (which is allocated by those statements
but really uninitialized), it would need to be discovered by the dependency
analysis and forced into a temporary.
So, in order to fix the first testcase, the second hunk of the patch just
emits the se.pre block before the alloc_scalar_allocatable_subcomponent
changes rather than after it.
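In C terms the corrected ordering looks roughly like the sketch below. The helper names are hypothetical stand-ins, not the generated code: the length of the source is evaluated first (the role of the se.pre statements), and only then is the destination allocated from that length.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the trim call: length of S[0..LEN) without
   trailing blanks.  */
size_t
trimmed_length (const char *s, size_t len)
{
  while (len > 0 && s[len - 1] == ' ')
    len--;
  return len;
}

/* Evaluate the source length first, then allocate and copy.  The caller
   frees the result; *OUT_LEN receives the number of bytes copied.  */
char *
dup_trimmed (const char *s, size_t len, size_t *out_len)
{
  assert (s != NULL && out_len != NULL);
  size_t n = trimmed_length (s, len);   /* plays the role of se.pre */
  char *p = malloc (n > 0 ? n : 1);     /* allocation sized from n */
  if (p == NULL)
    return NULL;
  memcpy (p, s, n);
  *out_len = n;
  return p;
}
```

Swapping the first two statements of dup_trimmed would reproduce the bug: the allocation would be sized from a value that has not been computed yet.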
The second problem is an ICE on the second testcase. expr in the caller
(expr2 inside of alloc_scalar_allocatable_subcomponent) has
expr2->ts.u.cl->backend_decl already set, INTEGER_CST 20, but
alloc_scalar_allocatable_subcomponent overwrites it to a new VAR_DECL
which it assigns a value to before the malloc. That can work if the only
places the expr2->ts is ever used are in the same local block or its
subblocks (and only if it is dominated by the code emitted by
alloc_scalar_allocatable_subcomponent, so e.g. not if that call is inside
conditional code and a later use is unconditional), but doesn't work
if expr2->ts is used before that block or after it. So, the exact ICE is
because of:
slen.1 = 20;
static character(kind=1) us[1][1:20] = {"foo "};
x.u = 0B;
x._u_length = 0;
{
struct t t.0;
struct t D.4308;
{
integer(kind=8) slen.1;
slen.1 = 20;
t.0.u = (character(kind=1)[1:0] *) __builtin_malloc (MAX_EXPR <(sizetype) slen.1, 1>);
t.0._u_length = slen.1;
(void) __builtin_memcpy ((void *) t.0.u, (void *) &us[0], 20);
}
where the first slen.1 = 20; is emitted because it sees us has a VAR_DECL
ts.u.cl->backend_decl and so it wants to initialize it to the actual length.
This is invalid GENERIC, because the slen.1 variable is only declared inside
of a {} later on and so uses outside of it are wrong. Similarly wrong would
be if it is used later on. E.g. in the same testcase if it has
type(T) :: x, y
x = T(u = us(1))
y%u = us(1)
then there is
{
integer(kind=8) slen.1;
slen.1 = 20;
t.0.u = (character(kind=1)[1:0] *) __builtin_malloc (MAX_EXPR <(sizetype) slen.1, 1>);
t.0._u_length = slen.1;
(void) __builtin_memcpy ((void *) t.0.u, (void *) &us[0], 20);
}
...
if (y.u != 0B) goto L.1;
y.u = (character(kind=1)[1:0] *) __builtin_malloc (MAX_EXPR <(sizetype) slen.1, 1>);
i.e. another use of slen.1, this time after slen.1 got out of scope.
I really don't understand why the code modifies
expr2->ts.u.cl->backend_decl, expr2 isn't used there anywhere except for
expr2->ts.u.cl->backend_decl expressions, so hacks like save the previous
value, overwrite it temporarily over some call that will use expr2 and
restore afterwards aren't needed - there are no such calls, so the
following patch fixes it just by not messing with
expr2->ts.u.cl->backend_decl, only set it to size variable and overwrite
that with a temporary if needed.
2024-02-17 Jakub Jelinek <jakub@redhat.com>
PR fortran/113503
* trans-expr.cc (alloc_scalar_allocatable_subcomponent): Don't
overwrite expr2->ts.u.cl->backend_decl, instead set size to
expr2->ts.u.cl->backend_decl first and use size instead of
expr2->ts.u.cl->backend_decl.
(gfc_trans_subcomponent_assign): Emit se.pre into block
before calling alloc_scalar_allocatable_subcomponent instead of
after it.
* gfortran.dg/pr113503_1.f90: New test.
* gfortran.dg/pr113503_2.f90: New test.
Marek Polacek [Thu, 15 Feb 2024 22:07:43 +0000 (17:07 -0500)]
c++: wrong looser excep spec for dep noexcept [PR113158]
Here we find ourselves in maybe_check_overriding_exception_spec in
a template context where we can't instantiate a dependent noexcept.
That's OK, but we have to defer the checking otherwise we give wrong
errors.
PR c++/113158
gcc/cp/ChangeLog:
* search.cc (maybe_check_overriding_exception_spec): Defer checking
when a noexcept couldn't be instantiated & evaluated to false/true.
std::__niter_base is used in _GLIBCXX_DEBUG mode to remove _Safe_iterator<>
wrapper on random access iterators. But doing so it should also preserve original
behavior to remove __normal_iterator wrapper.
libstdc++-v3/ChangeLog:
* include/bits/stl_algobase.h (std::__niter_base): Redefine the overload
definitions for __gnu_debug::_Safe_iterator.
* include/debug/safe_iterator.tcc (std::__niter_base): Adapt declarations.
Jakub Jelinek [Sat, 17 Feb 2024 08:25:59 +0000 (09:25 +0100)]
testsuite: Fix up lra effective target
Given the recent discussions on IRC started with Andrew P. mentioning that
an asm goto outputs test should have { target lra } and the lra effective
target in GCC 11/12 only returning 0 for PA and in 13/14 for PA/AVR, while
we clearly have 14 other targets which don't support LRA and a couple of
further ones which have an -mlra/-mno-lra switch (whatever default they
have), seems to me the effective target is quite broken.
The following patch rewrites it, such that it has a fast path for heavily
used targets which are for years known to use only LRA (just an
optimization) plus determines whether it is a LRA target or reload target
by scanning the -fdump-rtl-reload-details dump on an empty function,
LRA has quite a few always emitted messages in that case while reload has
none of those.
Tested on x86_64-linux and cross to s390x-linux, for the latter with both
make check-gcc RUNTESTFLAGS='--target_board=unix/-mno-lra dg.exp=pr107385.c'
where the test is now UNSUPPORTED and
make check-gcc RUNTESTFLAGS='--target_board=unix/-mlra dg.exp=pr107385.c'
where it fails because I don't have libc around.
There is one special case, NVPTX, which is a TARGET_NO_REGISTER_ALLOCATION
target. I think claiming for it that it is a lra target is strange (even
though it effectively returns true for targetm.lra_p ()), unsure if it
supports asm goto with outputs or not, if it does and we want to test it,
perhaps we should introduce asm_goto_outputs effective target and use
lra || nvptx-*-* for that?
2024-02-17 Jakub Jelinek <jakub@redhat.com>
* lib/target-supports.exp (check_effective_target_lra): Rewrite
to list some heavily used always LRA targets and otherwise check the
-fdump-rtl-reload-details dump for messages specific to LRA.
Fix a typo in __gthr_win32_abs_to_rel_time that caused it to return a
relative time in seconds instead of milliseconds. As a consequence,
__gthr_win32_cond_timedwait called SleepConditionVariableCS with a
1000x shorter timeout; this caused ~1000x more spurious wakeups in
CV timed waits such as std::condition_variable::wait_for or wait_until,
resulting generally in much higher CPU usage.
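The arithmetic behind the factor of 1000 can be sketched as follows. The macro names follow the ones mentioned in this commit message; the values are plain arithmetic for 100-nanosecond ticks, and the helper functions are hypothetical rather than the libgcc code.

```c
#include <assert.h>
#include <stdint.h>

/* Number of 100-nanosecond ticks in one second and in one millisecond.  */
#define NSEC100_PER_SEC  10000000LL
#define NSEC100_PER_MSEC 10000LL

/* Hypothetical helper: convert a non-negative interval measured in
   100 ns ticks to whole milliseconds.  */
int64_t
ticks_to_msec (int64_t ticks)
{
  assert (ticks >= 0);
  return ticks / NSEC100_PER_MSEC;
}

/* The mistaken variant: same input, but the result is in seconds.  */
int64_t
ticks_to_sec (int64_t ticks)
{
  assert (ticks >= 0);
  return ticks / NSEC100_PER_SEC;
}
```

A 2-second interval is 20000000 ticks: the correct conversion yields 2000 (milliseconds), the mistaken one yields 2, which a millisecond-based wait treats as a 2 ms timeout.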
```
{
// timed wait, wake up explicitly after 1 second
std::thread t(thread_fn, true);
std::this_thread::sleep_for(std::chrono::seconds(1));
{
std::unique_lock<std::mutex> ml(mx);
pass = true;
}
cv.notify_all();
t.join();
}
{
// non-timed wait, wake up explicitly after 1 second
std::thread t(thread_fn, false);
std::this_thread::sleep_for(std::chrono::seconds(1));
{
std::unique_lock<std::mutex> ml(mx);
pass = true;
}
cv.notify_all();
t.join();
}
return 0;
}
```
On builds based on non-affected threading models (e.g. POSIX on Linux,
or winpthreads or MCF on Win32) the output is something like
```
pass: 0; wakeups: 2; elapsed: 2000 ms
pass: 1; wakeups: 2; elapsed: 991 ms
pass: 1; wakeups: 2; elapsed: 996 ms
```
while with the Win32 threading model we get
```
pass: 0; wakeups: 1418; elapsed: 2000 ms
pass: 1; wakeups: 479; elapsed: 988 ms
pass: 1; wakeups: 2; elapsed: 992 ms
```
(notice the huge number of wakeups in the timed wait cases only).
This commit fixes the conversion, adjusting the final division by
NSEC100_PER_SEC to use NSEC100_PER_MSEC instead (already defined in the
file and not used in any other place, so probably just a typo).
libgcc/ChangeLog:
PR libgcc/113850
* config/i386/gthr-win32-cond.c (__gthr_win32_abs_to_rel_time):
Fix absolute timespec to relative milliseconds count
conversion (it incorrectly returned seconds instead of
milliseconds); this avoids spurious wakeups in
__gthr_win32_cond_timedwait.
Andrew Pinski [Fri, 16 Feb 2024 21:26:30 +0000 (13:26 -0800)]
Add -Wstrict-aliasing to vector-struct-1.C testcase
As noticed by Marek Polacek in https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645836.html,
this testcase was not failing before without -Wstrict-aliasing so let's add that option.
Committed as obvious after testing to make sure the test is now testing with `-Wstrict-aliasing` and `-flto`.