Jonathan Wakely [Sat, 7 Mar 2026 11:24:29 +0000 (11:24 +0000)]
libstdc++: Make <meta> header compatible with -fno-asm
We currently only use 'asm' in .cc files (where we control the build
flags) and in the experimental::simd headers. We could just say that
-fno-asm is not compatible with libstdc++ and so using it is not
supported, but we can also just make this small change.
libstdc++-v3/ChangeLog:
* include/std/meta (exception::what, access_context::via): Use
__asm__ instead of asm keyword.
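As a rough illustration of the spelling change (this is not the code from
<meta>), the sketch below uses the __asm__ alternate keyword, which GCC
documents as usable when the plain asm keyword is not recognized. The
helper names opaque_value and opaque_value_selfcheck are hypothetical, and
the example assumes a GNU-compatible compiler.

```cpp
#include <cassert>

// Hypothetical helper for illustration only; not part of libstdc++.
// The empty asm statement with a "+r" constraint makes the compiler treat
// the value as opaque.  Spelling the keyword as __asm__ rather than asm
// keeps the code valid when the plain asm keyword is not recognized.
inline int opaque_value(int v)
{
  __asm__ ("" : "+r" (v));
  return v;
}

// Small usage check: the value passes through unchanged.
inline void opaque_value_selfcheck()
{
  assert(opaque_value(42) == 42);
}
```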
Fortran: Evaluate all functions on the source image.
Formerly pure elemental functions were evaluated in the caf_accessor.
This had so many dependencies that there was no benefit. Evaluate
every function on the calling side now, which has the benefit that
only one temporary has to be created instead of one per argument.
PR fortran/121043
gcc/fortran/ChangeLog:
* coarray.cc (check_add_new_component): Evaluate all functions
on the source image.
* trans-decl.cc (gfc_build_builtin_function_decls): The only
argument of team_number() is of type void* in the library ABI.
Jonathan Wakely [Tue, 25 Nov 2025 14:29:50 +0000 (14:29 +0000)]
libstdc++: Add platform wait functions for FreeBSD [PR120527]
This defines __platform_wait, __platform_notify, and
__platform_wait_until for FreeBSD, making use of the _umtx_op syscall.
The Linux versions of those functions only support 32-bit integers, but
the FreeBSD versions use the syscall for both 32-bit and 64-bit types,
as the _umtx_op supports both.
We also need to change __spin_impl because it currently assumes the
waitable at args._M_obj is always a __platform_wait_t. Because FreeBSD
supports waiting on both 32-bit and 64-bit integers, we need a
platform-specific function for loading a value from _M_obj. This adds a
new __platform_load function, which does an atomic load of the right
size. The Linux definition just loads an int, but for FreeBSD it depends
on _M_obj_size. We also need a generic version of the function for
platforms without __platform_wait, because __spin_impl is always used,
even when the __waitable_state contains a condition_variable.
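A minimal sketch of the size-dependent load described above, assuming the
GCC __atomic builtins are available. The function name load_waitable, its
signature, and the acquire memory order are assumptions made for
illustration; they are not taken from the libstdc++ sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a platform-specific load helper.
// Atomically loads a 4-byte or 8-byte object and widens the result to
// 64 bits, so a caller can compare it without knowing the object size.
inline std::uint64_t load_waitable(const void* obj, std::size_t obj_size)
{
  assert(obj_size == 4 || obj_size == 8);
  if (obj_size == 8)
    return __atomic_load_n(static_cast<const std::uint64_t*>(obj),
                           __ATOMIC_ACQUIRE);
  return __atomic_load_n(static_cast<const std::uint32_t*>(obj),
                         __ATOMIC_ACQUIRE);
}
```

A platform that only supports 32-bit waits would need just the second
branch, which corresponds to the "just loads an int" case above.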
arm: testsuite: Improve stability of tests for pr45701
These tests have always been a bit flaky, but I noticed that they were
often running with completely unsuitable options (eg expecting tailcalls
to happen on Thumb1). So I took the opportunity while fixing that to
improve their overall stability by removing most of the code that was
trying to push up the register pressure and replacing it with a simple asm
statement. Doing this has the added advantage that it removes the
issues that -mpure-code can cause since the test no longer needs to
access global variables.
gcc/testsuite/ChangeLog:
* gcc.target/arm/pr45701-1.c: Rework test to avoid global
variables. Require arm_arch_v7a_thumb as the effective
target.
* gcc.target/arm/pr45701-2.c: Likewise.
* gcc.target/arm/pr45701-3.c: Likewise.
Paul Thomas [Mon, 9 Mar 2026 16:07:48 +0000 (16:07 +0000)]
Fortran: Fix invalid free for PDTs without LEN components [PR122902]
2026-03-09 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/122902
* expr.cc (has_parameterized_comps): Moved from trans-array.cc.
* gfortran.h : Add prototype for has_parameterized_comps.
* trans-array.cc : Move has_parameterized_comps to expr.cc.
* trans-expr.cc (gfc_trans_scalar_assign): Don't deep copy PDTs
unless they have parameterized components.
gcc/testsuite/
PR fortran/122902
* gfortran.dg/pdt_39.f03: Deallocate a_r4 and a_r8.
* gfortran.dg/pdt_86.f03: New test.
Martin Jambor [Mon, 9 Mar 2026 13:01:00 +0000 (14:01 +0100)]
contrib/filter-clang-warnings.py: Ignore a C++ compat warning in libiberty
When building GCC master with Clang, we were getting (and not
filtering out) warning:
/home/worker/buildworker/tiber-gcc-clang/build/libiberty/regex.c:3978:24: warning: implicit conversion from 'int' to enumeration type 'reg_errcode_t' is invalid in C++ [-Wimplicit-int-enum-cast]
The flag is a C++ compatibility warning and libiberty is a C library, so
I think we can ignore the warning.
contrib/ChangeLog:
2026-03-09 Martin Jambor <mjambor@suse.cz>
* filter-clang-warnings.py (skip_warning): Also ignore
-Wimplicit-int-enum-cast in libiberty.
Martin Jambor [Mon, 9 Mar 2026 11:45:26 +0000 (12:45 +0100)]
ipa-cp: Allow more precise contexts in the verifier (PR124291)
Similarly to PR123629, the issue again stems from two facts. First, when
propagating polymorphic contexts, if there are no known "values" in the
corresponding lattice of the caller we use just the information on the
edge, and if there are some we combine them with that information. Second,
we iterate the propagation in strongly connected components of the call
graph (SCCs).
In the first iteration over such SCC, we process the edge from
unmark/1097720 to onChild/1097719 before we determined the lattices of the
caller. In the second iteration, we already know what context there will
be in unmark's first parameter and so can add a more precise value to
the corresponding lattice of onChild. Because we always add values to the
lattices and never "improve" them, we get two values for the call.
In PR123629, that actually described reality well because the caller's
lattice had the variable flag set, without cloning the caller we could not
assume we could use the caller's lattice value and so both values were
possible (and in fact both cases happened, the problem was that their meet
failed).
In this case however, one could argue that the lattices contain wrong info
or at least information that is misleading because the caller's lattice
contains just that single "constant" and the variable flag is not set. We
know that regardless of cloning decisions for the caller the more precise
derived value will be the case. And indeed since I changed the cloning
code to re-gather all constants for the given set of callers, that code
arrives at the more precise context. For the record they only differ in
the fact that the more precise one has the dynamic flag cleared.
I have thought about how to fix up the lattices in one way or another but
so far it has always turned ugly. Therefore this patch simply changes the
verification to allow this situation, because even though the final
result is a bit more precise than what was expected, it is still
correct. There will not be any attempt to clone for the more precise
context because all the call graph edges will have been redirected away.
The only "issue" is that the less precise contexts take up place in the
lattice, which has a limited length. That should not be a problem in
practice.
gcc/ChangeLog:
2026-03-06 Martin Jambor <mjambor@suse.cz>
PR ipa/124291
* ipa-cp.cc (ipcp_val_replacement_ok_p): Allow more precise
contexts than what the clone was originally intended for.
Dhruv Chawla [Thu, 12 Feb 2026 08:27:49 +0000 (08:27 +0000)]
gcc-auto-profile: Force writing perf output to perf.data [PR124075]
This is a partial fix for PR124075 which forces perf record to write the
profile out to perf.data. This is required because I noticed on both
aarch64 and x86 that it was writing out the profile to stdout at times,
which would cause profile information to be dropped. This did not fail
in the various create_fdas_for_* targets because they would only try and
access the perf profiles if they existed at their paths.
Further work for this patch is to plumb the PERF_DATA make variable into
the perf record invocation, but that is a stage 1 thing.
Autoprofiledbootstrapped and regtested on aarch64-linux-gnu.
Signed-off-by: Dhruv Chawla <dhruvc@nvidia.com>
gcc/ChangeLog:
PR gcov-profile/124075
* config/aarch64/gcc-auto-profile: Add "-o perf.data" to perf
record invocation.
* config/i386/gcc-auto-profile: Likewise.
Dhruv Chawla [Thu, 12 Feb 2026 03:57:51 +0000 (03:57 +0000)]
gcc-auto-profile: Force --inherit flag for perf record [PR123923]
This works around a bug I was seeing while testing autoprofiledbootstrap
where it appeared that perf record was only recording the make
invocation and not any of the child processes. I did not find any
configuration that would make it do so, so forcing the flag in
gcc-auto-profile will make sure that it doesn't happen regardless of
perf's settings.
Bootstrapped and regtested on aarch64-linux-gnu.
Signed-off-by: Dhruv Chawla <dhruvc@nvidia.com>
gcc/ChangeLog:
PR gcov-profile/123923
* config/aarch64/gcc-auto-profile: Add --inherit to perf record flags.
* config/i386/gcc-auto-profile: Likewise.
Sorry to be awkward, but I'd like to revert the rtlanal.cc and
config/mips/mips.md parts of r16-7265-ga9e48eca3a6eef. I think
the expr.cc part of that patch is enough to fix the bug. The other
parts seem unnecessary and are likely to regress code quality on MIPS
compared to previous releases. (See the testing below for examples.)
The rtlanal.cc part added the following code to truncated_to_mode:
/* This explicit TRUNCATE may be needed on targets that require
MODE to be suitably extended when stored in X. Targets such as
mips64 use (sign_extend:DI (truncate:SI (reg:DI x))) to perform
an explicit extension, avoiding use of (subreg:SI (reg:DI x))
which is assumed to already be extended. */
scalar_int_mode imode, omode;
if (is_a <scalar_int_mode> (mode, &imode)
&& is_a <scalar_int_mode> (GET_MODE (x), &omode)
&& targetm.mode_rep_extended (imode, omode) != UNKNOWN)
return false;
I think this has two problems. The first is that mode_rep_extended
describes a canonical form that is obtained by correctly honouring
TARGET_TRULY_NOOP_TRUNCATION. It is not an independent restriction
on what RTL optimisers can do. If we need to disable an optimisation
on MIPS-like targets, the restrictions should be based on
TARGET_TRULY_NOOP_TRUNCATION instead.
The second problem is that, although the comment treats MIPS-like
DI->SI truncation as a special case, truncated_to_mode is specifically
written for such cases. The comment above the function says:
/* Suppose that truncation from the machine mode of X to MODE is not a
no-op. See if there is anything special about X so that we can
assume it already contains a truncated value of MODE. */
Thus we're already in the realm of MIPS-like truncations that need
TRUNCATE rather than SUBREG (and that in turn guarantee sign-extension
in some cases). It's the caller that checks for that condition:
So I think the patch has the effect of disabling exactly the kind of
optimisation that truncated_to_mode is supposed to provide.
truncated_to_mode makes an implicit assumption that sign-extension is
enough to allow a SUBREG to be used in place of a TRUNCATE. This is
true for MIPS and was true for the old SH64 port. I don't know whether
it's true for gcn and nvptx, although I assume that it must be, since
no-one seems to have complained. However, it would not be true for a
port that required zero rather than sign extension (which AFAIK we've
never had).
It's probably worth noting that this assumption is in the opposite
direction from what mode_rep_extended describes. mode_rep_extended
says that "proper" truncation leads to a guarantee of sign extension.
truncated_to_mode assumes that sign extension avoids the need for
"proper" truncation. On MIPS, the former is only true for truncation
from 64 bits to 32 bits, whereas the latter is true for all cases (such
as 64 bits to 16 bits).
The old :SHORT pattern existed because QI and HI values are only
guaranteed to be sign-extensions of bit 31 of the register, not bits
7 or 15 (respectively). Thus we have the worst of both worlds:
(1) truncation from DI is not a nop. It requires a left shift by
at least 32 bits and a right shift by the same amount.
(2) sign extension to DI is not a nop. It requires a left shift and
a right shift in the normal way (by 56 bits for QI and 48 bits
for HI).
So a separate truncation and extension would yield four shifts.
The pattern above exists to reduce this to two shifts, since (2)
subsumes (1).
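The shift counts above can be checked with a small model of the two
operations. This is only a sketch in portable C++, not MIPS RTL: the
function names are hypothetical, and it assumes that right-shifting a
negative signed value is an arithmetic shift (true for GCC, and guaranteed
since C++20).

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low `bits` bits of x to 64 bits using a left shift
// followed by an arithmetic right shift by the same amount.
inline std::int64_t sign_extend_low(std::int64_t x, int bits)
{
  assert(bits > 0 && bits <= 64);
  const int shift = 64 - bits;
  return static_cast<std::int64_t>(static_cast<std::uint64_t>(x) << shift)
         >> shift;
}

// Model of DI->SI truncation: shift left and right by 32.
inline std::int64_t truncate_di_to_si(std::int64_t x)
{
  return sign_extend_low(x, 32);
}

// Model of QI->DI and HI->DI sign extension: shifts by 56 and 48.
inline std::int64_t extend_qi(std::int64_t x) { return sign_extend_low(x, 8); }
inline std::int64_t extend_hi(std::int64_t x) { return sign_extend_low(x, 16); }
```

In this model extend_qi (truncate_di_to_si (x)) equals extend_qi (x) for
every x, which is the sense in which the two-shift extension subsumes the
two-shift truncation.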
But the :SI case is different:
(1) truncation from DI is not a nop. It requires a left shift by 32
and a right shift by 32, as above.
(2) sign extension from SI to DI is a nop.
(2) is implemented by:
;; When TARGET_64BIT, all SImode integer and accumulator registers
;; should already be in sign-extended form (see TARGET_TRULY_NOOP_TRUNCATION
;; and truncdisi2). We can therefore get rid of register->register
;; instructions if we constrain the source to be in the same register as
;; the destination.
;;
;; Only the pre-reload scheduler sees the type of the register alternatives;
;; we split them into nothing before the post-reload scheduler runs.
;; These alternatives therefore have type "move" in order to reflect
;; what happens if the two pre-reload operands cannot be tied, and are
;; instead allocated two separate GPRs. We don't distinguish between
;; the GPR and LO cases because we don't usually know during pre-reload
;; scheduling whether an operand will be LO or not.
(define_insn_and_split "extendsidi2"
[(set (match_operand:DI 0 "register_operand" "=d,l,d")
(sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" "0,0,m")))]
"TARGET_64BIT"
"@
#
#
lw\t%0,%1"
"&& reload_completed && register_operand (operands[1], VOIDmode)"
[(const_int 0)]
{
emit_note (NOTE_INSN_DELETED);
DONE;
}
[(set_attr "move_type" "move,move,load")
(set_attr "mode" "DI")])
So extending the first pattern above from :SHORT to :SUBDI is not really
an optimisation, in the sense that it doesn't add new information.
Not providing the combination allows the truncation or sign-extension
to be optimised with surrounding code.
I suppose the argument in favour of going from :SHORT to :SUBDI is
that it might avoid a move in some cases. But (a) I think that would
need to be measured further, (b) it might instead mean that the
extendsidi2 pattern needs to be tweaked for modern RA choices,
and (c) it doesn't really feel like stage 4 material.
I can understand where the changes came from. The output of combine
was clearly wrong before r16-7265-ga9e48eca3a6eef. And what combine
did looked bad. But I don't think combine itself did anything wrong.
IMO, all it did was expose the problems in the existing RTL. Expand
dropped a necessary sign-extension and the rest flowed from there.
In particular, the old decisions based on truncated_to_mode seemed
correct. The thing that the truncated_to_mode patch changed was the
assumption that a 64-bit register containing a "u16 lower" parameter
could be truncated with a SUBREG. And that's true, since it's
guaranteed by the ABI. The parameter is zero-extended from bit 16
and so the register contains a sign extension of bit 16 (i.e. 0).
And that was the information that truncated_to_mode was using.
I tested the patch on mips64-linux-gnu (all 3 ABIs). The patch fixes
regressions in:
a68: fix calls to strtol and strtoll [PR algol68/124372]
This commit fixes the following problems related to parsing integer
and bits denotations:
1. strtou?l should be used only if long is 64 bits wide. Otherwise, use
strtou?ll.
2. Use unsigned conversions for bits denotations radix, for
consistency.
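The selection in point 1 can be sketched as follows. The helper names
parse_int64 and parse_uint64 are hypothetical, and the sketch tests the
width of long directly with sizeof, whereas the ChangeLog below keys the
choice on INT64_T_IS_LONG.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <cstdlib>

// Parse a signed 64-bit value, using strtol only when long is wide
// enough to hold it and strtoll otherwise.
inline std::int64_t parse_int64(const char* text, int radix)
{
  assert(radix == 0 || (radix >= 2 && radix <= 36));
  if (sizeof(long) * CHAR_BIT >= 64)
    return std::strtol(text, nullptr, radix);
  return std::strtoll(text, nullptr, radix);
}

// Unsigned counterpart: strtoul when unsigned long has 64 bits,
// strtoull otherwise.
inline std::uint64_t parse_uint64(const char* text, int radix)
{
  assert(radix == 0 || (radix >= 2 && radix <= 36));
  if (sizeof(unsigned long) * CHAR_BIT >= 64)
    return std::strtoul(text, nullptr, radix);
  return std::strtoull(text, nullptr, radix);
}
```

On a target with a 32-bit long, such as i686-linux-gnu, calling strtol
for a 64-bit denotation would saturate at LONG_MAX, which is the kind of
problem the wider function avoids.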
Tested in i686-linux-gnu and x86_64-linux-gnu.
Signed-off-by: Jose E. Marchesi <jemarch@gnu.org>
gcc/algol68/ChangeLog
PR algol68/124372
* a68-low-units.cc (a68_lower_denotation): Call to strtoull if
INT64_T_IS_LONG is not defined, strtol otherwise.
* a68-parser-scanner.cc (get_next_token): Use strtoul for radix
instead of strtol.
doc: Move specs documentation to GCC internals manual [PR69367] [PR69849]
The description of specs should have ended up in the GCC internals
manual instead of the user-facing documentation when the two manuals
were split many years ago.
Starting with C++11 we rely on template parameter requirements to prevent
instantiation of methods taking iterators with invalid types.
So the _GLIBCXX_DEBUG mode does not need to check for potential ambiguity between
integer type and iterator type anymore.
Jørgen Kvalsvik [Sat, 7 Mar 2026 09:36:40 +0000 (10:36 +0100)]
Improve speed of masking table algorithm for MC/DC
The masking table was computed by considering the cartesian product of
incoming edges, ordering the pairs, and doing upwards BFS searches
from the successors of the lower topologically index'd ones (higher in
the graph). The problem with this approach is that all the nodes we
find from the higher candidates would also be found from the lower
candidates, and since we want to collect the set intersection, any
higher candidate would be dominated by lower candidates.
We need only consider adjacent elements in the sorted set of
candidates. This has a dramatic performance impact for large
functions. The worst case is expressions of the form (x && y && ...)
and (x || y || ...) with up to 64 elements. I did a wallclock
comparison of the full analysis phase (including emitting the GIMPLE):
test.c:
int fn1 (int a[])
{
(a[0] && a[1] && ...) // 64 times
(a[0] && a[1] && ...) // 64 times
... // 500 times
}
int main ()
{
int a[64];
for (int i = 0; i != 10000; ++i)
{
for (int k = 0; k != 64; ++k)
a[k] = i % k;
fn1 (a);
}
}
Without this patch:
fn1 instrumented in 20822.303 ms (41.645 ms per expression)
With this patch:
fn1 instrumented in 1288.548 ms (2.577 ms per expression)
I also tried considering terms left-to-right and, whenever the search
found an already-processed expression it would stop the search and
just insert its complete table entry, but this had no measurable
impact on compile time, and the result was a slightly more complicated
function.
This inefficiency went unnoticed for a while, because these
expressions aren't very common. The most I've seen in the wild is 27
conditions, and that involved a lot of nested expressions which aren't
impacted as much.
gcc/ChangeLog:
* tree-profile.cc (struct conds_ctx): Add edges.
(topological_src_cmp): New function.
(masking_vectors): New search strategy.
We find (const_int 3 [0x3]) and a few others to be equivalent, among
them (reg:QI v1). This is a "fake set" that we create to help CSE extract
const_vector elements and reuse them. Element 0 is special, though.
We lowpart-subreg simplify it to (reg:QI v1) directly and, as the register
stays the same, consider it equivalent to (reg:V8QI v1).
Because both equivs refer to the same hard reg, in merge_equiv_classes, the
old (reg:V8QI) equiv is deleted and replaced by the new (reg:QI) one,
forgetting that the old equiv had 7 more elements.
Subsequently, extracting element 1 of a zero-extended QImode register results
in "0" instead of the correct "-4".
Therefore, this patch only uses those vec_select simplifications that do
not directly result in a register.
PR rtl-optimization/121649
gcc/ChangeLog:
* cse.cc (find_sets_in_insn): Only use non-reg vec_select
simplifications.
Martin Uecker [Thu, 19 Feb 2026 17:20:01 +0000 (18:20 +0100)]
c: Fix ICE related to tags and hardbool attribute [PR123856]
The hardbool attribute creates special enumeration types,
but the tag is not set correctly, which causes broken diagnostics
and an ICE with the new helper function to get the tag.
David Malcolm [Fri, 6 Mar 2026 23:47:05 +0000 (18:47 -0500)]
testsuite: fix ICEs in analyzer plugin with CPython >= 3.11 [PR107646,PR112520]
In GCC 14 the testsuite gained a plugin that "teaches" the analyzer
about the CPython API, trying to find common mistakes:
https://gcc.gnu.org/wiki/StaticAnalyzer/CPython
Unfortunately, this has been crashing for more recent versions of
CPython.
Specifically, in Python 3.11, PyObject's ob_refcnt was moved to an
anonymous union (as part of PEP 683 "Immortal Objects, Using a Fixed
Refcount"). The plugin attempts to find the field and fails, but has
no error handling, leading to a null pointer dereference.
Also, https://github.com/python/cpython/pull/101292 moved the "ob_digit"
from struct _longobject to a new field long_value of a new
struct _PyLongValue, leading to similar analyzer crashes when not
finding the field.
The following patch fixes this by
* looking within the anonymous union for the ob_refcnt field if it can't
find it directly
* gracefully handling the case of not finding "ob_digit" in PyLongObject
* doing more lookups once at plugin startup, rather than continuously on
analyzing API calls
* adding diagnostics and more error-handling to the plugin startup, so that
if it can't find something in the Python headers it emits a useful note
when disabling itself, e.g.
cc1: note: could not find field 'ob_digit' of CPython type 'PyLongObject' {aka 'struct _longobject'}
* replacing some copy-and-pasted code with member functions of a new
"class api" (though various other cleanups could be done)
Tested with:
* CPython 3.8: all tests continue to PASS
* CPython 3.13: fixes the ICEs, 2 FAILs remain (reference counting false
negatives)
Given that this is already a large patch, I'm opting to only fix the
crashes and defer the 2 remaining FAILs and other cleanups to followup
work.
gcc/analyzer/ChangeLog:
PR testsuite/112520
* region-model-manager.cc
(region_model_manager::get_field_region): Assert that the args are non-null.
gcc/testsuite/ChangeLog:
PR analyzer/107646
PR testsuite/112520
* gcc.dg/plugin/analyzer_cpython_plugin.cc: Move everything from
namespace ana:: into ana::cpython_plugin. Move global tree values
into a new "class api".
(pyobj_record): Replace with api.m_type_PyObject.
(pyobj_ptr_tree): Replace with api.m_type_PyObject_ptr.
(pyobj_ptr_ptr): Replace with api.m_type_PyObject_ptr_ptr.
(varobj_record): Replace with api.m_type_PyVarObject.
(pylistobj_record): Replace with api.m_type_PyListObject.
(pylongobj_record): Replace with api.m_type_PyLongObject.
(pylongtype_vardecl): Replace with api.m_vardecl_PyLong_Type.
(pylisttype_vardecl): Replace with api.m_vardecl_PyList_Type.
(get_field_by_name): Add "complain" param and use it to issue a
note on failure. Assert that type and name are non-null. Don't
crash on fields that are anonymous unions, and special-case
looking within them for "ob_refcnt" to work around the
Python 3.11 change for PEP 683 (immortal objects).
(get_sizeof_pyobjptr): Convert to...
(api::get_sval_sizeof_PyObject_ptr): ...this.
(init_ob_refcnt_field): Convert to...
(api::init_ob_refcnt_field): ...this.
(set_ob_type_field): Convert to...
(api::set_ob_type_field): ...this.
(api::init_PyObject_HEAD): New.
(api::get_region_PyObject_ob_refcnt): New.
(api::do_Py_INCREF): New.
(api::get_region_PyVarObject_ob_size): New.
(api::get_region_PyLongObject_ob_digit): New.
(inc_field_val): Convert to...
(api::inc_field_val): ...this.
(refcnt_mismatch::refcnt_mismatch): Add tree params for refcounts
and initialize corresponding fields. Fix whitespace.
(refcnt_mismatch::emit): Use stored tree values, rather than
assuming we have constants, and crashing on non-constants. Delete
commented-out dead code.
(refcnt_mismatch::foo): Delete.
(refcnt_mismatch::m_expected_refcnt_tree): New field.
(refcnt_mismatch::m_actual_refcnt_tree): New field.
(retrieve_ob_refcnt_sval): Simplify using class api.
(count_pyobj_references): Likewise.
(check_refcnt): Likewise. Don't warn on UNKNOWN values. Use
get_representative_tree for the expected and actual values and
skip the warning if it fails, rather than assuming we have
constants and crashing on non-constants.
(count_all_references): Update comment.
(kf_PyList_Append::impl_call_pre): Simplify using class api.
(kf_PyList_Append::impl_call_post): Likewise.
(kf_PyList_New::impl_call_post): Likewise.
(kf_PyLong_FromLong::impl_call_post): Likewise.
(get_stashed_type_by_name): Emit note if the type couldn't be
found.
(get_stashed_global_var_by_name): Likewise for globals.
(init_py_structs): Convert to...
(api::init_from_stashed_types): ...this. Bail out with an error
code if anything fails. Look up more things at startup, rather
than during analysis of calls.
(ana::cpython_analyzer_events_subscriber): Rename to...
(ana::cpython_plugin::analyzer_events_subscriber): ...this.
(analyzer_events_subscriber::analyzer_events_subscriber):
Initialize m_init_failed.
(analyzer_events_subscriber::on_message<on_tu_finished>):
Update for conversion of init_py_structs to
api::init_from_stashed_types and bail if it fails.
(analyzer_events_subscriber::on_message<on_frame_popped>): Don't
run if plugin initialization failed.
(analyzer_events_subscriber::m_init_failed): New field.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Patrick Palka [Fri, 6 Mar 2026 22:59:11 +0000 (17:59 -0500)]
c++: ICE mangling C auto... tparm [PR124297]
After r16-7491, the constraint on a C auto... tparm is represented as a
fold-expression (in TEMPLATE_PARM_CONSTRAINTS) instead of a concept-id (in
PLACEHOLDER_TYPE_CONSTRAINTS). So we now need to strip this fold-expression
before calling write_type_constraint, like we do in the type template
parameter case a few lines below.
PR c++/124297
gcc/cp/ChangeLog:
* mangle.cc (write_template_param_decl) <case PARM_DECL>:
Strip fold-expression before calling write_type_constraint.
Andrew Pinski [Tue, 17 Feb 2026 22:03:44 +0000 (14:03 -0800)]
aarch64: Fix uint64_t[8] usage after including "arm_neon.h" [PR124126]
aarch64_init_ls64_builtins_types currently creates an array with type uint64_t[8]
and then sets the mode to V8DI. The problem here is if you used that array
type before, you would get a mode of BLK.
This causes an ICE in some cases: with the C++ front end and -g, you would
get "type variant differs by TYPE_MODE" and, in some cases even without -g,
"canonical types differ for identical types".
The fix is to do build_distinct_type_copy of the array in aarch64_init_ls64_builtins_types
before assigning the mode to that copy. This keeps the ls64 structures correct, and
user-provided arrays are not affected when "arm_neon.h" is included.
Built and tested on aarch64-linux-gnu.
PR target/124126
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc (aarch64_init_ls64_builtins_types): Copy
the array type before setting the mode.
gcc/testsuite/ChangeLog:
* g++.target/aarch64/pr124126-1.C: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
the IR for p->c[12] is:
(.ACCESS_WITH_SIZE (p->c, &p->b, 0B, 4) + 48) = 2;
The current routine get_index_from_offset in c-family/c-ubsan.cc cannot
handle the integer constant offset "48" correctly.
The fix is to enhance "get_index_from_offset" to correctly handle the constant
offset.
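For the example above, the arithmetic the fix has to get right is the
mapping from a constant byte offset back to an element index. The sketch
below assumes 4-byte elements, which is what an offset of 48 for index 12
implies; index_from_byte_offset is a hypothetical helper and not the
c-ubsan.cc routine.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration: recover the element index from a constant
// byte offset, given the size of one element in bytes.
inline std::size_t index_from_byte_offset(std::size_t byte_offset,
                                          std::size_t elem_size)
{
  assert(elem_size != 0 && byte_offset % elem_size == 0);
  return byte_offset / elem_size;
}
```

The recovered index is then the value a bounds check would compare
against the counted_by bound.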
PR c/124230
gcc/c-family/ChangeLog:
* c-ubsan.cc (get_index_from_offset): Handle the special case when
the offset is an integer constant.
gcc/testsuite/ChangeLog:
* gcc.dg/ubsan/pointer-counted-by-bounds-124230-char.c: New test.
* gcc.dg/ubsan/pointer-counted-by-bounds-124230-float.c: New test.
* gcc.dg/ubsan/pointer-counted-by-bounds-124230-struct.c: New test.
* gcc.dg/ubsan/pointer-counted-by-bounds-124230-union.c: New test.
* gcc.dg/ubsan/pointer-counted-by-bounds-124230.c: New test.
Andrew Pinski [Fri, 6 Mar 2026 19:22:56 +0000 (11:22 -0800)]
c: Fix pragma inside a pragma [PR97991]
After r0-72806-gbc4071dd66fd4d, c_parser_consume_token will
assert if we get a pragma inside c_parser_consume_token but
pragma processing will call pragma_lex which then calls
c_parser_consume_token. In the case of pragma with expansion
(redefine_extname, message and sometimes pack [and some target
specific pragmas]) we get the expanded tokens that includes
CPP_PRAGMA. We should just allow it instead of doing an assert.
This follows what the C++ front end does, and we no longer
have an ICE.
Bootstrapped and tested on x86_64-linux-gnu.
PR c/97991
gcc/c/ChangeLog:
* c-parser.cc (c_parser_consume_token): Allow
CPP_PRAGMA if inside a pragma.
gcc/testsuite/ChangeLog:
* c-c++-common/cpp/pr97991-1.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Saurabh Jha [Mon, 16 Feb 2026 14:11:58 +0000 (14:11 +0000)]
aarch64: mingw: Fix regression in C++ support
Fixes regression in C++ support without exception handling by:
1. Moving Makefile fragment config/i386/t-seh-eh to
config/mingw/t-seh-eh that handles C++ exception handling. This is
sufficient to fix the regression even if the exception handling
itself is not implemented yet.
2. Changing existing references to t-seh-eh in libgcc/config.host and
adding it for aarch64-*-mingw*.
With these changes, the compiler can now be built with C and C++.
This doesn't add support for Structured Exception Handling (SEH)
which will be done separately.
libgcc/ChangeLog:
* config.host: Set tmake_eh_file for aarch64-*-mingw* and update
it for x86_64-*-mingw* and x86_64-*-cygwin*.
* config/i386/t-seh-eh: Move to...
* config/mingw/t-seh-eh: ...here.
* config/aarch64/t-no-eh: Removed.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/mingw/mingw.exp: Add support for C++ files.
* gcc.target/aarch64/mingw/minimal_new_del.C: New test.
Jakub Jelinek [Fri, 6 Mar 2026 13:33:19 +0000 (14:33 +0100)]
testsuite: Add testcase for already fixed PR [PR122000]
This testcase started to be miscompiled with my r15-9131 change
on arm with -march=armv7-a -mfpu=vfpv4 -mfloat-abi=hard -O and got
fixed with r16-6548 PR121773 change.
2026-03-06 Jakub Jelinek <jakub@redhat.com>
PR target/122000
* gcc.c-torture/execute/pr122000.c: New test.
Nathan Myers [Fri, 6 Mar 2026 10:33:04 +0000 (05:33 -0500)]
libstdc++: bitset _GLIBCXX_ASSERTIONS op[] fixes
C++11 forbids a compound statement, as seen in the definition
of __glibcxx_assert(), in a constexpr function. This patch
open-codes the assertion in `bitset<>::operator[] const` for
C++11 to fix a failure in `g++.old-deja/g++.martin/bitset1.C`.
Also, it adds `{ dg-do compile }` in another test to suppress
a spurious UNRESOLVED complaint.
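One common way to express such a check inside a single return statement
is a conditional expression whose failing arm calls a non-constexpr
function. The sketch below shows that idiom on a hypothetical tiny_bits
type; it is not the code added to std::bitset, and the use of std::abort
is an assumption.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdlib>

// Hypothetical fixed-size bit container, not std::bitset.
template<std::size_t N>
struct tiny_bits
{
  static_assert(N <= sizeof(unsigned long) * CHAR_BIT,
                "N must fit in one unsigned long");

  unsigned long word;

  // A C++11 constexpr body is essentially one return statement, so the
  // bounds check is folded into a conditional expression.  std::abort()
  // is not constexpr, so an out-of-range index is rejected during
  // constant evaluation and aborts at run time.
  constexpr bool operator[](std::size_t pos) const
  {
    return pos < N ? ((word >> pos) & 1UL) != 0
                   : (std::abort(), false);
  }
};
```

With in-range indices the function remains usable in constant
expressions, for example in a static_assert on a constexpr tiny_bits<8>.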
Code size tests on Arm are notoriously flaky because there are
numerous ISA variants (Arm, Thumb-1 and Thumb-2) to consider in
addition to a number of other variants from multiple sub-architecture
and micro-architectural tuning options. In combination this means
that we have continuous testsuite churn if the constraints are tight
enough to detect real regressions.
So this patch eliminates most of these checks, except where the code
size test is the only test that is done (other than the compilation
itself). Where that is the case I've tightened the compiler options
to limit the test to one set of architecture flags, thereby
eliminating most of the sources of variation.
In some cases I've replaced a code-size check with some other test of
the output, based on the intent of the original patch that motivated
the test. For example, the max-insns-skipped test now checks that an
IT instruction is not generated rather than checking the size of the
binary (which was a side-effect of not generating IT).
gcc/testsuite/ChangeLog:
* lib/target-supports.exp: Add arm_arch_v7a_thumb.
* gcc.target/arm/ifcvt-size-check.c: Add options to force thumb1.
* gcc.target/arm/ivopts-2.c: Remove object size check.
* gcc.target/arm/ivopts-3.c: Likewise.
* gcc.target/arm/ivopts-4.c: Likewise.
* gcc.target/arm/ivopts-5.c: Likewise.
* gcc.target/arm/ivopts.c: Likewise.
* gcc.target/arm/max-insns-skipped.c: Scan for absence of an IT
instruction. Remove object size check. Use arm_arch_v7a_thumb.
* gcc.target/arm/pr43597.c: Remove object size check and use
arm_arch_v7a_thumb.
* gcc.target/arm/pr63210.c: Use arm_arch_v5t_thumb options.
* gcc.target/arm/split-live-ranges-for-shrink-wrap.c: Remove
object size check and use arm_arch_v5t_thumb options.
arm: testsuite: Fix typo on target arm_cpu_cortex_a53
When testing the effective target these tests were using the wrong
name since they omitted the trailing _ok. This was causing some tests
to fail to execute correctly.
gcc/testsuite/ChangeLog:
* gcc.target/arm/aes-fuse-1.c: Add _ok to the effective_target.
* gcc.target/arm/aes-fuse-2.c: Likewise.
Jonathan Wakely [Wed, 4 Mar 2026 10:54:16 +0000 (10:54 +0000)]
libstdc++: Use aligned new for filesystem::path internals [PR122300]
As Bug 122300 shows, we have at least one target where the
static_assert added by r16-4422-g1b18a9e53960f3 fails. This patch
resurrects the original proposal for using aligned new that I posted in
https://gcc.gnu.org/pipermail/libstdc++/2025-October/063904.html
Instead of just asserting that the memory from operator new will be
sufficiently aligned, check whether it will be and use aligned new if
needed. We don't just use aligned new unconditionally, because that can
add overhead on targets where malloc already meets the requirements.
libstdc++-v3/ChangeLog:
PR libstdc++/122300
* src/c++17/fs_path.cc (path::_List::_Impl): Remove
static_asserts.
(path::_List::_Impl::required_alignment)
(path::_List::_Impl::use_aligned_new): New static data members.
(path::_List::_Impl::create_unchecked): Check use_aligned_new
and use aligned new if needed.
(path::_List::_Impl::alloc_size): New static member function.
(path::_List::_Impl_deleter::operator): Check use_aligned_new
and use aligned delete if needed.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
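The conditional use of aligned new described above can be sketched as
follows. This is only an illustration, not the code in fs_path.cc: the
Impl type, its 32-byte alignment and the helper names are assumptions
made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical stand-in for an implementation type with a large alignment.
struct alignas(32) Impl { unsigned char storage[64]; };

// True only when plain operator new is not guaranteed to be aligned enough.
constexpr bool use_aligned_new
  = alignof(Impl) > __STDCPP_DEFAULT_NEW_ALIGNMENT__;

// Allocate raw storage for one Impl, using aligned new only if needed.
void* allocate_impl()
{
  void* p;
  if constexpr (use_aligned_new)
    p = ::operator new(sizeof(Impl), std::align_val_t(alignof(Impl)));
  else
    p = ::operator new(sizeof(Impl));
  assert(reinterpret_cast<std::uintptr_t>(p) % alignof(Impl) == 0);
  return p;
}

// The deallocation form has to match the allocation form.
void deallocate_impl(void* p) noexcept
{
  if constexpr (use_aligned_new)
    ::operator delete(p, std::align_val_t(alignof(Impl)));
  else
    ::operator delete(p);
}
```

On targets where the default new alignment already covers alignof(Impl),
both functions fall back to the plain forms, so no extra overhead is paid.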
Jakub Jelinek [Fri, 6 Mar 2026 09:32:00 +0000 (10:32 +0100)]
tree-inline: Fix up ICE on !is_gimple_reg is_gimple_reg_type copying [PR124135]
The first testcase below ICEs e.g. with -O2 on s390x-linux, the
second with -O2 -m32 on x86_64-linux. We have
<bb 2> [local count: 1073741824]:
if (x_4(D) != 0)
goto <bb 3>; [33.00%]
else
goto <bb 4>; [67.00%]
<bb 4> [local count: 1073741824]:
return <retval>;
on a target where <retval> has gimple reg type but is
aggregate_value_p and TREE_ADDRESSABLE too.
fnsplit splits this into
<bb 2> [local count: 354334800]:
_1 = qux (42);
foo (0, &<retval>, _1);
<bb 3> [local count: 354334800]:
return <retval>;
in the *.part.0 function and
if (x_4(D) != 0)
goto <bb 3>; [33.00%]
else
goto <bb 4>; [67.00%]
<bb 4> [local count: 1073741824]:
return <retval>;
in the original function. Now, dunno if already that isn't
invalid because <retval> has TREE_ADDRESSABLE set in the latter, but
at least it is accepted by tree-cfg.cc verification.
  tree lhs = gimple_call_lhs (stmt);
  if (lhs
      && (!is_gimple_reg (lhs)
          && (!is_gimple_lvalue (lhs)
              || verify_types_in_gimple_reference
                   (TREE_CODE (lhs) == WITH_SIZE_EXPR
                    ? TREE_OPERAND (lhs, 0) : lhs, true))))
    {
      error ("invalid LHS in gimple call");
      return true;
    }
While lhs is not is_gimple_reg, it is is_gimple_lvalue here.
Now, inlining of the *.part.0 fn back into the original results
in
<retval> = a;
statement which already is diagnosed by verify_gimple_assign_single:
    case VAR_DECL:
    case PARM_DECL:
      if (!is_gimple_reg (lhs)
          && !is_gimple_reg (rhs1)
          && is_gimple_reg_type (TREE_TYPE (lhs)))
        {
          error ("invalid RHS for gimple memory store: %qs", code_name);
          debug_generic_stmt (lhs);
          debug_generic_stmt (rhs1);
          return true;
        }
__float128/long double are is_gimple_reg_type, but both operands
aren't is_gimple_reg.
The following patch fixes it by doing separate load and store, i.e.
_42 = a;
<retval> = 42;
in this case. If we want to change verify_gimple_assign to disallow
!is_gimple_reg (lhs) for is_gimple_reg_type (TREE_TYPE (lhs)), we'd
need to change fnsplit instead, but I'd be afraid such a change would
be more stage1 material (and certainly nothing that should be
even backported to release branches).
2026-03-05 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/124135
* tree-inline.cc (expand_call_inline): If both gimple_call_lhs (stmt)
and use_retvar aren't gimple regs but have gimple reg type, use
separate load of use_retvar into SSA_NAME and then store of it
into gimple_call_lhs (stmt).
* g++.dg/torture/pr124135-1.C: New test.
* g++.dg/torture/pr124135-2.C: New test.
Jakub Jelinek [Fri, 6 Mar 2026 07:14:09 +0000 (08:14 +0100)]
match.pd: Move cast into p+ operand for (ptr) (x p+ y) p+ z -> (ptr) (x p+ (y + z)) [PR124358]
The following testcase is miscompiled since my r12-6382 change, because
it doesn't play well with the gimple_fold_indirect_ref function which uses
STRIP_NOPS and then has
  /* *(foo *)fooarrptr => (*fooarrptr)[0] */
  if (TREE_CODE (TREE_TYPE (subtype)) == ARRAY_TYPE
      && TREE_CODE (TYPE_SIZE (TREE_TYPE (TREE_TYPE (subtype)))) == INTEGER_CST
      && useless_type_conversion_p (type, TREE_TYPE (TREE_TYPE (subtype))))
    {
      tree type_domain;
      tree min_val = size_zero_node;
      tree osub = sub;
      sub = gimple_fold_indirect_ref (sub);
      if (! sub)
        sub = build1 (INDIRECT_REF, TREE_TYPE (subtype), osub);
      type_domain = TYPE_DOMAIN (TREE_TYPE (sub));
      if (type_domain && TYPE_MIN_VALUE (type_domain))
        min_val = TYPE_MIN_VALUE (type_domain);
      if (TREE_CODE (min_val) == INTEGER_CST)
        return build4 (ARRAY_REF, type, sub, min_val, NULL_TREE, NULL_TREE);
    }
Without the GENERIC
#if GENERIC
(simplify
(pointer_plus (convert:s (pointer_plus:s @0 @1)) @3)
(convert:type (pointer_plus @0 (plus @1 @3))))
#endif
we have INDIRECT_REF of POINTER_PLUS_EXPR with int * type of NOP_EXPR
to that type of POINTER_PLUS_EXPR with pointer to int[4] ARRAY_TYPE, so
gimple_fold_indirect_ref doesn't create the ARRAY_REF.
But with it, it is simplified to NOP_EXPR to int * type from
POINTER_PLUS_EXPR with pointer to int[4] ARRAY_TYPE, the NOP_EXPR is
skipped over by STRIP_NOPS and the above code triggers.
The following patch fixes it by swapping the order, do NOP_EXPR
inside of POINTER_PLUS_EXPR first argument instead of NOP_EXPR with
POINTER_PLUS_EXPR.
2026-03-06 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/124358
* match.pd ((ptr) (x p+ y) p+ z -> (ptr) (x p+ (y + z))): Simplify
into (ptr) x p+ (y + z) instead.
Andrew Pinski [Fri, 6 Mar 2026 05:54:44 +0000 (21:54 -0800)]
testsuite/aarch64: Add testcase for already fixed bug [PR124078]
This big-endian testcase started to ICE with r16-7464-g560766f6e239a8
and then started to work with r16-7506-g498983d9619351.
So it seems like a good idea to add the testcase for this
so it does not break again.
Pushed as obvious after a quick test to make sure it ICEd
before and it is passing now on aarch64-linux-gnu.
PR rtl-optimization/124078
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/pr124078-1.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Jakub Jelinek [Thu, 5 Mar 2026 20:43:55 +0000 (21:43 +0100)]
c++: Avoid caching TARGET_EXPR slot value if exception is thrown from TARGET_EXPR_INITIAL [PR124145]
The following testcase is miscompiled, we throw exception only during
the first bar () call and not during the second and in that case reach
the inline asm.
The problem is that the TARGET_EXPR handling calls
ctx->global->put_value (new_ctx.object, new_ctx.ctor);
first for aggregate/vectors, then
if (is_complex)
/* In case no initialization actually happens, clear out any
void_node from a previous evaluation. */
ctx->global->put_value (slot, NULL_TREE);
and then recurses on TARGET_EXPR_INITIAL.
Even for is_complex it can actually store partially the result in the
slot before throwing.
When TARGET_EXPR_INITIAL doesn't throw, we do
if (ctx->save_expr)
ctx->save_expr->safe_push (slot);
and that arranges for the value in slot to be invalidated at the end of
the surrounding CLEANUP_POINT_EXPR.
But when it does throw, this isn't done.
The following patch fixes it by moving that push to save_expr
before the if (*jump_target) return NULL_TREE; check.
2026-03-05 Jakub Jelinek <jakub@redhat.com>
PR c++/124145
* constexpr.cc (cxx_eval_constant_expression) <case TARGET_EXPR>: Move
ctx->save_expr->safe_push (slot) call before if (*jump_target) test.
Use TARGET_EXPR_INITIAL instead of TREE_OPERAND.
a68: fix wrapping C functions returning void [PR algol68/124322]
This patch fixes a68_wrap_formal_proc_hole so it doesn't assume that
wrapped C functions returning void return Algol 68 void values, which
are empty records.
Tested in i686-linux-gnu and x86_64-linux-gnu.
Signed-off-by: Jose E. Marchesi <jemarch@gnu.org>
gcc/algol68/ChangeLog
Alice Carlotti [Wed, 4 Mar 2026 14:58:21 +0000 (14:58 +0000)]
aarch64 libgcc: Fix mingw build [PR124333]
Make __aarch64_cpu_features unconditionally available. This permits the
unconditional use of this global inside __arm_get_current_vg, which was
introduced in r16-7637-g41b4a73f370116.
For now this global is only initialised when <sys/auxv.h> is available,
but we can extend this in future to support other ways of initialising
the bits used for SME support, and use this to remove __aarch64_have_sme.
This approach was recently adopted by LLVM.
This patch does introduce an inconsistency with __aarch64_have_sme when
<sys/auxv.h> is unavailable. However, this doesn't introduce any
regressions, because one of the following conditions will hold:
1. SVE is enabled at compile time whenever we use a streaming or
streaming compatible function. In this case the compiler won't need to
use __arm_get_current_vg, so it doesn't matter if it gives the wrong
answer.
2. There is a use of a streaming or streaming compatible function when
we don't know whether SVE is enabled. In order to get correct DWARF
unwind information, we then have to be able to test for SVE availability
at runtime. This isn't possible until a working __arm_get_current_vg
implementation is available, so the configuration has never (yet) been
supported.
vect: fix vectorization of non-gather elementwise loads [PR124037]
For the vectorization of non-contiguous memory accesses such as the
vectorization of loads from a particular struct member, specifically
when vectorizing with unknown bounds (thus using a pointer and not an
array) it is observed that inadequate alignment checking allows for
the crossing of a page boundary within a single vectorized loop
iteration. This leads to potential segmentation faults in the
resulting binaries.
without any proper address alignment checks on the starting address
or on whether alignment is preserved across iterations. We therefore
fix the handling of such cases.
To correct this, we modify the logic in `get_load_store_type',
particularly the logic responsible for ensuring we don't read more
than the scalar code would in the context of early breaks, extending
it from handling not only gather-scatter and strided SLP accesses but
also allowing it to properly handle element-wise accesses, wherein we
specify that these need correct block alignment, thus promoting their
`alignment_support_scheme' from `dr_unaligned_supported' to
`dr_aligned'.
gcc/ChangeLog:
PR tree-optimization/124037
* tree-vect-stmts.cc (get_load_store_type): Fix
alignment_support_scheme categorization for early
break VMAT_ELEMENTWISE accesses.
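To illustrate why block alignment matters for the page-boundary problem
described above, here is a minimal sketch. The 4096-byte page size and the
helper names are assumptions for the example and have nothing to do with
the vectorizer's own data structures.

```cpp
#include <cassert>
#include <cstdint>

// Assumed page size for illustration; real targets may differ.
constexpr std::uintptr_t page_size = 4096;

// True if the byte range [addr, addr + len) touches more than one page.
constexpr bool crosses_page(std::uintptr_t addr, std::uintptr_t len)
{
  return len != 0 && addr / page_size != (addr + len - 1) / page_size;
}

// True if addr is a multiple of block, where block is a power of two.
constexpr bool block_aligned(std::uintptr_t addr, std::uintptr_t block)
{
  assert(block != 0 && (block & (block - 1)) == 0);
  return (addr & (block - 1)) == 0;
}
```

A 16-byte access starting 8 bytes before a page boundary crosses into the
next page, while any access that is aligned to its own power-of-two block
size (no larger than a page) stays within one page.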
The following fixes a regression introduced by r11-5542 which
restricts replacing uses of live original defs of now vectorized
stmts to when that does not require new loop-closed PHIs to be
inserted. That restriction keeps the original scalar definition
live which is sub-optimal and also not reflected in costing.
The particular case the following fixes, which can be seen in
gcc.dg/vect/bb-slp-57.c, is the case where we are replacing an
existing loop closed PHI argument.
PR tree-optimization/98064
* tree-vect-loop.cc (vectorizable_live_operation): Do
not restrict replacing uses in a LC PHI.
* gcc.dg/vect/bb-slp-57.c: Verify we do not keep original
stmts live.
Jakub Jelinek [Thu, 5 Mar 2026 12:11:39 +0000 (13:11 +0100)]
libiberty: Copy over .ARM.attributes section into *.debug.temp.o files [PR124365]
If gcc is configured on aarch64-linux against new binutils, such as
2.46, it doesn't emit into assembly markings like
.section .note.gnu.property,"a"
.align 3
.word 4
.word 16
.word 5
.string "GNU"
.word 0xc0000000
.word 4
.word 0x7
.align 3
but instead emits
.aeabi_subsection aeabi_feature_and_bits, optional, ULEB128
.aeabi_attribute Tag_Feature_BTI, 1
.aeabi_attribute Tag_Feature_PAC, 1
.aeabi_attribute Tag_Feature_GCS, 1
The former goes into .note.gnu.property section, the latter goes into
.ARM.attributes section.
Now, when linking without LTO or with LTO but without -g, all behaves
for the linked binaries the same, say for test.c
int main () {}
$ gcc -g -mbranch-protection=standard test.c -o test; readelf -j .note.gnu.property test
Displaying notes found in: .note.gnu.property
Owner Data size Description
GNU 0x00000010 NT_GNU_PROPERTY_TYPE_0
Properties: AArch64 feature: BTI, PAC, GCS
$ gcc -flto -mbranch-protection=standard test.c -o test; readelf -j .note.gnu.property test
Displaying notes found in: .note.gnu.property
Owner Data size Description
GNU 0x00000010 NT_GNU_PROPERTY_TYPE_0
Properties: AArch64 feature: BTI, PAC, GCS
$ gcc -flto -g -mbranch-protection=standard test.c -o test; readelf -j .note.gnu.property test
readelf: Warning: Section '.note.gnu.property' was not dumped because it does not exist
The problem is that the *.debug.temp.o object files created by lto-wrapper
don't have these markings. The function copies over .note.GNU-stack section
(so that it doesn't similarly on most arches break PT_GNU_STACK segment
flags), and .note.gnu.property (which used to hold this stuff e.g. on
aarch64 or x86, added in PR93966). But it doesn't copy the new
.ARM.attributes section.
The following patch fixes it by copying that section too. The function
unfortunately only works on names, doesn't know if it is copying ELF or some
other format (PE, Mach-O) or if it is copying ELF, whether it is EM_AARCH64
or some other arch. The following patch just copies the section always,
I think it is very unlikely people would use .ARM.attributes section for
some random unrelated stuff. If we'd want to limit it to just EM_AARCH64,
guess it would need to be done in
libiberty/simple-object-elf.c (simple_object_elf_copy_lto_debug_sections)
instead as an exception for the (*pfn) callback results (and there it could
e.g. verify SHT_AARCH64_ATTRIBUTES type but even there dunno if it has
access to the Ehdr stuff).
No testcase from me, dunno if e.g. the linker can flag the lack of those
during linking with some option rather than using readelf after link and
what kind of effective targets we'd need for such a test.
2026-03-05 Jakub Jelinek <jakub@redhat.com>
PR target/124365
* simple-object.c (handle_lto_debug_sections): Also copy over
.ARM.attributes section.
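Since the copying decision is made purely on section names, the behaviour
described above can be pictured with a small name filter. This is a
hypothetical helper written for illustration, not the libiberty function;
it only lists the three marker sections the message mentions.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: returns true for the marker sections that the
// commit message says are carried over into the *.debug.temp.o files.
bool is_copied_marker_section(const char* name)
{
  assert(name != nullptr);
  static const char* const names[] = {
    ".note.GNU-stack",
    ".note.gnu.property",
    ".ARM.attributes",   // the section added by this change
  };
  for (const char* n : names)
    if (std::strcmp(name, n) == 0)
      return true;
  return false;
}
```

A name-only check like this cannot tell ELF from other object formats or
one ELF machine from another, which is the limitation the message notes.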
Tomasz Kamiński [Thu, 5 Mar 2026 07:57:24 +0000 (08:57 +0100)]
libstdc++: Fix atomic/cons/zero_padding.cc test for arm-none-eabi [PR124124]
The test uses dg-require-atomic-cmpxchg-word, which checks whether atomic
compare exchange is available for pointer-sized integers, and then tests
types that are eight bytes in size. This causes an issue for targets on
which pointers are four bytes and libatomic is not present, like
arm-none-eabi.
This patch addresses this by using a short member in TailPadding and
MidPadding instead of int. This reduces the size of the types to four
bytes, while keeping the padding bytes present.
PR libstdc++/124124
libstdc++-v3/ChangeLog:
* testsuite/29_atomics/atomic/cons/zero_padding.cc: Limit size of
test types to four bytes.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
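As a rough illustration of the layout idea, the sketch below defines two
structs with a short and a char member. The member names and order are
assumptions, not copied from the test, and the sizes depend on the ABI;
on common ABIs both types occupy four bytes with one padding byte.

```cpp
#include <cassert>
#include <cstddef>

struct TailPadding { short s; char c; };   // padding after c
struct MidPadding  { char c; short s; };   // padding between c and s

// Number of bytes of T not accounted for by member_bytes of member data.
template<typename T>
constexpr std::size_t padding_bytes(std::size_t member_bytes)
{
  assert(member_bytes <= sizeof(T));
  return sizeof(T) - member_bytes;
}
```

Calling padding_bytes<TailPadding>(sizeof(short) + sizeof(char)) reports
how many bytes of the object representation are padding.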
Tomasz Kamiński [Wed, 25 Feb 2026 11:15:08 +0000 (12:15 +0100)]
libstdc++: Remove UB in _Arg_value union alternative assignment
The _Arg_value::_M_set method initialized the union member by
assigning through the reference to that member produced by _M_get(*this).
However, per the language rules, such an assignment has undefined
behavior if the alternative was not already active, the same as for any
object not within its lifetime.
To address this, we modify _M_set to use placement new for the class
types, and invoke _S_access with two arguments for all other types.
_S_access (a rename of _S_get) is modified to assign the value of
the second parameter (if provided) to the union member. Such direct
assignments are treated specially in the language (see N5032
[class.union.general] p5), and will start the lifetime of a trivially
default constructible alternative.
libstdc++-v3/ChangeLog:
* include/std/format (_Arg_value::_M_get): Rename to...
(_Arg_value::_M_access): Modified to accept optional
second parameter that is assigned to value.
(_Arg_value::_M_get): Handle rename.
(_Arg_value::_M_set): Use construct_at for basic_string_view,
handle, and two-argument _S_access for other types.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Ivan Lazaric <ivan.lazaric1@gmail.com> Co-authored-by: Ivan Lazaric <ivan.lazaric1@gmail.com>
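The two ways of activating a union alternative can be sketched as below.
The Value union and the helper functions are made up for the example and
are much simpler than _Arg_value; the sketch uses placement new at run
time rather than the library's own code.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string_view>

union Value
{
  struct Empty {};
  Empty none;              // active after construction
  int i;                   // trivially default constructible
  std::string_view sv;     // class type without a trivial default constructor

  Value() : none() {}
};

// A direct assignment to the member, written as val.i = v, starts the
// lifetime of the int alternative.
int set_int(int v)
{
  Value val;
  val.i = v;
  return val.i;
}

// For the class type, the alternative is created with placement new
// instead of being assigned through a reference.
std::size_t set_sv(std::string_view s)
{
  Value val;
  ::new (static_cast<void*>(&val.sv)) std::string_view(s);
  assert(val.sv == s);
  return val.sv.size();
}
```

Assigning through a reference obtained earlier (int& r = val.i; r = v;)
is not the member-access form the special rule covers, which is the kind
of undefined behavior the message describes.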
Jakub Jelinek [Thu, 5 Mar 2026 10:23:24 +0000 (11:23 +0100)]
i386: Make -masm={att,intel} xchg operand order consistent
While in this case it is not an assemble failure nor wrong-code,
because say xchgl %eax, %edx and xchg eax, edx do the same thing,
they are encoded differently, so if we want consistency between
-masm=att and -masm=intel emitted code (my understanding is that
is what Zdenek is testing right now, fuzzing code, compiling
with both -masm=att and -masm=intel and making sure if the former
assembles, the latter does too and they result in identical
*.o files), we should use different order of the operands
even here (and it doesn't matter which order we pick).
I've grepped the *.md files with
grep '\\t%[0-9], %[0-9]' *.md | grep -v '%0, %0'
i386.md: "xchg{<imodesuffix>}\t%1, %0"
i386.md: xchg{<imodesuffix>}\t%1, %0
i386.md: "wrss<mskmodesuffix>\t%0, %1"
i386.md: "wruss<mskmodesuffix>\t%0, %1"
(before this and PR124366 fix) and later on also with
grep '\\t%[a-z0-9_<>]*[0-9], %[a-z0-9_<>]*[0-9]' *.md | grep -v '%0, %0'
and checked all the output and haven't found anything else problematic.
2026-03-05 Jakub Jelinek <jakub@redhat.com>
* config/i386/i386.md (swap<mode>): Swap operand order for
-masm=intel.
Tomasz Kamiński [Tue, 24 Feb 2026 07:08:58 +0000 (08:08 +0100)]
libstdc++: Store basic_format_arg::handle in __format::_Arg_value
This patch changes the type of _M_handle member of __format::_Arg_value
from __format::_HandleBase union member to basic_format_arg<_Context>::handle.
This allows handle to be stored (using placement new) inside _Arg_value at
compile time, as type _M_handle member now matches stored object.
In addition to above, to make handle usable at compile time, we adjust
the _M_func signature to match the stored function, avoiding the need
for reinterpret cast.
To avoid a cyclic dependency, where basic_format_arg<_Context> requires
instantiating _Arg_value<_Context> for its _M_val member, that in turn
requires basic_format_arg<_Context>::handle, we define handle as nested
class inside _Arg_value and change basic_format_arg<_Context>::handle
to alias for it.
Finally, the handle(_Tp&) constructor is now constrained to not accept
handle itself, as otherwise it would be used instead of copy-constructor
when constructing from handle&.
As _Arg_value is already templated on _Context, this change should not lead
to additional template instantiations.
libstdc++-v3/ChangeLog:
* include/std/format (_Arg_value::handle): Define, extracted
with modification from basic_format_arg::handle.
(_Arg_value::_Handle_base): Remove.
(_Arg_value::_M_handle): Change type to handle.
(_Arg_value::_M_get, _Arg_value::_M_set): Check for handle
type directly, and return result unmodified.
(basic_format_arg::__formattable): Remove.
(basic_format_arg::handle): Replace with alias to
_Arg_value::handle.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com> Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
PR 123807 turns out to be a special case of the middle-end PR 124250.
The previous ad-hoc fix is unneeded now since the underlying middle-end
issue is fixed, so revert it but keep the test case.
Jakub Jelinek [Thu, 5 Mar 2026 09:05:44 +0000 (10:05 +0100)]
i386: Fix up last -masm=intel operand of vcvthf82ph [PR124349]
gas expects for this instruction
vcvthf82ph xmm30, QWORD PTR [r9]
vcvthf82ph ymm30, XMMWORD PTR [r9]
vcvthf82ph zmm30, YMMWORD PTR [r9]
i.e. the memory size is half of the dest register size.
We currently emit it for the last 2 forms but emit XMMWORD PTR
for the first one too. So, we need %q1 for V8HF and for V16HF/V32HF
can either use just %1 or %x1/%t1. There is no define_mode_attr
that would provide those, so I've added one just for this insn.
2026-03-05 Jakub Jelinek <jakub@redhat.com>
PR target/124349
* config/i386/sse.md (iptrssebvec_2): New define_mode_attr.
(cvthf82ph<mode><mask_name>): Use it for -masm=intel input
operand.
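The rule that the memory operand is half the width of the destination
register can be written down as a tiny lookup. The function below is a
hypothetical helper for illustration only; it is unrelated to how the
machine description selects the operand modifier.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: Intel-syntax size prefix for the memory source of
// vcvthf82ph, given the destination register width in bits.
std::string source_size_prefix(unsigned dest_bits)
{
  assert(dest_bits == 128 || dest_bits == 256 || dest_bits == 512);
  switch (dest_bits / 2)   // the source is half the destination width
    {
    case 64:  return "QWORD PTR";     // xmm destination
    case 128: return "XMMWORD PTR";   // ymm destination
    default:  return "YMMWORD PTR";   // zmm destination
    }
}
```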
Jakub Jelinek [Thu, 5 Mar 2026 08:35:39 +0000 (09:35 +0100)]
i386: Fix operand order for @wrss<mode> and @wruss<mode> [PR124366]
These two insns were using the same operand order for both -masm=att
and -masm=intel, which is ok if using the same operand for both, but not
when they are different.
2026-03-05 Jakub Jelinek <jakub@redhat.com>
PR target/124366
* config/i386/i386.md (@wrss<mode>, @wruss<mode>): Swap operand
order for -masm=intel.
Jakub Jelinek [Thu, 5 Mar 2026 08:19:59 +0000 (09:19 +0100)]
c++: Fix up handling of unnamed types named by typedef for linkage purposes for -freflection [PR123810]
As mentioned on the PR, we ICE on the following testcase and if members_of
isn't called on a class with e.g. typedef struct { int d; } D;, we don't
handle it correctly, e.g. we say ^^C::D is not a type alias or for
members_of in a namespace that there aren't two entities, the struct itself
and the type alias for it.
This is because name_unnamed_type handles the naming of an unnamed type
through typedef for linkage purposes (where we originally have
a TYPE_DECL with IDENTIFIER_ANON_P DECL_NAME for the type) by replacing
all occurrences of TYPE_NAME on the type from the old TYPE_DECL to the new
TYPE_DECL with the user provided name.
The ICE for members_of (^^C, uctx) is then because we see two TYPE_DECLs
(one with IDENTIFIER_ANON_P, one with user name) with the same TREE_TYPE
and enter the same thing twice into what we want to return and ICE in the
comparison routine. Anyway, for is_type_alias purposes, there is no
is_typedef_decl and can't be because the same TYPE_DECL is used as TYPE_NAME
of both the type proper and its alias. Without reflection we didn't care
about the difference.
So, the following patch changes name_unnamed_type to do things differently,
but only for -freflection, because 1) I don't want to break stuff late in
stage4 2) without reflection we don't really need it and don't need to
pay the extra memory cost by having another type which is the type alias.
The change is that instead of
TYPE_DECL .anon_NN
      | TREE_TYPE
      v
    type <----------+
      | TYPE_NAME   |
      v             |
TYPE_DECL D         |
      | TREE_TYPE   |
      +-------------+
where for class context both TYPE_DECLs are in TYPE_FIELDS and for
namespace context only the latter one is (as pushdecl ignores the
IDENTIFIER_ANON_P one) we have
TYPE_DECL D          TYPE_DECL D --- DECL_ORIGINAL_TYPE
      | TREE_TYPE          | TREE_TYPE               |
      v                    v                         |
    type              variant_type                   |
      ^----------------------------------------------+
which is except for the same DECL_NAME on both TYPE_DECLs exactly what
is used for typedef struct D_ { int d; } D;
Various spots have been testing for the typedef name for linkage purposes
cases and were using tests like:
OVERLOAD_TYPE_P (TREE_TYPE (value))
&& value == TYPE_NAME (TYPE_MAIN_VARIANT (TREE_TYPE (value)))
So that this can be tested, this patch introduces a new decl_flag on
the TYPE_DECLs and marks for -freflection both of these TYPE_DECLs
(and for -fno-reflection the one without IDENTIFIER_ANON_P name).
It is easy to differentiate between the two, the first one is also
DECL_IMPLICIT_TYPEDEF_P, the latter is not (and on the other side
has DECL_ORIGINAL_TYPE non-NULL).
For name lookup in namespaces, nothing special needs to be done,
because the originally IDENTIFIER_ANON_P TYPE_DECL wasn't added
to the bindings, at block scope I had to deal with it in pop_local_binding
because it was unhappy that it got renamed. And finally for class
scopes, we need to arrange for the latter TYPE_DECL to be found, but
currently it is the second one. The patch currently skips the first one for
name lookup in fields_linear_search and arranges for count_class_fields
and member_vec_append_class_fields to also ignore the first one. Wonder if
the latter two shouldn't also ignore any other IDENTIFIER_ANON_P TYPE_FIELDS
chain decls, or do we ever perform name lookup for the anon identifiers?
Another option for fields_linear_search would be try to swap the order of
the two TYPE_DECLs in TYPE_FIELDS chain somewhere in grokfield.
Anyway, the changes result in minor emitted DWARF changes, say for
g++.dg/debug/dwarf2/typedef1.C without -freflection there is
.uleb128 0x4 # (DIE (0x46) DW_TAG_enumeration_type)
.long .LASF6 # DW_AT_name: "typedef foo<1>::type type"
.byte 0x7 # DW_AT_encoding
.byte 0x4 # DW_AT_byte_size
.long 0x70 # DW_AT_type
.byte 0x1 # DW_AT_decl_file (typedef1.C)
.byte 0x18 # DW_AT_decl_line
.byte 0x12 # DW_AT_decl_column
.long .LASF7 # DW_AT_MIPS_linkage_name: "N3fooILj1EE4typeE"
...
and no typedef, while with -freflection there is
.uleb128 0x3 # (DIE (0x3a) DW_TAG_enumeration_type)
.long .LASF5 # DW_AT_name: "type"
.byte 0x7 # DW_AT_encoding
.byte 0x4 # DW_AT_byte_size
.long 0x6c # DW_AT_type
.byte 0x1 # DW_AT_decl_file (typedef1.C)
.byte 0x18 # DW_AT_decl_line
.byte 0x12 # DW_AT_decl_column
...
.uleb128 0x5 # (DIE (0x57) DW_TAG_typedef)
.long .LASF5 # DW_AT_name: "type"
.byte 0x1 # DW_AT_decl_file (typedef1.C)
.byte 0x18 # DW_AT_decl_line
.byte 0x1d # DW_AT_decl_column
.long 0x3a # DW_AT_type
so, different DW_AT_name on the DW_TAG_enumeration_type, missing
DW_AT_MIPS_linkage_name and an extra DW_TAG_typedef. While in theory
I could work harder to hide that detail, I actually think it is a good
thing to have it the latter way because it represents more exactly
what is going on.
Another slight change is different locations in some diagnostics
on g++.dg/lto/odr-3 test (location of the unnamed struct vs. locations
of the typedef name given to it without -freflection), and a module
issue which Nathan has some WIP patch for in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123810#c11
In any case, none of those differences show up in normal testsuite runs
currently (as those tests aren't compiled with -freflection), if/when
-freflection becomes the default for -std=c++26 we can deal with the
DWARF one as well as different locations in odr-3 and for modules I was
hoping it could be handled incrementally. I'm not even sure what should
happen if one TU has struct D { int d; }; and another one has
typedef struct { int d; } D;, shall that be some kind of error? Though
right now typedef struct { int d; } D; in both results in an error too
and that definitely needs to be handled.
2026-03-05 Jakub Jelinek <jakub@redhat.com>
PR c++/123810
* cp-tree.h (TYPE_DECL_FOR_LINKAGE_PURPOSES_P): Define.
(TYPE_DECL_WAS_UNNAMED): Likewise.
(TYPE_WAS_UNNAMED): Also check TYPE_DECL_WAS_UNNAMED.
* decl.cc (start_decl): Use TYPE_DECL_FOR_LINKAGE_PURPOSES_P.
(maybe_diagnose_non_c_class_typedef_for_l): If t == type, use
DECL_SOURCE_LOCATION (orig) instead of
DECL_SOURCE_LOCATION (TYPE_NAME (t)).
(name_unnamed_type): Set TYPE_DECL_FOR_LINKAGE_PURPOSES_P
on decl. For -freflection don't change TYPE_NAME from
orig to decl, but instead change DECL_NAME (orig) to
DECL_NAME (decl) and set TYPE_DECL_FOR_LINKAGE_PURPOSES_P on
orig too.
* decl2.cc (grokfield): Use TYPE_DECL_FOR_LINKAGE_PURPOSES_P.
* name-lookup.cc (fields_linear_search): Ignore
TYPE_DECL_WAS_UNNAMED decls.
(count_class_fields): Likewise.
(member_vec_append_class_fields): Likewise.
(pop_local_binding): Likewise.
* reflect.cc (namespace_members_of): For TYPE_DECL with
TYPE_DECL_FOR_LINKAGE_PURPOSES_P set also append
reflection of strip_typedefs (m).
* class.cc (find_flexarrays): Handle TYPE_DECLs with
TYPE_DECL_WAS_UNNAMED like the ones with IDENTIFIER_ANON_P
name.
* g++.dg/reflect/members_of10.C: New test.
* g++.dg/cpp2a/typedef1.C: Expect one message on a different line.
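For readers unfamiliar with the two source forms being compared, the
sketch below shows them side by side in plain C++ without reflection.
The names D, E_ and E and the helper functions are chosen for the
example; it says nothing about the compiler's internal TYPE_DECLs.

```cpp
#include <cassert>
#include <type_traits>

// Unnamed class; D becomes its typedef name for linkage purposes.
typedef struct { int d; } D;

// Named class E_ plus an ordinary alias E for it.
typedef struct E_ { int e; } E;

static_assert(std::is_class<D>::value, "D names the unnamed class");
static_assert(std::is_same<E, E_>::value, "E is an alias for E_");

// D can appear in a function with external linkage because the typedef
// gives the class a name for linkage purposes.
int read_d(const D& x) { return x.d; }
int read_e(const E_& x) { return x.e; }
```

In the second form the class and the alias have distinct names, which is
the representation the message says the -freflection path now mirrors.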
Marek Polacek [Wed, 4 Mar 2026 22:32:14 +0000 (17:32 -0500)]
c++/reflection: fix return value of meta::extent [PR124368]
std::meta::extent returns a size_t, but eval_extent returns either
size_zero_node or size_binop(), both of which are of type sizetype,
which is not the C/C++ size_t and so we don't pass the check in
cxx_eval_outermost_constant_expr:
/* Check we are not trying to return the wrong type. */
if (!same_type_ignoring_top_level_qualifiers_p (type, TREE_TYPE (r)))
We should convert to size_type_node which represents the C/C++ size_t,
like for instance fold_sizeof_expr does.
PR c++/124368
gcc/cp/ChangeLog:
* reflect.cc (eval_extent): Convert the result to size_type_node.
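The reflection function needs -freflection, so the sketch below uses the
long-standing std::extent type trait instead, as an analogue chosen for
the example. It only illustrates that an array extent query is expected
to yield a std::size_t; array_extent is a made-up wrapper name.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Extent of dimension N of array type T, via the std::extent type trait.
template<typename T, unsigned N = 0>
constexpr auto array_extent()
{
  return std::extent<T, N>::value;
}

// The result type is std::size_t, not some other unsigned type.
static_assert(std::is_same<decltype(array_extent<int[3][4], 1>()),
                           std::size_t>::value,
              "extent queries produce std::size_t");
```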
So just need to update the testcase removing the xfail and close this
bug as fixed.
The reason why this was not fixed until r16-101-g132d01d96ea9d6 is
because the call is from main which is known to be called once and was
not a candidate for IPA-CP until then.
In fact renaming the function from main to f (and adding a `return 0`
so not invoking undefined behavior), the scan-ipa-dump works all the
way back to GCC 5.
Tested on aarch64-linux-gnu and arm-linux-gnueabihf.
Marek Polacek [Mon, 2 Mar 2026 22:12:56 +0000 (17:12 -0500)]
c++: reusing typedefs in template for [PR124229]
This is a crash on code like:
  template for (constexpr auto val : define_static_array (enumerators_of (^^E)))
    {
      constexpr auto a = annotations_of(val)[0];
      using U = [:type_of(a):];
      constexpr auto m1 = extract<U>(a);
    }
because the template arg to extract wasn't substituted to "info".
Once I dug deeper I realized this problem isn't tied to Reflection:
we also crash here:
  template for (constexpr auto val : { 42 })
    {
      using U = decltype(val);
      foo<U>();
    }
because we emit code for foo() that still has a DECLTYPE_TYPE in it.
The problem is in tsubst and reusing typedefs. Normally, for code like
  template<typename T> void foo () {
    using U = T;
    U u;
  }
we do the DECL_FUNCTION_SCOPE_P -> retrieve_local_specialization call.
This call only happens in function templates (that are not explicit
specializations), but the "template for" above are both in non-template
functions. So we end up returning the original tree:
/* The typedef is from a non-template context. */
return t;
It seems clear that this is the wrong thing to do, and that the
DECL_FUNCTION_SCOPE_P code should happen in this scenario as well.
[temp.decls.general] tells me that "For the purpose of name lookup and
instantiation, the compound-statement of an expansion-statement is
considered a template definition." so I'm guessing that we want to
check for an expansion-statement as well. As decl_dependent_p says,
in_expansion_stmt is false when instantiating, so I'm looking for
sk_template_for.
PR c++/124229
gcc/cp/ChangeLog:
* pt.cc (in_expansion_stmt_p): New.
(tsubst): When reusing typedefs, do retrieve_local_specialization also
when in_expansion_stmt_p is true.
gcc/testsuite/ChangeLog:
* g++.dg/cpp26/expansion-stmt32.C: New test.
* g++.dg/reflect/expansion-stmt2.C: New test.
Jakub Jelinek [Wed, 4 Mar 2026 18:22:29 +0000 (19:22 +0100)]
c++: Find annotations in DECL_ATTRIBUTES (TYPE_NAME (r)) for type aliases
On Wed, Feb 25, 2026 at 08:50:40PM +0100, Jakub Jelinek wrote:
> > Sounds like the maybe_strip_typedefs is wrong, since reflection in general
> > tries to preserve aliases.
>
> Actually the maybe_strip_typedefs call is correct, that is for the type
> argument (so when it is std::meta::annotations_with_type) and the standard
> says that dealias should be used
> - https://eel.is/c++draft/meta.reflection#annotation-6.2
> But we probably shouldn't use TYPE_ATTRIBUTES but DECL_ATTRIBUTES (TYPE_NAME (r))
> if r is a type alias.
> I'll test a patch for that separately.
Here it is.
2026-03-04 Jakub Jelinek <jakub@redhat.com>
PR c++/123866
* reflect.cc (eval_annotations_of): For type aliases look for
annotations in DECL_ATTRIBUTES (TYPE_NAME (r)).
Jakub Jelinek [Wed, 4 Mar 2026 16:12:29 +0000 (17:12 +0100)]
libgfortran: Fix up putenv uses in libcaf_shmem [PR124330]
I don't have access to HP/UX, but at least on other OSes and what Linux as
well as POSIX documents is that when you call putenv with some argument,
what that argument points to becomes part of the environment and when
it is changed, the environment changes. I believe ENOMEM from putenv is
about reallocating of the __environ (or similar) pointed array of pointers
(e.g. if the particular env var name isn't there already), it still
shouldn't allocate any memory for the NAME=VALUE string and just use
the user provided one. So, passing the address of an automatic array
will be UB as soon as the scope of that var is left.
One can either malloc the buffer, or use static vars, then nothing leaks
and in the unlikely case putenv would be called twice for the same env var,
it would second time only register the same buffer.
2026-03-04 Jakub Jelinek <jakub@redhat.com>
PR libfortran/124330
* caf/shmem/shared_memory.c (shared_memory_set_env): Make buffer
used by putenv static.
(shared_memory_init): Likewise.
Andrew Pinski [Tue, 3 Mar 2026 21:57:47 +0000 (13:57 -0800)]
widen mult: Fix handling of _Fract mixed with _Fract [PR119568]
The problem here is we try calling find_widening_optab_handler_and_mode
with to_mode=E_USAmode and from_mode=E_UHQmode. This causes an ICE (with checking only).
The fix is to reject the case where the mode classes are different in convert_plusminus_to_widen
before even trying to deal with the modes.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/119568
gcc/ChangeLog:
* tree-ssa-math-opts.cc (convert_plusminus_to_widen): Reject different
mode classes.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Implements P2363R5 "Extending associative containers with the
remaining heterogeneous overloads". Adds overloads templated on
heterogeneous key types for several members of associative
containers, particularly insertions:
(Nothing is added to the multiset or multimap tree containers.)
All the insert*() and try_emplace() members also get a hinted
overload. The at() members get const and non-const overloads.
The new overloads enforce concept __heterogeneous_tree_key or
__heterogeneous_hash_key, as in P2077, to enforce that the
function objects provided meet requirements, and that the key
supplied is not an iterator or the native key. Insertions
implicitly construct the required key_type object from the
argument, by move where permitted.
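For readers unfamiliar with heterogeneous keys, here is a small sketch using only the lookup overloads that have existed since C++14. It does not exercise the new insertion overloads from this commit, whose availability depends on the library version. The names name_set and contains_name are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>

// std::less<> is a transparent comparator: it declares is_transparent, which
// enables the templated (heterogeneous) overloads of find().
using name_set = std::set<std::string, std::less<>>;

// Hypothetical helper: look up a C string directly. The comparator compares
// std::string against const char*, so no std::string key is constructed
// just to perform the lookup.
bool contains_name(const name_set &names, const char *name)
{
  return names.find(name) != names.end();
}
```

The overloads described above extend the same idea to insertion-like members, where a key_type object is constructed from the argument only when an element actually has to be added.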
Philipp Tomsich [Wed, 4 Mar 2026 08:49:09 +0000 (09:49 +0100)]
avoid-store-forwarding: Clear sbitmap before use [PR124351]
The forwarded_bytes sbitmap needs to be zeroed after allocation,
as sbitmaps are not implicitly initialized. This caused valgrind
warnings about conditional jumps depending on uninitialised values.
gcc/ChangeLog:
PR rtl-optimization/124351
* avoid-store-forwarding.cc (process_store_forwarding): Add
bitmap_clear after allocating forwarded_bytes.
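The allocate-then-clear pattern can be sketched with a toy bitmap. This is an illustration under stated assumptions, not GCC's sbitmap implementation: toy_bitmap and all toy_bitmap_* functions are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for a raw bitmap; this is not GCC's sbitmap type.
struct toy_bitmap
{
  std::size_t n_bits;
  unsigned char *bytes;
};

// Number of bytes backing the bitmap (always at least one).
static std::size_t toy_bitmap_size(std::size_t n_bits)
{
  return n_bits / 8 + 1;
}

// Allocate storage only. As with any malloc'ed memory, the contents are
// indeterminate until the caller clears them.
toy_bitmap toy_bitmap_alloc(std::size_t n_bits)
{
  toy_bitmap map = { n_bits, nullptr };
  map.bytes = static_cast<unsigned char *>(std::malloc(toy_bitmap_size(n_bits)));
  if (map.bytes == nullptr)
    std::abort();
  return map;
}

// Zero every bit; this is the step that must follow allocation.
void toy_bitmap_clear(toy_bitmap &map)
{
  std::memset(map.bytes, 0, toy_bitmap_size(map.n_bits));
}

void toy_bitmap_set(toy_bitmap &map, std::size_t bit)
{
  map.bytes[bit / 8] = static_cast<unsigned char>(map.bytes[bit / 8] | (1u << (bit % 8)));
}

bool toy_bitmap_test(const toy_bitmap &map, std::size_t bit)
{
  return ((map.bytes[bit / 8] >> (bit % 8)) & 1u) != 0;
}

void toy_bitmap_free(toy_bitmap &map)
{
  std::free(map.bytes);
  map.bytes = nullptr;
}
```

Testing a bit before toy_bitmap_clear has run would read indeterminate memory, which is the kind of access valgrind reports as depending on uninitialised values.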
Jakub Jelinek [Wed, 4 Mar 2026 08:38:28 +0000 (09:38 +0100)]
i386: Fix up vcvt<convertfp8_pack><mode><mask_name> for -masm=intel [PR124341]
The vcvt<convertfp8_pack><mode><mask_name> pattern uses the wrong <mask_operand?>
for -masm=intel, so the testcase fails to assemble; it emits something
like {ymm1} instead of {k1}.
2026-03-04 Jakub Jelinek <jakub@redhat.com>
PR target/124341
* config/i386/sse.md (vcvt<convertfp8_pack><mode><mask_name>): Use
<mask_operand3> rather than <mask_operand2> for -masm=intel.
Jakub Jelinek [Wed, 4 Mar 2026 08:34:33 +0000 (09:34 +0100)]
i386: Fix up printing of input operand of avx10_2_comisbf16_v8bf for -masm=intel [PR124349]
gas expects the second operand, if in memory, to be WORD PTR rather than
XMMWORD PTR. The following patch fixes it by using %w1 instead of %1; if the
operand is a register, it is printed as xmm1 in both cases.
2026-03-04 Jakub Jelinek <jakub@redhat.com>
PR target/124349
* config/i386/sse.md (avx10_2_comisbf16_v8bf): Use %w1 instead of %1
for -masm=intel.
Richard Biener [Wed, 4 Mar 2026 08:25:27 +0000 (09:25 +0100)]
Adjust gcc.dg/vect/vect-reduc-dot-s8b.c again
A failure on sparc shows that the dump scan for dot-prod is fragile
enough. The following simply removes it given it serves no actual
purpose and adds comments in place.
* gcc.dg/vect/vect-reduc-dot-s8b.c: Remove scan for
dot_prod pattern matching.
liuhongt [Wed, 4 Mar 2026 02:49:37 +0000 (18:49 -0800)]
Refine the testcase.
> This testcase fails with binutils 2.35:
vmovw is supported in binutils 2.38 and later, need
/* { dg-require-effective-target avx512fp16 } */ to avoid errors.
> ```
> /tmp/ccf20y5C.s:20: Error: no such instruction: `vmovw xmm0,WORD PTR .LC0[rip]'
> /tmp/ccf20y5C.s:21: Error: no such instruction: `vmovw WORD PTR [rbp-18],xmm0'
> /tmp/ccf20y5C.s:22: Error: no such instruction: `vmovw xmm0,WORD PTR [rbp-18]'
> /tmp/ccf20y5C.s:23: Error: no such instruction: `vmovw WORD PTR [rbp-20],xmm0'
> /tmp/ccf20y5C.s:24: Error: no such instruction: `vmovw xmm0,WORD PTR [rbp-18]'
> /tmp/ccf20y5C.s:25: Error: no such instruction: `vmovw WORD PTR [rbp-22],xmm0'
> /tmp/ccf20y5C.s:26: Error: no such instruction: `vmovw xmm0,WORD PTR [rbp-18]'
> /tmp/ccf20y5C.s:27: Error: no such instruction: `vmovw WORD PTR [rbp-24],xmm0'
> /tmp/ccf20y5C.s:28: Error: no such instruction: `vmovw xmm0,WORD PTR [rbp-18]'
> /tmp/ccf20y5C.s:29: Error: no such instruction: `vmovw WORD PTR [rbp-26],xmm0'
> /tmp/ccf20y5C.s:30: Error: no such instruction: `vmovw xmm0,WORD PTR [rbp-18]'
> ```
>
> Thanks,
> Andrew Pinski
gcc/testsuite/ChangeLog:
PR target/124335
* gcc.target/i386/avx512fp16-pr124335.c: Require target
avx512fp16 instead of avx512bw.
H.J. Lu [Sun, 22 Feb 2026 02:32:30 +0000 (10:32 +0800)]
x86: Call ix86_access_stack_p only with symbolic constant load
ix86_access_stack_p can be quite expensive. Cache the result and call it
only if there are symbolic constant loads. This reduces the compile time
of PR target/124165 test from 202 seconds to 55 seconds.
gcc/
PR target/124165
* config/i386/i386-protos.h (symbolic_reference_mentioned_p):
Change the argument type from rtx to const_rtx.
* config/i386/i386.cc (symbolic_reference_mentioned_p): Likewise.
(ix86_access_stack_p): Add 2 auto_bitmap[] arguments. Cache
the register BB domination result.
(ix86_symbolic_const_load_p_1): New.
(ix86_symbolic_const_load_p): Likewise.
(ix86_find_max_used_stack_alignment): If there is no symbolic
constant load into the register, don't call ix86_access_stack_p.
[PR115042, LRA]: Postpone processing of new reload insns, 2nd variant
This is the second attempt to solve the PR. The first attempt (see
commit 9a7da540b63e7d77e747b5cdd6fdbbd3954e28c8) resulted in numerous
test suite failures on some secondary targets.
LRA in this PR cannot find regs for an asm insn which requires 11
general regs when 13 regs are available. The Arm subtarget (thumb) has
two stores with low and high general regs. LRA systematically chooses
stores involving low regs as having lower cost, and there are only 8
low regs. That is because LRA (and reload) chooses (mov) insn
alternatives independently of register pressure.
The proposed patch postpones processing new reload insns until the
reload pseudos are assigned and after that considers new reload insns.
We postpone reloads only for asm insns as they can have a lot of
operands. Depending on the assignment LRA chooses insns involving low
or high regs. Generally speaking it can change code generation in
better or worse way but it should be a very rare case.
The patch does not contain a test as the original test is too big (300KB
of C code). Unfortunately, after 2 days of work cvise managed to reduce
the test only to a 100KB file.
gcc/ChangeLog:
PR target/115042
* lra-int.h (lra_postponed_insns): New.
* lra.cc (lra_set_insn_deleted, lra_asm_insn_error): Clear
postponed insn flag.
(lra_process_new_insns): Propagate postponed insn flag for asm
gotos.
(lra_postponed_insns): New.
(lra): Initialize lra_postponed_insns. Push postponed insns on
the stack.
* lra-constraints.cc (postpone_insns): New function.
(curr_insn_transform): Use it to postpone processing reload insn
constraints. Skip processing postponed insns.
Mark Wielaard [Tue, 3 Mar 2026 19:34:58 +0000 (20:34 +0100)]
libgfortran: Regenerate config.h.in and configure
commit e13b14030a30 ("Fortran: Fix libfortran cannot be cross compiled
[PR124286]") updated configure.ac but didn't regenerate config.h.in
with autoheader. Also some line numbers were still wrong in
configure. Fix this by explicitly regenerating both files with
autoheader and autoconf version 2.69.
PR libstdc++/122217
* testsuite/27_io/filesystem/operations/copy_symlink/1.cc: New
test.
* testsuite/27_io/filesystem/operations/copy_symlink/2.cc: New
test.
* testsuite/27_io/filesystem/operations/copy_symlink/3.cc: New
test.
* testsuite/27_io/filesystem/operations/copy_symlink/4.cc: New
test.
Jerry DeLisle [Tue, 3 Mar 2026 04:02:58 +0000 (20:02 -0800)]
Fortran: Fix failures on windows and hpux systems [PR124330]
Co-authored-by: John David Anglin <danglin@gcc.gnu.org>
PR fortran/124330
libgfortran/ChangeLog:
* caf/shmem/shared_memory.c: Fix filenames for WIN32
includes.
(shared_memory_set_env): Use putenv() for HPUX and as
a fallback where setenv () is not available.
(NAME_MAX): Replace with SHM_NAME_MAX.
(SHM_NAME_MAX): Use this to avoid duplicating NAME_MAX
used elsewhere.
* caf/shmem/supervisor.c (get_image_num_from_envvar): Add
a fallback for HPUX. Add additional comment to explain why
the number of cores is used in lieu of GFORTRAN_NUM_IMAGES.
Martin Uecker [Fri, 20 Feb 2026 16:19:10 +0000 (17:19 +0100)]
c: Fix wrong code related to TBAA for components of structure types 2/2 [PR122572]
Given the following two types, the C FE assigns the same
TYPE_CANONICAL to both struct bar, because it treats pointers to
tagged types with the same tag as compatible (in this context).
struct foo { int y; };
struct bar { struct foo *c; };
struct foo { long y; };
struct bar { struct foo *c; };
get_alias_set records the components of aggregate types, but only
considers the components of the canonical version. To prevent
miscompilation, we create a modified canonical type where we
change such pointers to void pointers.
PR c/122572
gcc/c/ChangeLog:
* c-decl.cc (finish_struct): Add distinct canonical type.
* c-tree.h (c_type_canonical): Prototype for new function.
* c-typeck.cc (c_type_canonical): New function.
(ptr_to_tagged_member): New function.
gcc/testsuite/ChangeLog:
* gcc.dg/pr123356-2.c: New test.
* gcc.dg/struct-alias-2.c: New test.
Martin Uecker [Tue, 6 Jan 2026 18:26:42 +0000 (19:26 +0100)]
c: Fix wrong code related to TBAA for components of structure types 1/2 [PR122572]
When computing TYPE_CANONICAL we form equivalence classes of types
ignoring some aspects. In particular, we treat two structure / union
types as equivalent if a member is a pointer to another tagged type
which has the same tag, even if this pointed-to type is otherwise not
compatible. The fundamental reason why we do this is that even in a
single TU the equivalence class needs to be consistent with compatibility
of incomplete types across TUs. (LTO globs such pointers to void*).
The bug is that the test incorrectly treated also two pointed-to types
without tag as equivalent. One would expect that this just pessimizes
aliasing decisions, but due to how the middle-end handles TBAA for
components of structures, this leads to wrong code.
Jakub Jelinek [Tue, 3 Mar 2026 14:47:08 +0000 (15:47 +0100)]
i386: Use orb instead of orl/orq for stack probes/clash [PR124336]
This PR is about an inconsistency between AT&T and Intel syntax
for output_adjust_stack_and_probe/output_probe_stack_range.
On ia32 they use orl or or DWORD PTR, i.e. 32-bit or,
but on x86_64 in AT&T syntax they use orq (i.e. 64-bit or) and
in Intel syntax they use or DWORD PTR (i.e. 32-bit or).
These cases are used when probing stack in a loop, for each
page one probe. There is also the probe_stack named pattern
which currently uses word_mode or (i.e. 64-bit or for x86_64)
for both syntaxes, used when probing only once.
Functionally, I think whether we do an 8-bit or 32-bit or 64-bit
or with 0 constant doesn't matter, we don't modify any values on the
stack, just pretend to modify it. The 8-bit and 32-bit ors
are 1-byte shorter though than 64-bit one. How the 3 behave
performance-wise is unknown, if the particular probed spot on the
stack hasn't been stored/read for a while and won't be for a while,
then I'd think it shouldn't matter, dunno if there can be store
forwarding effects if it has been e.g. written or read very recently
by some other function as say 32-bit access and now is 8-bit. The
access after the probe (if it happens soon enough) should be in valid
programs a store (and again, dunno if there can be issues if the
sizes are different).
Now, for consistency reasons, we could just make the Intel
syntax match the AT&T and use 64-bit or on x86_64, so
use QWORD PTR instead of DWORD PTR if stack_pointer_rtx is 64-bit
in those 2 functions and be done with it.
Another possibility is use always 32-bit ors (in both those 2 functions
and probe_stack*; similar to the posted patch except testsuite changes
aren't needed and s/{b}/{l}/g;s/QI/SI/g;s/BYTE PTR/DWORD PTR/g) and
last option is to always use 8-bit ors (which is what the following
patch does). Or some other mix, say use 32-bit ors for -Os/-Oz and
64-bit ors otherwise.
2026-03-03 Jakub Jelinek <jakub@redhat.com>
PR target/124336
* config/i386/i386.cc (output_adjust_stack_and_probe): Use
or{b} rather than or%z0 and BYTE PTR rather than DWORD PTR.
(output_probe_stack_range): Likewise.
* config/i386/i386.md (probe_stack): Pass just 2 arguments
to gen_probe_stack_1, first adjust_address to QImode, second
const0_rtx.
(@probe_stack_1_<mode>): Remove.
(probe_stack_1): New define_insn.
* gcc.target/i386/stack-check-11.c: Allow orb next to orl/orq.
* gcc.target/i386/stack-check-18.c: Likewise.
* gcc.target/i386/stack-check-19.c: Likewise.
Jakub Jelinek [Tue, 3 Mar 2026 14:44:19 +0000 (15:44 +0100)]
c++: Set OLD_PARM_DECL_P even in regenerate_decl_from_template [PR124306]
The following testcase ICEs, because we try to instantiate the PARM_DECLs
of foo <int> twice, once when parsing ^^foo <int> and remember in a
REFLECT_EXPR a PARM_DECL in there, later on regenerate_decl_from_template
is called and creates new set of PARM_DECLs and changes DECL_ARGUMENTS
(or something later on in that chain) to the new set.
This means when we call parameters_of on ^^foo <int> later on, they won't
compare equal to the earlier acquired ones, and when we do e.g. type_of
or other operation on the old PARM_DECL where it needs to search the
DECL_ARGUMENTS (DECL_CONTEXT (parm_decl)) list, it will ICE because it
won't find it there.
The following patch fixes it similarly to how duplicate_decls deals
with those, by setting OLD_PARM_DECL_P flag on the old PARM_DECLs, so that
before using reflections of those we search DECL_ARGUMENTS and find the
corresponding new PARM_DECL.
2026-03-03 Jakub Jelinek <jakub@redhat.com>
PR c++/124306
* pt.cc (regenerate_decl_from_template): Mark the old PARM_DECLs
replaced with tsubst_decl result with OLD_PARM_DECL_P flag.