Harald Anlauf [Thu, 28 May 2026 20:49:26 +0000 (22:49 +0200)]
Fortran: checking of passed character length [PR125393]
Commit r16-3462 enhanced checking of character length passed to a character
dummy. However, when the actual argument was an array element, its storage
size was estimated from all elements up to the end of the array. This
could give a bogus warning when the dummy argument was of a scalar
character type. Fix check for this case to actually compare the character
lengths of actual and dummy.
PR fortran/125393
gcc/fortran/ChangeLog:
* interface.cc (get_expr_storage_size): Additionally return
character length.
(gfc_compare_actual_formal): When the formal is a scalar character
variable, use character lengths, not array storage size for check.
Nathan Myers [Tue, 26 May 2026 14:41:27 +0000 (10:41 -0400)]
libstdc++: allocate_at_least ask only what it reports (P0401)
allocate_at_least is rounding up the allocation request size to
its default alignment, which may be more than an integral
multiple of the object size requested. When the memory is freed,
what the container reports it is freeing differs from the amount
that was allocated. This patch rounds the request size back down
to what will be reported to the caller.
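The rounding described above can be sketched as follows. This is a minimal
illustration, not the code in new_allocator.h: the names SizedRequest and
plan_request are hypothetical, the alignment is assumed to be a power of
two, and overflow is not checked. It uses a plain division for clarity.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type: what is reported to the caller, and how many
// bytes are actually requested from the underlying allocator.
struct SizedRequest {
    std::size_t count;  // number of objects reported to the caller
    std::size_t bytes;  // number of bytes actually requested
};

// Round the request up to `align`, then back down to a whole number of
// objects, so that bytes == count * obj_size always holds.
// Assumes obj_size > 0 and that align is a power of two.
constexpr SizedRequest plan_request(std::size_t n, std::size_t obj_size,
                                    std::size_t align)
{
    const std::size_t rounded_up = (n * obj_size + align - 1) & ~(align - 1);
    const std::size_t count = rounded_up / obj_size;
    return {count, count * obj_size};
}
```

With 3-byte objects and 16-byte alignment, a request for one object rounds
up to 16 bytes, which holds five whole objects; the sketch reports five and
asks for 15 bytes rather than 16, so a later sized deallocation of
5 * 3 bytes matches what was allocated.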
The algorithm to compute the allocation is altered in response
to findings on godbolt.org, which indicate dropping to uint8 to
perform the division is a pessimization everywhere other than
x86. The new version emits code for multiplication, instead.
In addition, the remaining -m32 test that failed under the new
allocation method is fixed, and guards are added for building
with -fno-aligned-new and for -fno-sized-deallocation.
Tested on x86 -m64/-m32.
libstdc++-v3/ChangeLog:
* include/bits/new_allocator.h (allocate_at_least): Reduce
allocation to match what is reported.
* testsuite/20_util/allocator/allocate_at_least2.cc: Add tests.
* testsuite/23_containers/vector/modifiers/insert_vs_emplace.cc:
Fix, for -m32 and new allocation results.
Robert Dubner [Fri, 29 May 2026 13:43:07 +0000 (09:43 -0400)]
cobol: Speed improvements; function prototypes; POSIX compatibility.
1) The execution speed of ADD N TO VAR and SUBTRACT N FROM VAR where N
is an integer in the range -9 through +9 and VAR is of type Numeric
Display is improved through specialized code in genmath.cc.
2) The execution speed of FILE READ of line-sequential files is improved
by using a 64K read buffer.
3) COBOL function prototypes are implemented.
4) These changes include the beginning of implementing the POSIX
compatibility layer.
5) Added the ability to detect GOTO_EXPR that lack matching LABEL_EXPR.
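As an illustration of item 1, adding a small literal to a Numeric Display
field can work directly on the stored digit characters instead of
converting the field to binary and back. The sketch below is an assumption
about the general idea, not the code generated by genmath.cc: the helper
name add_digit_in_place is hypothetical, the field is assumed to be
unsigned ASCII digits (as for PIC 9999), and only N in the range 0 through
9 is handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Add a single decimal digit n (0..9) to an unsigned field stored as ASCII
// digits, e.g. "0199" for PIC 9999. The carry is propagated from the
// low-order digit and the loop stops as soon as no carry remains.
// Returns true if a carry ran off the high-order end, in which case the
// field holds the truncated result.
inline bool add_digit_in_place(std::string &field, int n)
{
    int carry = n;
    for (std::size_t i = field.size(); i > 0 && carry != 0; --i) {
        const int digit = (field[i - 1] - '0') + carry;
        carry = digit >= 10 ? 1 : 0;
        field[i - 1] = static_cast<char>('0' + (carry ? digit - 10 : digit));
    }
    return carry != 0;
}
```

In the common case only the last character changes, which is why a
specialized path can be cheaper than the general arithmetic route. Signed
fields, negative literals and non-ASCII encodings are not modeled here.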
Co-authored-by: Robert Dubner <rdubner@symas.com>
Co-authored-by: James K. Lowden <jklowden@cobolworx.com>
Co-authored-by: Xavier Del Campo <xdelcampo@symas.com>
gcc/cobol/ChangeLog:
* Make-lang.in: Include gcobc script.
* cdf.y: Change formal parameters of cdf_literalize().
* cobol1.cc (cobol_langhook_handle_option): Add OPT_ftrunc option.
* compare.cc (total_digits_tree): Remove debugging statements.
(float_compare): Likewise.
* copybook.h (class copybook_elem_t): Update conditional close().
* dts.h: Change copyright notice.
* gcobc: Likewise.
* gcobol.1: Likewise.
* gcobol.3: Likewise.
* gcobolspec.cc (COMPAT_LIBRARY): POSIX compatibility.
(POSIX_LIBRARY): Likewise.
(lang_specific_driver): Likewise.
* genapi.cc (section_label): Missing LABEL_EXPR detection.
(paragraph_label): Likewise.
(internal_perform_through): Likewise.
(enter_program_common): Add comment.
(parser_enter_program): Change current_program_index() handling.
(build_alter_switch): Missing LABEL_EXPR detection.
(parser_display_internal): Handle REFER_T_ADDRESS_OF flag.
(create_and_call): ADDRESS OF is passed BY VALUE.
* gengen.cc (LOOK_FOR_MISSING_LABELS_not): Missing LABEL_EXPR
detection.
(dump_missing_labels): Likewise.
(gg_append_statement): Likewise.
(gg_struct_field_ref): Likewise.
(LABEL_ROOT): Likewise.
(gg_create_goto_pair): Likewise.
(scm_dump_generic_nodes): Forward declaration.
(gg_leaving_the_source_code_file): Missing LABEL_EXPR detection.
(label_decl_text_from_expr): New function.
* gengen.h (gg_create_assembler_name): New declaration.
(label_decl_text_from_expr): New declaration.
* genmath.cc (uchar_f_node): Fast ADD N TO NUMERIC-DISPLAY.
(uchar_ten_node): Likewise.
(fast_add): Likewise.
(fast_subtract): Likewise.
(parser_add): Likewise.
(add_floats): Likewise.
(ordinary_add_format_1): Likewise.
(ordinary_subtract_format_1): Likewise.
(add_case_1): Likewise.
(add_case_2): Likewise.
(add_case_3): Likewise.
(parser_multiply): Likewise.
(add_case_4): Likewise.
(add_litN_to_numdisp): Likewise.
(add_format_1): Likewise.
(add_format_2): Likewise.
(add_format_3): Likewise.
(subtract_floats): Likewise.
(subtract_format_1): Likewise.
(subtract_format_2): Likewise.
(subtract_format_3): Likewise.
(parser_subtract): Likewise.
* genutil.cc (refer_has_depends): False when type == FldIndex.
* lang-specs.h: Add fdefaultbyte, fstatic-call, ftrunc.
* lang.opt: Add ftrunc.
* lexio.cc (cdftext::open_input): Improved error message.
* parse.y: CDF support, POSIX support.
* parse_ante.h (cbl_division_t): Different enum.
(mode_syntax_only): New implementation of syntax_only.
(parse_error_inc): Likewise.
(resume_parsing): Likewise.
(successful_parse): Likewise.
(name_of): Formal parameter is now const.
(nice_name_of): Likewise.
(ast_op): Change formal parameters.
(prototype_ok): COBOL function prototypes.
(struct prototype_type_t): Likewise.
(is_allowed_name): Likewise.
(prototype_add): Likewise.
(prototype_args): Likewise.
(verify_args): Likewise.
(valid_pointer_relop): New function.
(field_value_all): Eliminate.
(current_field): COBOL function prototypes.
(ast_enter_exit_section): Improved error messages.
(data_division_ready): Improved mode_syntax_only.
(file_section_fd_set): Change "return false" to "return 0".
(ast_end_program): Improved mode_syntax_only.
* scan_ante.h (symbol_function_token): Use symbol_function_any().
(symbol_exists): Change for() loop termination.
(typed_name): COBOL function prototypes.
* structs.cc: Support for buffered FILE READ.
* symbols.cc (symbol_field_location): Use field_locs[] map.
(symbol_table_extend): Likewise.
(is_prototypical): COBOL function prototypes.
(symbol_elem_cmp): Likewise.
(symbol_program): Likewise.
(struct symbol_elem_t): Likewise.
(symbol_function): Likewise.
(enum protoreq_t): Likewise.
(symbol_function_impl): Likewise.
(struct cbl_label_t): Likewise.
(symbol_function_any): Likewise.
(symbols_dump): Likewise.
(cbl_field_t::attr_str): Likewise.
(field_str): Likewise.
(symbols_update): Likewise.
(symbol_field_add): Likewise.
(symbol_field_same_as): Likewise.
(cbl_alphabet_t::reencode): Detect iconv() errors.
(symbol_program_add): COBOL function prototypes.
* symbols.h (enum dspc_t): Enum for Division, Section, Paragraph,
Clause.
(cbl_prototype_ok): COBOL function prototypes.
(valid_move): Handle strong typing.
(struct parameter_t): Improved function parameter handling.
(struct cbl_ffi_arg_t): Likewise.
(struct cbl_label_t): COBOL function prototypes.
(struct function_descr_t): Likewise.
(struct cbl_alphabet_t): Detect iconv() errors.
(struct cbl_file_t): Support for LINAGE and the like.
(prototype_args): COBOL function prototypes.
(is_prototypical): COBOL function prototypes.
(is_numeric): Refmods are not numeric.
(struct symbol_elem_t): Additional declarations.
* symfind.cc (update_symbol_map2): Use symbols map.
* token_names.h: New comment.
* util.cc (cbl_prototype_ok): COBOL function prototypes.
(cdf_literalize): New formal parameters.
(effective_type): New function.
(valid_move): Handle strong typing.
(cobol_trunc_binary): Handle new ftrunc option.
(parse_error_reset): Forward declaration.
(parse_file): Formatting.
* util.h (cobol_trunc_binary): New declaration.
libgcobol/ChangeLog:
* Makefile.am: Add AM_COBC and AM_COBFLAGS; update
toolexeclib_LTLIBRARIES with libgcobol_posix.la and
libgcobol_compat_gnu.la.
* Makefile.in: POSIX compatibility support.
* aclocal.m4: Regenerate.
* charmaps.cc (__gg__iconverter): Restore map of encoding pairs.
(__gg__get_charmap): Change how encodings are mapped.
* charmaps.h (CHARMAPS_H): Add #include <map>.
(DEFAULT_32_ENCODING): Wrap in __FreeBSD__ conditional.
(error_msg_direct): Wrap in IN_TARGET_LIBS.
(class cbl_iconv_t): Wrapper for iconv() calls.
(class charmap_t): Explicit constructor.
* compat/README.md: POSIX compatibility layer.
* compat/gnu/lib/CBL_ALLOC_MEM.cbl: Likewise.
* compat/gnu/lib/CBL_CHECK_FILE_EXIST.cbl: Likewise.
* compat/gnu/lib/CBL_DELETE_FILE.cbl: Likewise.
* compat/gnu/lib/CBL_FREE_MEM.cbl: Likewise.
* compat/gnu/udf/stored-char-length.cbl: Likewise.
* compat/t/Makefile: Likewise.
* compat/t/smoke.cbl: Likewise.
* configure: Regenerate.
* configure.ac: New macros.
* configure.tgt: Likewise.
* ec.h (enum ec_type_t): New implementor-defined ec_imp_iconv_open_e
exception.
* encodings.h (_ENCODINGS_H_): #include <type_traits> for mapping
the cbl_encoding_t values.
(struct cbl_encoding_t_hash): Likewise.
* exceptl.h (ec_type_of): Remove "extern" from declaration.
* gcobolio.h (FILE_BUFFER_SIZE): READ FILE buffer size.
* gfileio.cc (sequential_file_write): Honor non-ascii encodings.
(line_sequential_file_read): Buffered FILE READ.
(line_sequential_file_read_sbc): Buffered FILE READ.
* intrinsic.cc (string_to_dest): Eliminate function.
(get_all_time): Replace __gg__convert_encoding() with
__gg__iconverter().
(__gg__when_compiled): Likewise.
* io.cc (__compat_file_status_word): POSIX compatibility layer.
* io.h (enum file_high_t): Likewise.
(enum file_status_t): Likewise.
* libgcobol.cc (init_var_both): Eliminate call to
initialize_program_state().
(__gg__move): Eliminate call to __gg__convert_encoding_length;
handle REFER_T_ADDRESS_OF.
(display_both): Handle REFER_T_ADDRESS_OF.
(__gg__display_clean): Likewise.
(__gg__convert_encoding): Eliminate function.
(__gg__convert_encoding_length): Likewise.
(default_exception_handler): Improve exception handling.
(ec_type_descr): Likewise.
(ec_type_disposition): Likewise.
(ec_is_fatal): Likewise.
(__gg__check_fatal_exception): Likewise.
(__gg__set_env_value): Remove call to __gg__convert_encoding.
* libgcobol.h (__gg__convert_encoding): Eliminate.
(__gg__convert_encoding_length): Eliminate.
* posix/bin/udf-gen: POSIX compatibility.
* posix/cpy/posix-errno.cbl: Likewise.
* posix/cpy/psx-lseek.cpy: Likewise.
* posix/cpy/psx-open.cpy: Likewise.
* posix/cpy/statbuf.cpy: Likewise.
* posix/cpy/tm.cpy: Likewise.
* posix/shim/lseek.cc (offsetof): Likewise.
(posix_lseek): Likewise.
* posix/shim/open.cc (posix_open): Likewise.
* posix/t/errno.cbl: Likewise.
* posix/t/exit.cbl: Likewise.
* posix/t/localtime.cbl: Likewise.
* posix/t/stat.cbl: Likewise.
* posix/udf/posix-exit.cbl: Likewise.
* posix/udf/posix-ftruncate.cbl: Likewise.
* posix/udf/posix-localtime.cbl: Likewise.
* posix/udf/posix-lseek.cbl: Likewise.
* posix/udf/posix-mkdir.cbl: Likewise.
* posix/udf/posix-open.cbl: Likewise.
* posix/udf/posix-read.cbl: Likewise.
* posix/udf/posix-stat.cbl: Likewise.
* posix/udf/posix-unlink.cbl: Likewise.
* posix/udf/posix-write.cbl: Likewise.
* valconv.cc: New exceptions.
* compat/gnu/cpy/cblproto.cpy: New file.
* compat/gnu/cpy/cbltypes.cpy: New file.
* compat/gnu/cpy/stored-char-length.cpy: New file.
* compat/gnu/lib/CBL_CLOSE_FILE.cbl: New file.
* compat/gnu/lib/CBL_CREATE_FILE.cbl: New file.
* compat/gnu/lib/CBL_OPEN_FILE.cbl: New file.
* compat/gnu/lib/CBL_READ_FILE.cbl: New file.
* compat/gnu/lib/CBL_WRITE_FILE.cbl: New file.
* compat/gnu/lib/cbl_alloc_mem.3: New file.
* compat/gnu/lib/cbl_alloc_mem.cbl3: New file.
* compat/gnu/lib/cbl_check_file_exist.3: New file.
* compat/gnu/lib/cbl_close_file.3: New file.
* compat/gnu/lib/cbl_create_file.3: New file.
* compat/gnu/lib/cbl_delete_file.3: New file.
* compat/gnu/lib/cbl_free_mem.3: New file.
* compat/gnu/lib/cbl_open_file.3: New file.
* compat/gnu/lib/cbl_read_file.3: New file.
* compat/gnu/lib/cbl_write_file.3: New file.
* compat/gnu/udf/cobrt-file-status.cbl: New file.
* posix/cpy/posix-close.cpy: New file.
* posix/cpy/posix-errno.cpy: New file.
* posix/cpy/posix-exit.cpy: New file.
* posix/cpy/posix-fstat.cpy: New file.
* posix/cpy/posix-ftruncate.cpy: New file.
* posix/cpy/posix-localtime.cpy: New file.
* posix/cpy/posix-lseek.cpy: New file.
* posix/cpy/posix-mkdir.cpy: New file.
* posix/cpy/posix-open.cpy: New file.
* posix/cpy/posix-read.cpy: New file.
* posix/cpy/posix-stat.cpy: New file.
* posix/cpy/posix-unlink.cpy: New file.
* posix/cpy/posix-write.cpy: New file.
* posix/shim/fstat.cc: New file.
* posix/udf/posix-close.cbl: New file.
* posix/udf/posix-errno.cbl: New file.
* posix/udf/posix-fstat.cbl: New file.
gcc/testsuite/ChangeLog:
* cobol.dg/group2/37-digit_Initialization_of_fundamental_types.cob:
Updated compiler error message.
* cobol.dg/group2/BINARY_and_COMP-5.cob:
Likewise.
* cobol.dg/group2/Check_for_equality_of_COMP-1___COMP-2.cob:
Likewise.
* cobol.dg/group2/Multi-target_MOVE_with_subscript_re-evaluation.cob:
Likewise.
* cobol.dg/group2/Named_conditionals_-_fixed__float__and_alphabetic.cob:
Likewise.
* cobol.dg/group2/Simple_p-scaling.cob:
Likewise.
* cobol.dg/group2/access_to_OPTIONAL_LINKAGE_item_not_passed.cob:
Likewise.
* cobol.dg/group2/compare_national_to_display.cob:
Likewise.
* cobol.dg/group2/comprensive_compare_comp-1_comp-5.cob:
Likewise.
* cobol.dg/group2/CBL_ALLOC_MEM___CBL_FREE_MEM.cob: New test.
* cobol.dg/group2/CBL_ALLOC_MEM___CBL_FREE_MEM.out: New test.
* cobol.dg/group2/CBL_CHECK_FILE_EXIST.cob: New test.
* cobol.dg/group2/CBL_CHECK_FILE_EXIST.out: New test.
* cobol.dg/group2/CBL_CREATE_FILE___CBL_WRITE_FILE___CBL_CLOSE_FILE.cob: New test.
* cobol.dg/group2/CBL_DELETE_FILE.cob: New test.
* cobol.dg/group2/CBL_DELETE_FILE.out: New test.
* cobol.dg/group2/CBL_OPEN_FILE___CBL_CLOSE_FILE.cob: New test.
* cobol.dg/group2/CBL_OPEN_FILE___CBL_CLOSE_FILE.out: New test.
* cobol.dg/group2/CBL_OPEN_FILE___CBL_READ_FILE___CBL_CLOSE_FILE.cob: New test.
* cobol.dg/group2/CBL_OPEN_FILE___CBL_READ_FILE___CBL_CLOSE_FILE.out: New test.
* cobol.dg/group2/CBL_READ_FILE__check_file_size_with_flags___128.cob: New test.
* cobol.dg/group2/Complex_HEX__VALUE_and_MOVE_-_UTF-16.cob: New test.
* cobol.dg/group2/Complex_HEX__VALUE_and_MOVE_-_UTF-16.out: New test.
* cobol.dg/group2/MOVE_LEVEL_78.cob: New test.
* cobol.dg/group2/MOVE_LEVEL_78.out: New test.
* cobol.dg/group2/add_-1_to_negative_pic_S9999.cob: New test.
* cobol.dg/group2/add_-1_to_negative_pic_S9999.out: New test.
* cobol.dg/group2/add_-1_to_pic_9999.cob: New test.
* cobol.dg/group2/add_-1_to_pic_9999.out: New test.
* cobol.dg/group2/add_-1_to_positive_pic_S9999.cob: New test.
* cobol.dg/group2/add_-1_to_positive_pic_S9999.out: New test.
* cobol.dg/group2/add_1_to_pic_9999.cob: New test.
* cobol.dg/group2/add_1_to_pic_9999.out: New test.
* cobol.dg/group2/add_1_to_positive_pic_S9999.cob: New test.
* cobol.dg/group2/add_1_to_positive_pic_S9999.out: New test.
* cobol.dg/group2/add__1_to_negative_pic_S9999.cob: New test.
* cobol.dg/group2/add__1_to_negative_pic_S9999.out: New test.
* cobol.dg/group2/ambiguous_PERFORM.cob: New test.
* cobol.dg/group2/ambiguous_PERFORM.out: New test.
* cobol.dg/group2/cbltypes.cpy: New test.
* cobol.dg/group2/compare_float_to_other_types.cob: New test.
* cobol.dg/group2/compare_float_to_other_types.out: New test.
* cobol.dg/group2/move_numeric_to_alphanumeric.cob: New test.
* cobol.dg/group2/move_numeric_to_alphanumeric.out: New test.
Tobias Burnus [Fri, 29 May 2026 14:51:18 +0000 (16:51 +0200)]
OpenMP: Reject omp_{cgroup,pteam,thread}_mem_alloc for static vars in ALLOCATE directive [PR122892]
Using omp_{cgroup,pteam,thread}_mem_alloc for static variables was not
very useful as currently worded in the spec; hence, OpenMP 6.1 will
disallow it also for local static variables. OpenMP 6.0 already
disallowed it for other static variables. Cf. OpenMP specification
issue #4665.
For Fortran, the check is modified, while for C the check was completely
missing. Both have been rectified by this commit. For C++, the allocate
directive still has to be added.
PR c/122892
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_allocate): Reject
omp_{cgroup,pteam,thread}_mem_alloc for static variables.
gcc/fortran/ChangeLog:
* openmp.cc (gfc_resolve_omp_allocate): Reject
omp_{cgroup,pteam,thread}_mem_alloc also for local static
variables.
gcc/ChangeLog:
* gimplify.cc (gimplify_scan_omp_clauses): Update for removed
plural -S in GOMP_OMP_PREDEF_ALLOC_THREAD.
include/ChangeLog:
* gomp-constants.h (GOMP_OMP_PREDEF_ALLOC_THREADS): Rename to ...
(GOMP_OMP_PREDEF_ALLOC_THREAD): ... this.
(GOMP_OMP_PREDEF_ALLOC_CGROUP, GOMP_OMP_PREDEF_ALLOC_PTEAM): Define
with the value of omp_{cgroup,pteam}_mem_alloc.
libgomp/ChangeLog:
* allocator.c (_Static_assert): Add asserts for the values of
GOMP_OMP_PREDEF_ALLOC_CGROUP and GOMP_OMP_PREDEF_ALLOC_PTEAM.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/allocate-static-3.f90: Modify to also
disallow local static variables.
* c-c++-common/gomp/allocate-20.c: New test.
Rainer Orth [Fri, 29 May 2026 14:24:40 +0000 (16:24 +0200)]
libgcc: Support -mcall-ms2sysv-xlogues on FreeBSD/x86
With the bulk of the gcc.target/x86_64/abi tests fixed on FreeBSD/amd64,
a couple remain:
FAIL: gcc.target/x86_64/abi/ms-sysv/ms-sysv.c -mcall-ms2sysv-xlogues -O2 "-DGEN_ARGS=-p0\ --omit-rbp-clobbers" (test for excess errors)
and five more. They all fail to link like
gld-2.46: /tmp//ccXprYoX.o: in function `msabi_00_0':
ms-sysv.c:(.text+0x30): undefined reference to `__sse_savms64f_12'
and many more missing symbols. Those are usually provided in libgcc.a
by i386/t-msabi, so this patch includes them on FreeBSD/x86, too. As
with the previous fixes, the resms64*.h and savms64*.h files need to
include .note.GNU-stack, too.
Bootstrapped without regressions on amd64-pc-freebsd15.0 with both gld
and /usr/bin/ld (lld), and x86_64-pc-linux-gnu.
Artemiy Volkov [Thu, 12 Mar 2026 12:21:06 +0000 (12:21 +0000)]
aarch64: add __ARM_FEATURE_ macros for SVE2.2 and SME2.2
This patch defines __ARM_FEATURE_ macros for the SVE2.2 and SME2.2
extensions, together with necessary new tests. In the v1 of the series
(https://gcc.gnu.org/pipermail/gcc-patches/2026-February/707393.html),
this was a part of the first patch, but now it's been moved to the tail
end of the series so that these definitions aren't visible before the
contents of the extensions are actually available.
gcc/ChangeLog:
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins):
Emit definitions for __ARM_FEATURE_{SVE,SME}2p2.
The first operand is a multi-vector consisting of two or four vectors, and
the second operand either has the same type, or is a single vector of the
underlying type. New intrinsics are documented in the ACLE manual [0] and
are as follows:
This patch implements the above changes throughout the SVE builtin
description files and aarch64-sve2.md.
[0] https://github.com/ARM-software/acle
gcc/ChangeLog:
* config/aarch64/aarch64-sve-builtins-sve2.def (svmul): Define new
SVE function variant.
* config/aarch64/aarch64-sve2.md (@aarch64_sve_<optab><mode>): New
instruction pattern.
(@aarch64_sve_<optab><mode>_single): Likewise.
* config/aarch64/aarch64.h (TARGET_STREAMING_SME2p2): New macro.
Artemiy Volkov [Sat, 10 Jan 2026 15:16:59 +0000 (15:16 +0000)]
aarch64: implement changes for COMPACT and EXPAND SVE instructions
SVE2.2 and SME2.2 extensions introduce the following changes related to
COMPACT/EXPAND instructions:
- COMPACT (Copy Active vector elements to lower-numbered elements) for 8-
and 16-bit-wide vector elements: these variants of an existing instruction
are new in SVE2.2 (or in streaming mode, SME2.2)
- COMPACT (Copy Active vector elements to lower-numbered elements) for 32-
and 64-bit-wide vector elements: previously only legal in non-streaming
mode, these variants are now allowed in streaming mode under SME2.2
- EXPAND (Copy lower-numbered vector elements to Active elements): this
instruction is new in SVE2.2 (or in streaming mode, SME2.2)
The new supporting intrinsics are documented in the ACLE manual [0] and
are as follows:
This patch implements the above changes throughout the SVE builtin
description files and aarch64-sve{,2}.md.
New ASM tests have been added as usual; also, an adjustment has been made
to aarch64-ssve.exp in g++.target/ to reflect the fact that the svcompact
intrinsic is not nonstreaming-only anymore.
[0] https://github.com/ARM-software/acle
gcc/ChangeLog:
* config/aarch64/aarch64-sve-builtins-base.cc (class svexpand_impl):
Define new SVE function base.
* config/aarch64/aarch64-sve-builtins-base.def (svcompact): Allow
execution in streaming mode when SME2p2 is enabled.
* config/aarch64/aarch64-sve-builtins-base.h (svexpand): Declare
new SVE function base.
* config/aarch64/aarch64-sve-builtins-sve2.def (svcompact): Define
new SVE function.
(svexpand): Likewise.
* config/aarch64/aarch64-sve.md (@aarch64_sve_compact<mode>):
Enable 32- and 64-bit element variants under SME2p2. New
insn pattern for 8- and 16-bit elements.
(@aarch64_sve_expand<mode>): New insn pattern.
* config/aarch64/aarch64.h (TARGET_SVE_OR_SME2p2): New macro.
* config/aarch64/aarch64.md (UNSPEC_SVE_EXPAND): New UNSPEC.
The intrinsics are implemented in the usual way; the new
svfirst_lastp_impl base class is used for both families. The ->fold ()
method implements constant folding except for LASTP under
-msve-vector-bits=scalable. On the .md side, the patterns for both new
instructions are implemented using UNSPECs as they can't be expressed in
terms of standard RTL.
Included are standard asm tests (which are heavily based on cntp_* tests
from the sve directory), as well as some general C tests
demonstrating aforementioned optimizations when PG and/or PN are constant
vectors.
[0] https://github.com/ARM-software/acle
gcc/ChangeLog:
* config/aarch64/aarch64-sve-builtins-sve2.cc
(class svfirst_lastp_impl): Define new SVE function base class.
(svfirstp): Define new SVE function base.
(svlastp): Likewise.
* config/aarch64/aarch64-sve-builtins-sve2.def (svfirstp): Define
new SVE function.
(svlastp): Likewise.
* config/aarch64/aarch64-sve-builtins-sve2.h (svfirstp): Declare
new SVE function base.
* config/aarch64/aarch64-sve2.md (@aarch64_pred_firstp<mode>): New
insn pattern.
(@aarch64_pred_lastp<mode>): Likewise.
* config/aarch64/iterators.md (UNSPEC_FIRSTP): New UNSPEC.
(UNSPEC_LASTP): Likewise.
The implementation of new intrinsics and RTL patterns is quite
straightforward, and a standard set of ASM tests has been added to the
sve2/acle/asm directory.
[0] https://github.com/ARM-software/acle
Changes since v1:
- Append extension names to comments in aarch64-sve2.md.
gcc/ChangeLog:
* config/aarch64/aarch64-sve-builtins-sve2.cc (svrint32x): Define
new function base.
(svrint32z): Likewise.
(svrint64x): Likewise.
(svrint64z): Likewise.
* config/aarch64/aarch64-sve-builtins-sve2.def (svrint32x):
Define new SVE function.
(svrint32z): Likewise.
(svrint64x): Likewise.
(svrint64z): Likewise.
* config/aarch64/aarch64-sve-builtins-sve2.h (svrint32x): Declare
new function base.
(svrint32z): Likewise.
(svrint64x): Likewise.
(svrint64z): Likewise.
* config/aarch64/aarch64-sve-builtins.cc (TYPES_sd_float): New
type set.
(sd_float): New SVE type array.
* config/aarch64/aarch64-sve2.md (@cond_<frintnzs_op><mode>): New
insn pattern.
This patch adds an alternative that emits a single zeroing-predication
form of the instructions mentioned above (as long as the sve2p2_or_sme2p2
condition holds) to corresponding RTL patterns. For narrowing conversions
([B]FCVTNT and FCVTXNT), since an additional merge operand controlling the
values of inactive lanes is required, the intrinsics have been changed to
use the new narrowing_top_convert SVE function base class; this new class
injects a const_vector selector operand at expand time. Depending on the
value of this operand, either the destination vector or a constant zero
vector is used to supply values for inactive lanes.
The new tests all have "_z" in their names since they only cover the
zeroing-predication versions of their respective intrinsics.
[0] https://github.com/ARM-software/acle
gcc/ChangeLog:
* config/aarch64/aarch64-sve-builtins-base.cc (class svcvtnt_impl):
Remove.
(svcvtnt): Redefine using narrowing_top_convert.
* config/aarch64/aarch64-sve-builtins-functions.h
(class narrowing_top_convert): New SVE function base class.
(NARROWING_TOP_CONVERT0): New function-like macro for specializing
narrowing_top_convert.
(NARROWING_TOP_CONVERT1): Likewise.
* config/aarch64/aarch64-sve-builtins-sve2.cc (class svcvtxnt_impl):
Remove.
(svcvtxnt): Redefine using narrowing_top_convert.
* config/aarch64/aarch64-sve-builtins-sve2.def (svcvtlt): Allow
zeroing predication.
(svcvtnt): Likewise.
(svcvtxnt): Likewise.
* config/aarch64/aarch64-sve.md (@aarch64_sve_cvtnt<mode>):
Convert to compact syntax. Add operand 4 for values of
inactive lanes. New alternative for zeroing predication.
* config/aarch64/aarch64-sve2.md
(*cond_<sve_fp_op><mode>_relaxed): Convert to compact syntax.
New alternative for zeroing predication.
(*cond_<sve_fp_op><mode>_strict): Likewise.
(@aarch64_sve_cvtnt<mode>): Convert to compact syntax. Add
operand 4 for values of inactive lanes. New alternative for
zeroing predication.
Artemiy Volkov [Fri, 9 Jan 2026 19:30:52 +0000 (19:30 +0000)]
aarch64: add zeroing forms for predicated SVE int-/FP-to-FP conversions
SVE2.2 (or in streaming mode, SME2.2) adds support for zeroing
predication for the following SVE FP conversion instructions:
SVE1:
- SCVTF (Signed integer convert to floating-point (predicated))
- UCVTF (Unsigned integer convert to floating-point (predicated))
- FCVT (Floating-point convert (predicated))
- BFCVT (Single-precision convert to BFloat16 (predicated))
SVE2:
- FCVTX (Double-precision convert to single-precision, rounding to
odd (predicated))
The SVE1 instructions are spread over several patterns for various
combinations of source/destination widths and FP semantics, and the FCVTX
instruction is serviced by two patterns in the aarch64-sve2.md file via
the SVE2_COND_FP_UNARY_NARROWB iterator (one for strict, the other for
relaxed FP semantics). The patch adds an alternative that emits a single
zeroing-predication version of an instruction whenever the merge operand
is a constant zero vector and the sve2p2_or_sme2p2 condition holds.
As with the original cvt_b?f* tests in the sve/acle/asm directory,
testcases for conversions from both integral and floating-point types
coexist in the same files and are grouped only by the destination type.
FCVTX tests are added in a separate file.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md
(*cond_<optab>_nonextend<SVE_FULL_HSDI:mode><SVE_FULL_F:mode>_relaxed):
New alternative for zeroing predication. Add `arch` attribute
to every alternative.
(*cond_<optab>_nonextend<SVE_HSDI:mode><SVE_PARTIAL_F:mode>_relaxed):
Likewise.
(*cond_<optab>_nonextend<SVE_FULL_HSDI:mode><SVE_FULL_F:mode>_strict):
Likewise.
(*cond_<optab>_extend<VNx4SI_ONLY:mode><VNx2DF_ONLY:mode>):
Likewise.
(*cond_<optab>_trunc<SVE_FULL_SDF:mode><SVE_FULL_HSF:mode>):
Likewise.
(*cond_<optab>_trunc<SVE_SDF:mode><SVE_PARTIAL_HSF:mode>):
Likewise.
(*cond_<optab>_trunc<VNx4SF_ONLY:mode><VNx8BF_ONLY:mode>):
Likewise.
(*cond_<optab>_nontrunc<SVE_FULL_HSF:mode><SVE_FULL_SDF:mode>):
Likewise.
(*cond_<optab>_nontrunc<SVE_PARTIAL_HSF:mode><SVE_SDF:mode>_relaxed):
Likewise.
* config/aarch64/aarch64-sve2.md
(*cond_<sve_fp_op><mode>_any_relaxed): Likewise.
(*cond_<sve_fp_op><mode>_any_strict): Likewise.
Artemiy Volkov [Wed, 14 Jan 2026 13:32:58 +0000 (13:32 +0000)]
aarch64: add zeroing forms for predicated SVE FP-to-integer conversions
SVE2.2 (or in streaming mode, SME2.2) adds support for zeroing predication
for all variants of the following FP-to-integer conversion instructions:
- FCVTZU (Floating-point convert to unsigned integer, rounding toward zero
(predicated))
- FCVTZS (Floating-point convert to signed integer, rounding toward zero
(predicated))
To implement this change, this patch adds a new alternative to patterns
involving the SVE_COND_FCVTI iterator and accepting an independent value
as the merge operand. The new alternative has the new zeroing-predication
forms as the output string and is only enabled when sve2p2_or_sme2p2 is
true in the target architecture.
The new ASM tests only cover the "_z" versions of the intrinsics and as
such all have the "_z" suffix in their name, and are grouped by type of
the destination operand.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md
(*cond_<optab>_nontrunc<SVE_FULL_F:mode><SVE_FULL_HSDI:mode>_relaxed):
New alternative for zeroing predication. Add `arch` attribute
to every alternative.
(*cond_<optab>_nontrunc<SVE_PARTIAL_F:mode><SVE_HSDI:mode>_relaxed):
Likewise.
(*cond_<optab>_nontrunc<SVE_FULL_F:mode><SVE_FULL_HSDI:mode>_strict):
Likewise.
(*cond_<optab>_trunc<VNx2DF_ONLY:mode><VNx4SI_ONLY:mode>):
Likewise.
(*cond_<optab>_trunc<VNx2DF_ONLY:mode><VNx2SI_ONLY:mode>_relaxed):
Likewise.
Artemiy Volkov [Fri, 9 Jan 2026 19:05:28 +0000 (19:05 +0000)]
aarch64: add zeroing forms for predicated SVE FP unary operations
SVE2.2 (or in streaming mode, SME2.2) adds support for zeroing predication
for the following floating-point unary instructions:
SVE:
- FABS (Floating-point absolute value (predicated))
- FNEG (Floating-point negate (predicated))
- FRECPX (Floating-point reciprocal exponent (predicated))
- FRINT<r> (Floating-point round to integral value (predicated))
- FSQRT (Floating-point square root (predicated))
SVE2:
- FLOGB (Floating-point base 2 logarithm as integer (predicated))
These instructions are covered by SVE_COND_FP_UNARY for SVE and
SVE2_COND_INT_UNARY_FP for SVE2, thus this change is limited to two
patterns in each of aarch64-sve.md and aarch64-sve2.md (one for relaxed,
and one for strict FP semantics). The change is to add a new alternative
with Dz as operand 3 (the merge operand), enabled only if the
sve2p2_or_sme2p2 condition holds and emitting a single instruction with
zeroing predication.
The tests that have been added are based on the original SVE tests
for corresponding instructions, but all have a "_z" suffix in their name
since they only test codegen for the "_z" variants of the corresponding
intrinsics.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md (*cond_<optab><mode>_any_relaxed):
New alternative for zeroing predication. Add `arch` attribute
to every alternative.
(*cond_<optab><mode>_any_strict): Likewise.
* config/aarch64/aarch64-sve2.md (*cond_<sve_fp_op><mode>):
Likewise.
(*cond_<sve_fp_op><mode>_strict): Likewise.
Artemiy Volkov [Fri, 9 Jan 2026 17:50:22 +0000 (17:50 +0000)]
aarch64: add zeroing forms for predicated SVE bit reversal operations
SVE2.2 (or in streaming mode, SME2.2) adds support for zeroing
predication for the following SVE bit reversal instructions:
- REVB, REVH, REVW (Reverse bytes / halfwords / words within elements
(predicated))
- REVD (Reverse 64-bit doublewords in elements (predicated)) (SVE2 only)
The first three are covered by the SVE_INT_UNARY code iterator, and REVD,
being SVE2-only, has a standalone pattern in aarch64-sve2.md. This patch
adds an alternative for the zeroing-predication forms of the original
instructions. The pattern for REVD also required changes to the predicate
for operand 3 to accept constant zero RTX whenever SVE2.2 is enabled.
Additionally, use the /z form of the REVD instruction for PRED_X
predication to save a data dependency.
The tests that have been added are based on the original SVE/SVE2 tests
for corresponding instructions, but all have a "_z" suffix in their name
since they only test codegen for the "_z" variants of the corresponding
intrinsics.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md (@cond_<optab><mode>):
New alternative for zeroing predication. Add `arch` attribute
to every alternative.
* config/aarch64/aarch64-sve2.md (@aarch64_pred_<optab><mode>):
Use zeroing predication variant for PRED_X.
(@cond_<optab><mode>): Accept constant zero as operand 3. New
alternative for zeroing predication. Add `arch` attribute to
every alternative.
* config/aarch64/predicates.md (aarch64_simd_reg_or_direct_zero):
New predicate.
The functional change is limited to two patterns in aarch64-sve.md
handling SVE extends merging with an independent value, to which this
patch adds a new alternative that emits a single zeroing-predication form
of an instruction as long as the sve2p2_or_sme2p2 condition holds.
The tests that have been added are based on the original SVE tests
for corresponding instructions, but all have a "_z" suffix in their name
since they only test codegen for the "_z" variants of the corresponding
intrinsics.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md
(@aarch64_cond_sxt<SVE_FULL_HSDI:mode><SVE_PARTIAL_I:mode>):
New alternative for zeroing predication. Add `arch` attribute
to every alternative.
(*cond_uxt<mode>_any): Likewise.
These instructions are covered by the SVE_INT_UNARY and SVE2_U32_UNARY
iterators, except for CNOT, which has a standalone pattern. Therefore,
three patterns across aarch64-sve.md and aarch64-sve2.md had to be
provided with a new alternative, having Dz (const_vector of all zeroes) as
the merge operand. The new alternatives are conditional upon the
sve2p2_or_sme2p2 test added earlier, and emit the new zeroing-predication
forms of the original instructions.
The tests that have been added are based on the original SVE/SVE2 tests
for corresponding instructions, but all have a "_z" suffix in their name
since they only test codegen for the "_z" variants of the corresponding
intrinsics.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md (*cond_<optab><mode>_any):
New alternative for zeroing predication. Add `arch` attribute
to every alternative.
(*cond_cnot<mode>_any): Likewise.
* config/aarch64/aarch64-sve2.md (*cond_<sve_int_op><mode>):
Likewise.
Artemiy Volkov [Thu, 12 Mar 2026 12:19:29 +0000 (12:19 +0000)]
aarch64: add preliminary definitions for SVE2.2/SME2.2
This is a preparatory patch for the bulk of the SVE2.2/SME2.2 support
series, putting into place some machinery used by the later patches. This
includes TARGET_* constants that are set based on ISA flags, and new
match_test definitions that are used to enable/disable individual
instruction patterns/alternatives.
On the testsuite side of things, this patch adds two new effective-target
checks in lib/target-supports.exp, one for each of SVE2.2-capable HW and
toolchain.
v1 of this patch also contained __ARM_FEATURE_* macro definitions for
SVE2.2 and SME2.2, but these have been moved to the end of the series to
improve bisection.
gcc/ChangeLog:
* config/aarch64/aarch64.h (TARGET_SVE2p2): New macro.
(TARGET_SME2p2): Likewise.
(TARGET_SVE2p2_OR_SME2p2): Likewise.
* config/aarch64/aarch64.md (arches): Add sve2p2_or_sme2p2 enum
constant.
(arch): Add test for sve2p2_or_sme2p2.
* doc/invoke.texi: Document sve2p2 and sme2p2 extensions.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp
(check_effective_target_aarch64_sve2p2_hw): New target check.
(check_effective_target_aarch64_sve2p2_ok): New target check.
(exts_sve2): Add sme2p2.
LIU Hao [Fri, 29 May 2026 10:17:53 +0000 (12:17 +0200)]
i386: Rewrite index*1+disp into base+disp
Sometimes, GCC may synthesize an address like [index * 1 + displacement]. This
commit rewrites that into [base + displacement], to eliminate the requirement
of an SIB byte (which is always the case, as RSP isn't a valid index), and to
allow a small displacement to be encoded in one byte.
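The size effect can be illustrated with a small model of the x86-64 ModRM/SIB
encoding rules. This is only a sketch, not code from i386.cc: struct addr,
rewrite_index_as_base and addr_bytes are hypothetical names, registers are
plain numbers 0..15, and the count covers only the ModRM, SIB and displacement
bytes of a register-based address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a decomposed address.  Register numbers are 0..15
   and -1 means "absent".  This is not GCC's own address structure.  */
struct addr {
    int base;
    int index;
    int scale;
    int32_t disp;
};

/* The special case described above: no base, an index, and scale 1.  */
struct addr rewrite_index_as_base(struct addr a)
{
    if (a.base < 0 && a.index >= 0 && a.scale == 1) {
        a.base = a.index;
        a.index = -1;
    }
    return a;
}

/* Count the ModRM + SIB + displacement bytes for the address.  */
int addr_bytes(struct addr a)
{
    assert(a.base >= 0 || a.index >= 0);
    int bytes = 1;                       /* ModRM is always present */

    /* An SIB byte is needed for any index, and for a base whose low three
       bits are 100 (RSP/R12), since that r/m value selects SIB.  */
    if (a.index >= 0 || (a.base >= 0 && (a.base & 7) == 4))
        bytes += 1;

    if (a.base < 0)
        bytes += 4;                      /* index without base forces disp32 */
    else if (a.disp == 0 && (a.base & 7) != 5)
        bytes += 0;                      /* RBP/R13 cannot omit the displacement */
    else if (a.disp >= -128 && a.disp <= 127)
        bytes += 1;                      /* disp8 */
    else
        bytes += 4;                      /* disp32 */
    return bytes;
}
```

Under this model, an address with index register 0, scale 1 and displacement 8
takes 6 addressing bytes before the rewrite and 2 after it.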
gcc/ChangeLog:
* config/i386/i386.cc (ix86_decompose_address): Add a special case where
there's no base, there's an index, and the scale is 1.
gcc/testsuite/ChangeLog:
* gcc.target/i386/rewrite-sib-without-base.c: New test.
Marc Poulhiès [Fri, 27 Mar 2026 15:29:16 +0000 (16:29 +0100)]
ada: Fix bug when reading multibyte utf-8 character
A multibyte utf-8 character has its msb set, which is the sign bit for a
signed value.
The get_immediate C function, for Linux (and others), uses read() when
the character is read from a terminal. It was using a "char" type, which
can be either signed or unsigned (target dependent). On targets where char
is signed, reading a multibyte utf-8 character will
produce a negative value. For example:
€ = 0xE2 0x82 0xAC
The first byte is 0xE2, which is -30 for a signed char.
Then the value is written in a signed int, still as -30 (0xFFFF_FFE2),
and the caller fails a range check because 0xFFFF_FFE2 is not in the
unsigned range for a Character (0..255).
Fixing the variable to an unsigned char avoids the conversion to a
signed value.
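The arithmetic above can be reproduced with a short C sketch. It is not the
code in sysdep.c: the function names are hypothetical, "signed char" stands in
for plain char on a target where char is signed, and the result for values
above 127 relies on the conversion wrapping modulo 256, which is
implementation-defined in C and is what GCC does.

```c
#include <assert.h>

/* Old behaviour on a signed-char target: the byte is sign-extended.
   Conversion of values above 127 is implementation-defined; GCC wraps
   modulo 256, which is assumed here.  */
int widen_as_signed_char(int raw)
{
    assert(raw >= 0 && raw <= 255);      /* raw is one byte as read() stores it */
    signed char c = (signed char)raw;
    return c;
}

/* Fixed behaviour: the byte stays in 0..255 when widened to int.  */
int widen_as_unsigned_char(int raw)
{
    assert(raw >= 0 && raw <= 255);
    unsigned char c = (unsigned char)raw;
    return c;
}

/* The range the Ada caller expects for a Character.  */
int in_character_range(int value)
{
    return value >= 0 && value <= 255;
}
```

For the first byte of the euro sign, widen_as_signed_char (0xE2) gives -30 and
fails the range test, while widen_as_unsigned_char (0xE2) gives 226 and passes.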
gcc/ada/ChangeLog:
* sysdep.c (getc_immediate_common): Read character as unsigned
value.
Bob Duff [Thu, 26 Mar 2026 22:25:58 +0000 (18:25 -0400)]
ada: Cleanup of Analyze_Aspect_Specifications and related code
Rename Decorate to be Decorate_Aspect_Links; seems more readable.
Change it to support N_Attribute_Definition_Clause in addition
to N_Pragma. Move most calls to it into Insert_Aitem.
Move call to Set_Has_Delayed_Rep_Aspects to be near
calls to Set_Has_Delayed_Aspects.
Make Anod and Eloc variables more local to where they are used.
Misc comment improvements, including removing some useless ones.
gcc/ada/ChangeLog:
* sem_ch13.adb (Delay_Aspect): Remove the side effect.
(Decorate): Rename to be Decorate_Aspect_Links.
Generalize.
(Insert_Aitem): Call Decorate_Aspect_Links.
* aspects.ads: Minor comment improvement: we don't need to worry;
we just need to do it.
* einfo.ads: Minor comment improvement.
Gary Dismukes [Tue, 24 Mar 2026 00:40:04 +0000 (00:40 +0000)]
ada: Implement AI22-0154 (Revised resolution of indexing aspects)
Customer code was running into an error due to a violation of the
rule for indexing aspects that any functions declared in the same
package spec that do not satisfy the legality rules for eligible
indexing functions make the aspects illegal. In this case it was
due to a derived type inheriting a function of the parent type
that had indexing aspects. Consideration of this problem led
to proposing language changes in AI22-0154, which revises the
resolution rules to take the indexing profile requirements into
account (rather than allowing resolution of indexing aspect names
to consider any available function declared within the scope).
This set of changes implements the revised resolution rules,
allowing the compiler to accept the customer code.
In some cases the compiler will now issue a warning instead of
ignoring an ineligible candidate entity. Specifically this is
done when a candidate interpretation is a function that has at
least a first formal of the type associated with the aspect,
but doesn't satisfy other requirements of the particular
indexing aspect. We impose this limitation so as to avoid
issuing too many false-positive warnings.
These changes also reduce technical debt by removing code in
Sem_Util.Inherit_Nonoverridable_Aspect that was handling checking
and addition of new indexing functions for derived types via calls
to Check_Function_For_Indexing_Aspect. That handling is now covered
fully by Check_Indexing_Functions (which itself makes calls to
Check_Function_For_Indexing_Aspect).
Additionally, these changes attempt to implement rule changes
specified by AI22-0159/01 (Inheritance for aspects allowed to
denote multiple subprograms), an AI that was added to address
problems identified while finalizing AI22-0154.
gcc/ada/ChangeLog:
* sem_ch6.adb (New_Overloaded_Entity): Add missing call to
Check_For_Primitive_Subprogram (Is_Primitive must be set).
* sem_ch13.ads (Check_Function_For_Indexing_Aspect): Move declaration
to package body.
* sem_ch13.adb (Check_Indexing_Functions): Remove early return for
derived types. Pass appropriate values for the new Boolean parameters
on existing calls to Check_Function_For_Indexing_Aspect. Perform a
second interpretation loop, calling Check_Function_For_Indexing_Aspect
and passing Indexing_Found for the Has_Eligible_Func parameter and True
for the Error_On_Ineligible parameter, and remove the existing call
to Error_Msg_NE that was flagging nonlocal entities (a similar error
is now reported inside procedure Check_Function_For_Indexing_Aspect).
Suppress call to Check_Inherited_Indexing in derived type cases.
(Check_Nonoverridable_Aspect_Subprograms): Remove early return when
the aspect spec does not come from source, so aspects of derived types
will also go through this procedure. Check restrictions of AI22-0159/01
for derived types and inheritance of aspects. Replace iteration over
overloaded interpretations with iteration over Aspect_Subprograms (and
only do that for indexing aspects). Condition Sloc for existing error
check for nonprimitive operations based on whether the aspect comes
from source, posting the error on the entity rather than the aspect
if the aspect is not given explicitly.
(Analyze_Aspects_At_Freeze_Point): Split off a new case alternative
for iterator aspects, and specialize treatment for indexing aspects
by forcing a search for new indexing functions. When none are found,
issue an error only in the case where the type has no inherited
indexing functions. Test that the version is at least Ada_2012 rather
than Ada_2022 for calling Check_Nonoverridable_Aspect_Subprograms.
(Check_Function_For_Indexing_Aspect): Move declaration from the package
spec to the body. Add Has_Eligible_Func and Error_On_Ineligible formals
and update spec comment.
Return early if the candidate subprogram was already inherited (present
in Aspect_Subprograms).
For a scope mismatch on Subp, report error only when Has_Eligible_Func
is False and Error_On_Ineligible is True (and never a warning).
Add "<<" in several calls to Report_Ineligible_Indexing_Function
(formerly Illegal_Indexing) to allow either warnings or errors.
Return without adding subprogram to Aspect_Subprograms when
Error_On_Ineligible is False.
(Report_Ineligible_Indexing_Function): Name changed from
Illegal_Indexing.
Return early when only a warning can be issued and the ineligible
subprogram is inherited, or if its first formal (if any) does not match
the aspect's associated type (to reduce false-positive warnings).
Set Error_Msg_Warn based on Error_On_Ineligible formal.
Report a continuation message identifying the ineligible entity.
Remove comment preceding body that has been obviated by AI22-0154.
* sem_util.adb (Inherit_Nonoverridable_Aspect): Remove the loop over
primitives that was checking and adding eligible primitives. That code
was incomplete, and collection of new indexing functions for derived
types is now handled by Check_Indexing_Functions. Also remove the
associated "???" comment.
A recent fix for light runtime configuration was missing a crucial part:
it left an #include directive that needed to be removed. This patch
completes that fix.
Viljar Indus [Wed, 25 Mar 2026 13:18:52 +0000 (15:18 +0200)]
ada: Create a boolean version of Warnings_Suppressed
Add a Boolean overload of Warnings_Suppressed that wraps the existing
String_Id version, simplifying call sites that only need to know whether
warnings are suppressed at a location rather than the suppression reason.
gcc/ada/ChangeLog:
* erroutc.ads (Warnings_Suppressed): New Boolean overload.
* errout.adb (Error_Msg_Internal): Use Boolean Warnings_Suppressed.
* errutil.adb (Error_Msg): Likewise.
Viljar Indus [Wed, 25 Mar 2026 10:57:10 +0000 (12:57 +0200)]
ada: Improve error message insertion methods
Extract the error chain insertion logic into dedicated subprograms.
Insert_Error_Msg adds a new message into the chain and adds the next and
previous pointers, making the deferred Set_Prev_Pointers pass in Finalize
redundant. Find_Msg_Insertion_Point and Is_Before extract the existing
logic for finding the insertion point in Error_Msg_Internal.
gcc/ada/ChangeLog:
* errout.adb (Is_Before): New helper function.
(Find_Msg_Insertion_Point): New procedure.
(Error_Msg_Internal): Use Find_Msg_Insertion_Point and Insert_Error_Msg.
(Finalize): Remove call to Set_Prev_Pointers.
(Set_Prev_Pointers): Removed.
* erroutc.adb (Insert_Error_Msg): New procedure.
* erroutc.ads (Insert_Error_Msg): New declaration.
Andres Toom [Mon, 16 Mar 2026 09:55:09 +0000 (11:55 +0200)]
ada: Do not set Global_Discard_Names in GNATprove_Mode
GNATprove now supports the Image attribute of enumerated types. Hence,
it is important to keep the names of the enumeration literals to be able
to properly reason about them.
gcc/ada/ChangeLog:
* gnat1drv.adb (Adjust_Global_Switches): Do not set
Global_Discard_Names in GNATprove_Mode.
A recent change added a dependency from the environment-related
functions in argv.c to env.c. This broke some runtime configurations that
provide command line support but no environment variable support.
This fixes the issue by moving all of argv.c's environment-related code
to env.c. It also tweaks a comment in passing.
Bob Duff [Tue, 24 Mar 2026 22:43:16 +0000 (18:43 -0400)]
ada: Do not disable conformance warning in GNAT_Mode
The switch -gnatw_p enables a warning during conformance
checking that is stricter than the standard Ada conformance
rules. This patch removes the test for -gnatg mode when
issuing the warning, because that is redundant -- -gnatg
already turns off -gnatw_p.
We do not want this warning enabled in GNAT sources, but there is
no need to have -gnatg involved explicitly.
The same goes for In_Internal_Unit.
gcc/ada/ChangeLog:
* sem_ch6.adb (Subprogram_Subtypes_Have_Same_Declaration):
Remove tests for In_Internal_Unit and GNAT_Mode.
Marc Poulhiès [Thu, 12 Mar 2026 16:28:18 +0000 (17:28 +0100)]
ada: VAST Check_Entity_Chain
Add Check_Entity_Chain to VAST: checks that the Next_Entity/Prev_Entity
links are consistent for entity chains.
Currently only checked for entities that are used as Scope.
Fixing existing inconsistencies is not straightforward.
Any call to Copy_And_Swap creates an incorrect chain, where the new node
has its Prev/Next/First/Last links copied from the original node, but
back links are not changed, leading to something like this for
Copy_And_Swap (Priv, Full):
,----, ,----, ,----, ,----,
| A |------>| B |------>|Priv|---->| D |---> Empty
| |<------| |<------| |<----| |
'----' '----' '----' '----'
^ ^
| ,----, |
`--------|Full|------`
| |
'----'
And then after a while, probably after Exchange_Entities() the links are
incorrect and traversing the chain from First to Last or from Last to
First does not yield the same elements.
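The kind of inconsistency drawn above can be modelled with a plain doubly
linked list in C. This is a sketch only: struct entity, link_chain and
first_inconsistent are hypothetical stand-ins, not the VAST implementation,
and a struct copy plays the role of Copy_And_Swap copying the links without
fixing the back links.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an entity with Next_Entity/Prev_Entity links.  */
struct entity {
    struct entity *next;
    struct entity *prev;
};

/* Link an array of n entities into a consistent doubly linked chain.  */
void link_chain(struct entity *e, size_t n)
{
    assert(e != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        e[i].next = (i + 1 < n) ? &e[i + 1] : NULL;
        e[i].prev = (i > 0) ? &e[i - 1] : NULL;
    }
}

/* Walk forward from first and return the first entity whose next or prev
   link is not mirrored by its neighbour, or NULL if none is found.  */
struct entity *first_inconsistent(struct entity *first)
{
    for (struct entity *e = first; e != NULL; e = e->next) {
        if (e->next != NULL && e->next->prev != e)
            return e;
        if (e->prev != NULL && e->prev->next != e)
            return e;
    }
    return NULL;
}
```

With a chain A, B, Priv, D built by link_chain, copying Priv into a separate
Full entity leaves Full pointing at B and D while neither points back, so
first_inconsistent reports Full when the walk starts there.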
gcc/ada/ChangeLog:
* vast.adb (Check_Enum)<Check_Entity_Chain>: Add.
(Status)<Check_Entity_Chain>: Set to Print_And_Continue.
(Check_Entity_Chain): New.
(Check_Scope): Call Check_Entity_Chain.
Viljar Indus [Sat, 21 Mar 2026 01:25:05 +0000 (03:25 +0200)]
ada: Add Delete_Error_And_Continuation_Msgs and refactor duplicate code in errout and errutil
Packages errout and errutil were sharing a lot of code. Extract all of
the common functionality to erroutc.
Extract Delete_Specifically_Suppressed_Warnings and Set_Prev_Pointers.
gcc/ada/ChangeLog:
* errout.adb (Delete_Warning_And_Continuations): use
Delete_Error_And_Continuation_Msgs.
(Output_Messages): Call new refactored subprograms.
(Delete_Specifically_Suppressed_Warnings): New
procedure.
(Set_Prev_Pointers): New procedure.
(Finalize): use Delete_Specifically_Suppressed_Warnings and
Set_Prev_Pointers.
(Finalize): use Delete_Error_And_Continuation_Msgs.
* erroutc.adb (Delete_Error_And_Continuation_Msgs): New procedure.
(Remove_Duplicate_Errors): New function.
(Write_All_Errors_In_Brief_Format): New function.
(Write_All_Errors_In_Verbose_Format): New function.
(Write_Error_Summary): New function.
* erroutc.ads (Delete_Error_And_Continuation_Msgs): Likewise.
(Remove_Duplicate_Errors): Likewise.
(Write_All_Errors_In_Brief_Format): Likewise.
(Write_All_Errors_In_Verbose_Format): Likewise.
(Write_Error_Summary): Likewise.
* errutil.adb (Finalize): Call new refactored subprograms.
In most places we only care about whether the warning was suppressed or
not and we never care what the exact reason was. Add a new subprogram
Warning_Is_Suppressed for that purpose.
gcc/ada/ChangeLog:
* errout.adb (Finalize): use Warning_Is_Suppressed.
* erroutc.adb (Warning_Is_Suppressed): New subprogram.
* erroutc.ads (Warning_Is_Suppressed): Likewise.
Viljar Indus [Fri, 20 Mar 2026 15:23:14 +0000 (17:23 +0200)]
ada: Improve dmsg
Add missing attributes to dmsg. Additionally add support for
printing locations and fixes.
gcc/ada/ChangeLog:
* erroutc-pretty_emitter.adb (To_String): Relocated to erroutc.
(To_File_Name): Likewise.
(Line_To_String): Likewise.
(Column_To_String): Likewise.
* erroutc.adb (dedit): New function for debugging edits.
(dfix): New function for debugging fixes.
(dloc): New function for debugging locations.
(dmsg): Print missing Error_Msg_Object attributes.
(To_String): New function for printing spans.
(To_String): Relocated from erroutc-pretty_emitter.adb
(To_File_Name): Likewise.
* erroutc.ads: Likewise.
First, a bit of context: Ada has only had support for manipulating
environment variables in the standard library since Ada 2005 and the
introduction of Ada.Environment_Variables.
Prior to that, GNAT had introduced the implementation-specific
Ada.Command_Line.Environment, which still exists today. Until now,
Ada.Command_Line.Environment used a global variable, gnat_envp, which
must be initialized with envp, the optional third parameter to main in C.
When the main was in Ada, the binder generated the appropriate assignment.
The rest of the time, it was the responsibility of the user to write this
assignment. Failure to do so would cause null pointer dereferences when
using Ada.Command_Line.Environment. Although documented in the spec of
Ada.Command_Line, this was rather easy to miss.
Worse, the assignment caused linking failures in the rather common case
of a C GPR project with'ing an Ada GPR project and linking dynamically.
Also, Ada.Command_Line.Environment was inconsistent across platforms with
regard to how it was affected by calls to putenv.
When we added support for the standard Ada.Environment_Variables, the
gnat_envp machinery wasn't reused. Instead, another mechanism based on
the Unix global variable environ (and its close equivalents on other
platforms) was introduced.
What this patch does is switch Ada.Command_Line.Environment over to this
new environ-based mechanism. All uses of gnat_envp are removed, but the
definition itself is kept for backwards compatibility.
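For illustration, an environ-based query can be sketched in C as follows. This
assumes a POSIX target that provides the global environ; env_count and
env_entry_len are hypothetical names, not the actual __gnat_* routines, and
the sketch says nothing about how __gnat_environ is implemented on other
platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

extern char **environ;   /* POSIX global, assumed to exist on this target */

/* Number of "NAME=value" entries currently in the environment.  */
size_t env_count(void)
{
    size_t n = 0;
    if (environ != NULL)
        while (environ[n] != NULL)
            n++;
    return n;
}

/* Length of the n-th "NAME=value" entry, not counting the NUL.  */
size_t env_entry_len(size_t n)
{
    assert(n < env_count());
    return strlen(environ[n]);
}
```

Because both functions read the live global, no envp value has to be handed
over from main, which is the property the switch relies on.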
gcc/ada/ChangeLog:
* argv-lynxos178-raven-cert.c: Update comments.
* argv.c (gnat_envp): Add comment about it being unused.
(__gnat_env_count, __gnat_len_env, __gnat_fill_env): Use
__gnat_environ instead of gnat_envp.
* bindgen.adb (Command_Line_Used): Update comment.
(Gen_Main): Remove gnat_envp assignment generation. Remove generated
envp parameter.
(Gen_Output_File_Ada): Remove generated envp parameter.
* env.h: Make usable as C++.
* libgnat/a-colien.ads: Remove comment.
* libgnat/a-comlin.ads: Update comment.
* targparm.ads: Update comment.
Piotr Trojanek [Tue, 10 Mar 2026 15:18:13 +0000 (16:18 +0100)]
ada: Require compilation unit to have no indentation
We had a style check requiring a compilation unit to start at a column number
that is a multiple of the indentation value. Now we require compilation units
to have no indentation.
gcc/ada/ChangeLog:
* par-ch10.adb (P_Compilation_Unit): Require no indentation.
Eric Botcazou [Tue, 17 Mar 2026 21:44:13 +0000 (22:44 +0100)]
ada: Fix compiler crash on primitive completed by expression function
This further restricts the special bypass for the freezing of the profile
in Analyze_Subprogram_Body_Helper to the case of wrapper functions.
gcc/ada/ChangeLog:
PR ada/93702
* exp_ch3.adb (Make_Controlling_Function_Wrappers): Do not set the
Was_Expression_Function flag on the body.
* sem_ch6.adb (Analyze_Subprogram_Body_Helper): Avoid freezing the
profile only for wrapper functions.
Piotr Trojanek [Fri, 13 Mar 2026 11:54:06 +0000 (12:54 +0100)]
ada: Suppress warning about unused variable in trivial quantification
When the condition of a quantified expression is written as True or False,
the user has likely done this on purpose and there is no need for a warning.
gcc/ada/ChangeLog:
* sem_ch4.adb (Analyze_Quantified_Expression): Suppress warning for
trivial conditions.
Javier Miranda [Fri, 13 Mar 2026 19:48:26 +0000 (19:48 +0000)]
ada: Missing overflow check on Integer_128 under GNATProve mode
Under GNATProve mode the frontend does not generate overflow
checks on type conversions of Universal Integer numbers to
128-bit integer type numbers.
gcc/ada/ChangeLog:
* checks.adb (Apply_Scalar_Range_Check): When the type of the expression
is Universal Integer we cannot statically determine if the expression
is in the range of the target type.
* sem_eval.adb (In_Subrange_Of): Do not consider T2 in the range of
Universal Integer (since theoretically they are not).
(Test_In_Range): Do not consider Universal type expressions in range
of subtype Typ.
Also add a simple consistency check to all routines that dump the
entity chain: if Prev (Next (E)) /= E (or Next (Prev (E)) /= E in the
reverse order), an extra line is printed:
This example shows that the next links have 2550->2553, but the previous
links have 2700 <- 2553.
gcc/ada/ChangeLog:
* treepr.ads (pech, rpech): New.
(Print_Entity_Chain): Adjust signature and comment to handle
printing only header and doing the simple check.
* treepr.adb (pech, rpech): New.
(Print_Entity_Chain): Support for printing only headers and doing
simple check.
* sem_ch3.adb (Build_Derived_Record_Type): Record type derivations
inherit Is_Unchecked_Union and Has_Unchecked_Union flags.
(Inherit_Component): Add discriminals to the associations list.
* exp_ch3.adb (Build_Record_Init_Proc): Derivations of Unchecked_Union
types don't need an initialization procedure; they reuse the init proc
of their parent type.
Eric Botcazou [Mon, 9 Mar 2026 17:59:11 +0000 (18:59 +0100)]
ada: Distribute declaration of return object into conditional expressions
This lifts one of the limitations of the distribution of a declaration of
an object into the dependent expressions of its initialization expression
when it is a conditional expression, namely the case of the return object
of an extended return statement.
gcc/ada/ChangeLog:
* exp_ch4.adb (Expand_N_Case_Expression): Deal with initialization
expression of return object.
(Expand_N_If_Expression): Likewise.
(Insert_Conditional_Object_Declaration): Likewise.
* exp_util.adb (Is_Distributable_Declaration): Lift limitation for
return objects, including those with a class-wide type.
* sem_ch3.adb (Analyze_Object_Declaration): Set Return_Applies_To
on artificial return objects created from within a transient scope.
Remove test on Expander_Active for better error recovery.
Viljar Indus [Mon, 9 Mar 2026 12:52:33 +0000 (14:52 +0200)]
ada: Calculate the sloc adjustment for inlined static functions
First (and last) node calculation is done by traversing the original
nodes of the given node. This is fine for expanding existing code.
However when inlining static functions this can lead to a node that is
in a completely different location (e.g. the spec) being considered the
first node in the location of the inlined call. This means that in this
type of scenario resetting the slocs is not enough.
The correct approach to use here would be to calculate the Adjustment
in the Source File Index between the function and the inlined call. This
approach is also used in inlining regular subprograms.
Once there is an entry in the Source File Index for the inlined call the
error message mechanism will both highlight the call and the expression
function if an error is present in the inlined call.
gcc/ada/ChangeLog:
* inline.adb (Inline_Static_Function_Call): Add a Source File Index
entry for the call and apply the necessary sloc adjustment values
for all of the inlined nodes.
As I mentioned on Saturday, the CWG3130 patchset fails to bootstrap.
The problem is that we try to call build_value_init on anonymous unions
or structs, which doesn't work well when they don't have a default
constructor.
Now, if some non-trivial construction is needed, the type will already have
either a user-provided constructor or at least a non-trivial one, in which
case build_value_init_noctor isn't called at all. So, this patch
just zero-initializes the anonymous aggregate members.
2026-05-29 Jakub Jelinek <jakub@redhat.com>
* init.cc (build_value_init_noctor): Zero initialize anonymous
union/struct subobjects. Formatting fix.
Peter Damianov [Fri, 29 May 2026 06:53:42 +0000 (08:53 +0200)]
libgfortran: Use MapViewOfFileEx instead of MapViewOfFileExNuma in caf_shmem
MapViewOfFileExNuma is only present when _WIN32_WINNT >= 0x0600 (Windows Vista
or later). The code is passing NUMA_NO_PREFERRED_MODE, and that
is documented as:
No NUMA node is preferred. This is the same as calling the MapViewOfFileEx
function.
Sandra Loosemore [Thu, 28 May 2026 22:33:33 +0000 (22:33 +0000)]
Fortran: f_c_string intrinsic improvements
The existing implementation of f_c_string is quite inefficient, doing
either 2 or 3 allocations and copies of the input string prefix. This
rewrite adds folding for constant string arguments and handles other
cases with a single allocation and copy.
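The single allocation and copy can be pictured with a C model of the
intrinsic's result. This is a sketch under assumptions, not the code emitted
by trans-intrinsic.cc: fortran_to_c_string is a hypothetical name, and the
semantics assumed are that trailing blanks are dropped unless the optional
ASIS argument is true, with a NUL appended in both cases.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model: turn a blank-padded Fortran string of length len
   into a NUL-terminated C string with one malloc and one memcpy.
   Trailing blanks are dropped unless asis is nonzero.  The caller frees
   the result; NULL is returned if the allocation fails.  */
char *fortran_to_c_string(const char *s, size_t len, int asis)
{
    assert(s != NULL || len == 0);
    if (!asis)
        while (len > 0 && s[len - 1] == ' ')
            len--;                       /* find the trimmed length first */

    char *out = malloc(len + 1);         /* the only allocation */
    if (out == NULL)
        return NULL;
    if (len > 0)
        memcpy(out, s, len);             /* the only copy */
    out[len] = '\0';
    return out;
}
```

Computing the trimmed length before allocating is what avoids an intermediate
trimmed temporary in this model.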
This patch also adds the missing documentation for this intrinsic to the
gfortran manual.
gcc/fortran/ChangeLog
* intrinsic.texi (F_C_STRING): New section.
* trans-intrinsic.cc (conv_trim): Delete.
(conv_isocbinding_function): Rewrite the F_C_STRING case.
Before this commit, attempting to use non-ASCII characters in quoted
words failed, even though the protocol allows the usage of such
characters in quoted words. To fix this:
1. Remove `c >= 0x7f` comparison when parsing a quoted word.
2. Use `unsigned char` instead of `char` such that `c < 0x20` fails for
non-ASCII characters.
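The effect of the two steps can be seen in a reduced C model of the character
test. The function names are hypothetical and the model looks only at the two
comparisons named above; it ignores any other characters the real parser
treats specially. As elsewhere, "signed char" stands in for plain char on a
target where char is signed, and the wrap to a negative value is GCC's
implementation-defined conversion.

```c
#include <assert.h>

/* Before: a signed character type plus an upper bound.  Bytes from 0x80
   upwards become negative, so `c < 0x20` is already true for them.  */
int rejected_before(int raw)
{
    assert(raw >= 0 && raw <= 255);
    signed char c = (signed char)raw;
    return c < 0x20 || c >= 0x7f;
}

/* After: an unsigned character type and no upper bound.  */
int rejected_after(int raw)
{
    assert(raw >= 0 && raw <= 255);
    unsigned char c = (unsigned char)raw;
    return c < 0x20;
}
```

A UTF-8 lead byte such as 0xC3 is rejected by the first function and accepted
by the second, while control characters below 0x20 are rejected by both.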
PR c++/120458
libcody/ChangeLog:
* buffer.cc (S2C): Allow non-ASCII chars in quoted words.
* cody.hh: Use unsigned char for S2C().
gcc/testsuite/ChangeLog:
* g++.dg/README: Explain purpose of modules/ dir.
* g++.dg/modules/pr120458-1_a.C: Define non-ASCII module with
default mapper.
* g++.dg/modules/pr120458-1_b.C: Import non-ASCII module with
default mapper.
* g++.dg/modules/pr120458-2_a.C: Define non-ASCII module with
a file as mapper.
* g++.dg/modules/pr120458-2_b.C: Import non-ASCII module with
a file as mapper.
* g++.dg/modules/pr120458-2.map: Define mapping for pr120458-2
test case.
Roger Sayle [Thu, 28 May 2026 19:56:27 +0000 (20:56 +0100)]
tree-ssa: Loop store motion micro-optimizations.
ref_always_accessed_p is (currently) only ever called with stored_p being
true, so specializing for this case, renaming ref_always_accessed{,_p} to
ref_always_stored{,_p} saves storage and some redundant checks at run-time.
2026-05-28 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* tree-ssa-loop-im.cc (ref_always_accessed_p): Rename to...
(ref_always_stored_p): New function specialized to determine if
REF is a store that is always executed in LOOP.
(execute_sm): Use ref_always_stored_p instead of
ref_always_accessed_p.
(class ref_always_accessed): Rename to...
(class ref_always_stored): Remove (always true) stored_p field.
(ref_always_stored::operator ()): Always check for a store.
Move hash table lookup, get_lim_data, after store test.
(can_sm_ref_p): Use ref_always_stored_p instead of
ref_always_accessed_p.
Roger Sayle [Thu, 28 May 2026 19:54:17 +0000 (20:54 +0100)]
x86_64 SSE: Tweak/correct STV cost of 128-bit rotate by constant.
This one line change resolves the failure of gcc.target/i386/rotate-2.c
when compiled with -march=cascadelake triggered by recent STV improvements.
https://gcc.gnu.org/pipermail/gcc-patches/2026-May/716996.html
The decision of whether to perform STV is finely balanced, and affected
by the microarchitecture's timings/costs, but in this case the underlying
issue appears to be the parameterized cost for performing a 128-bit
rotation by a constant in SSE registers. Depending upon the number
of bits to rotate by, SSE requires either 1 or 2 shuffles, followed
by a left shift, a right shift and an any_or_plus to combine the result.
This is therefore 4 or 5 instructions, but currently returns
COSTS_N_INSNS(1) instead of COSTS_N_INSNS(4) [probably a typo].
As an aside, it might be more useful for this gain to be based on latency;
as both the shuffles and the shifts can each be performed in parallel,
so a reasonable vcost may therefore be COSTS_N_INSNS(3), but such fine
tuning might require microbenchmarking. I mention it here just in case
using COSTS_N_INSNS(4) is bisected as a performance regression.
2026-05-28 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-features.cc (compute_convert_gain): Tweak
the cost of a 128-bit rotation to be 4 (or 5) instructions.
Roger Sayle [Thu, 28 May 2026 19:50:11 +0000 (20:50 +0100)]
x86_64 SSE: Handle SUBREG conversions in TImode STV (for ptest).
This patch teaches i386's STV pass how to handle SUBREG conversions,
i.e. that a TImode SUBREG can be transformed into a V1TImode SUBREG,
without worrying about other DEFs and USEs.
One example where this is useful is
typedef long long __m128i __attribute__ ((__vector_size__ (16)));
int foo (__m128i x, __m128i y) {
  return (__int128)x == (__int128)y;
}
where with -O2 -msse4 we can now scalar-to-vector transform:
i.e. a 128-bit vector doesn't need to be transferred to the
scalar unit to be tested for equality. The new test case includes
additional related examples that show similar improvements.
Previously we explicitly checked *cmpti_doubleword operands to be
either immediate constants, or a TImode REG or a TImode MEM. By
enhancing this to allow a TImode SUBREG, we now handle everything
that would match the general_operand predicate, making this part
of STV more like other RTL passes (lra/reload). The big change is
that unlike a regular DF USE, a SUBREG USE doesn't require us to
analyze and convert the rest of the DEF-USE chain.
2026-05-28 Roger Sayle <roger@nextmovesoftware.com>
Hongtao Liu <hongtao.liu@intel.com>
gcc/ChangeLog
* config/i386/i386-features.cc (scalar_chain::add_insn): Don't
call analyze_register_chain if the USE is a SUBREG.
(scalar_chain::convert_op): Call gen_lowpart to convert
scalar (TImode) SUBREGs to vector (V1TImode) SUBREGs.
(convertible_comparison_p): We can now handle all general_operands
of *cmp<dwi>_doubleword.
(timode_remove_non_convertible_regs): We only need to check TImode
uses that aren't TImode SUBREGs of registers in other modes.
gcc/testsuite/ChangeLog
* gcc.target/i386/sse4_1-ptest-7.c: New test case.
Roger Sayle [Thu, 28 May 2026 19:46:04 +0000 (20:46 +0100)]
x86 SSE: Improve vector increment/decrement on x86.
This patch improves the code generated by the i386 backend for incrementing
(adding one to) and decrementing (subtracting one from) a vector. With SSE
materializing the vector -1 is more efficient than materializing the
vector +1, hence x + 1 (increment) is better expressed as x - (-1), and
x - 1 (decrement) is better expressed as x + (-1). Conveniently the
relevant additions and subtractions are specified as a single pattern,
using a plusminus iterator, in the machine description.
uaddm1: vpcmpeqd %xmm1, %xmm1, %xmm1
vpaddb %xmm1, %xmm0, %xmm0
ret
With this patch, we now consistently generate:
sadd1: vpcmpeqd %xmm1, %xmm1, %xmm1
vpsubb %xmm1, %xmm0, %xmm0
ret
uadd1: vpcmpeqd %xmm1, %xmm1, %xmm1
vpsubb %xmm1, %xmm0, %xmm0
ret
saddm1: vpcmpeqd %xmm1, %xmm1, %xmm1
vpaddb %xmm1, %xmm0, %xmm0
ret
uaddm1: vpcmpeqd %xmm1, %xmm1, %xmm1
vpaddb %xmm1, %xmm0, %xmm0
ret
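The identity behind the transformation can be checked lane by lane in portable
C. This sketch does not use SSE intrinsics; add_one and sub_minus_one are
hypothetical helpers that model sixteen byte lanes with wraparound, and the
all-ones byte 0xFF plays the role of the -1 vector that vpcmpeqd produces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 16   /* byte lanes in one 128-bit register */

/* x + 1 in every byte lane, wrapping modulo 256.  */
void add_one(uint8_t *dst, const uint8_t *src)
{
    assert(dst != NULL && src != NULL);
    for (int i = 0; i < LANES; i++)
        dst[i] = (uint8_t)(src[i] + 1u);
}

/* x - (-1) in every byte lane; -1 is the all-ones byte 0xFF.  */
void sub_minus_one(uint8_t *dst, const uint8_t *src)
{
    assert(dst != NULL && src != NULL);
    const uint8_t all_ones = 0xFF;
    for (int i = 0; i < LANES; i++)
        dst[i] = (uint8_t)(src[i] - all_ones);
}
```

Both helpers agree for every possible byte value, including the wrap from 255
to 0, which is why the subtraction form can replace the addition.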
2026-05-28 Roger Sayle <roger@nextmovesoftware.com>
Hongtao Liu <hongtao.liu@intel.com>
Uros Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
* config/i386/i386.md (inv_insn): New define_code_attr.
* config/i386/sse.md (<plusminus><mode>3): Accept a CONST_VECTOR
as the second operand. If the second operand is CONST1_RTX,
canonicalize to use CONSTM1_RTX instead.
(*add<mode>3_one): New define_insn_and_split to convert padd +1
to psub -1.
(*sub<mode>3_one): Likewise, a new define_insn_and_split to
convert psub +1 to padd -1.
gcc/testsuite/ChangeLog
* gcc.target/i386/avx512f-simd-1.c: Tweak test case.
* gcc.target/i386/sse2-paddb-2.c: New test case.
* gcc.target/i386/sse2-paddd-2.c: Likewise.
* gcc.target/i386/sse2-paddw-2.c: Likewise.
* gcc.target/i386/sse2-psubb-2.c: Likewise.
* gcc.target/i386/sse2-psubd-2.c: Likewise.
* gcc.target/i386/sse2-psubw-2.c: Likewise.
Marek Polacek [Thu, 28 May 2026 17:43:58 +0000 (13:43 -0400)]
c++: fix infinite looping with arr[arr] [PR125454]
Here r16-3466 moved the canonicalization step that transforms
idx[array] to array[idx] to the beginning of cp_build_array_ref.
When we have array[array], we'll be swapping till we blow the stack.
Previously, we'd give the !INTEGRAL_OR_UNSCOPED_ENUMERATION_TYPE_P
error so there was no problem.
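The looping can be illustrated with a toy model. This is a sketch under stated assumptions, not the code in cp_build_array_ref: the enum and the function name are hypothetical, and operand kinds stand in for the tree types the front end inspects. The point is that the swap must only fire when it makes progress, so array[array] is left for the later error instead of being swapped again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy operand kinds; the real front end works on tree types.  */
enum operand_kind { OPERAND_ARRAY, OPERAND_INTEGER };

/* Swap idx[array] into array[idx] order and return true if a swap
   happened.  Requiring that the left operand is NOT an array means
   array[array] is never swapped, so a caller that retries after a
   swap cannot loop forever.  */
bool
canonicalize_subscript (enum operand_kind *lhs, enum operand_kind *rhs)
{
  assert (lhs != NULL && rhs != NULL);
  if (*rhs == OPERAND_ARRAY && *lhs != OPERAND_ARRAY)
    {
      enum operand_kind tmp = *lhs;
      *lhs = *rhs;
      *rhs = tmp;
      return true;
    }
  return false;
}
```

A version that tests only the right operand would report a swap for array[array] on every call, which is the unbounded recursion described above.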
PR c++/125454
gcc/cp/ChangeLog:
* typeck.cc (cp_build_array_ref): Don't recurse for array[array].
Jerry DeLisle [Sat, 23 May 2026 04:56:34 +0000 (21:56 -0700)]
fortran: module-contained PRIVATE procedures must have global ELF linkage [PR125430]
Assisted by: Claude Sonnet 4.6
gcc/fortran/ChangeLog:
PR fortran/125430
* trans-decl.cc (build_function_decl): Set TREE_PUBLIC for all
module-contained procedures so submodules compiled as separate
translation units can reach them via host association. Also set
DECL_VISIBILITY to VISIBILITY_HIDDEN for PRIVATE procedures,
matching the existing treatment of module variables.
gcc/testsuite/ChangeLog:
PR fortran/125430
* gfortran.dg/module_private_2.f90: Remove scan-tree-dump-times
assertion for 'priv'; PRIVATE module procedures now have global
linkage with hidden visibility and are no longer optimized away.
* gfortran.dg/public_private_module_2.f90: Add xfail markers to
scan-assembler-not for 'two' and 'six'; update comment to mention
procedures alongside variables.
* gfortran.dg/public_private_module_7.f90: Add xfail marker to
scan-assembler-not for '__m_common_attrs_MOD_other'.
* gfortran.dg/public_private_module_8.f90: Add xfail marker to
scan-assembler-not for '__m_MOD_myotherlen'.
* gfortran.dg/submodule_private_host.f90: New test.
* gfortran.dg/submodule_private_host_aux.f90: New auxiliary file.
* gfortran.dg/warn_unused_function_2.f90: Remove 'defined but not
used' expectation for s1; PRIVATE module procedures now have
global linkage and no longer trigger the unused-function warning.
Jeff Law [Thu, 28 May 2026 17:36:01 +0000 (11:36 -0600)]
[RISC-V] Fix expected testsuite output after ext-dce changes
The recent changes to ext-dce can transform sign extension to zero extension in
some cases. As a result tests which previously expected a signed load can now
see an unsigned load. Of course on rv32 "lw" loads a full word, so this
doesn't show up there. So instead of looking for "lw" we instead look for
"(lwu|lw)". This fixes the "regressions" after the ext-dce changes.
Patrick Palka [Thu, 28 May 2026 14:39:32 +0000 (10:39 -0400)]
libstdc++: Fix availability of flat_meow::operator=(initializer_list)
This assignment operator was not being brought in from the private base
class causing assignments from {...} to be inefficiently treated as
construction + move assignment.
libstdc++-v3/ChangeLog:
* include/std/flat_map (flat_map): Bring in operator= from
_Flat_map_base.
(flat_multimap): Likewise.
* include/std/flat_set (flat_set): Bring in operator= from
_Flat_set_base.
(flat_multiset): Likewise.
* testsuite/23_containers/flat_map/1.cc (test11): Simplify by
using = {...}.
(test12): New test.
* testsuite/23_containers/flat_multimap/1.cc (test10): Simplify
by using = {...}.
(test11): New test.
* testsuite/23_containers/flat_multiset/1.cc (test10): Simplify
by using = {...}.
(test11): New test.
* testsuite/23_containers/flat_set/1.cc (test10): Simplify by
using = {...}.
(test11): New test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Patrick Palka [Thu, 28 May 2026 14:39:29 +0000 (10:39 -0400)]
libstdc++: Implement P3567R2 flat_meow fixes
This implements the changes in sections 5, 6 and 8 of P3567R2; the other
changes (in section 4 and 7) are effectively already implemented.
libstdc++-v3/ChangeLog:
* include/bits/version.def (flat_map): Bump to 202511.
(flat_set): Likewise.
* include/bits/version.h: Regenerate.
* include/std/flat_map (_Flat_map_impl): Remove
is_nothrow_swappable_v assertions.
(_Flat_map_impl::_Flat_map_impl): Explicitly default copy ctor.
Define move ctor with corrected exception handling as per
P3567R2.
(_Flat_map_impl::operator=): Likewise.
(_Flat_map_impl::insert_range): Define new __sorted_t overload
as per P3567R2.
(_Flat_map_impl::swap): Make conditionally noexcept as per
P3567R2.
* include/std/flat_set (_Flat_set_impl): Remove
is_nothrow_swappable_v assertion.
(_Flat_set_impl::_Flat_set_impl): Explicitly default copy ctor.
Define move ctor with correct invariant preserving behavior as
per P3567R2.
(_Flat_set_impl::operator=): Likewise.
(_Flat_set_impl::_M_insert_range): Factored out from
insert_range. Add bool parameter __is_sorted defaulted to
false.
(_Flat_set_impl::insert_range): Define new __sorted_t overload
as per P3567R2.
(_Flat_set_impl::swap): Make conditionally noexcept as per
P3567R2. Correct to use ranges::swap instead of ADL swap.
* testsuite/23_containers/flat_map/1.cc (test11, test12):
New tests.
* testsuite/23_containers/flat_multimap/1.cc (test10, test11):
New tests.
* testsuite/23_containers/flat_multiset/1.cc (test10, test11):
New tests.
* testsuite/23_containers/flat_set/1.cc (test10, test11):
New tests.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Patrick Palka [Thu, 28 May 2026 14:39:27 +0000 (10:39 -0400)]
libstdc++: Fix suboptimal complexity of flat_map::_M_insert
Since r16-1742, ranges::inplace_merge is correctly C++20 iterator aware,
which allows us to idiomatically implement this helper with the optimal
complexity N + M log M instead of N log N.
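The complexity argument can be sketched in C. This is not the libstdc++ implementation (which uses views::zip and ranges::inplace_merge on the key and value containers); the function name is hypothetical, it works on a single int array, uses a temporary buffer for the merge, and does not remove duplicate keys. It only shows the shape of the technique: sort the M new elements alone, then merge them with the N elements that are already sorted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int
compare_ints (const void *a, const void *b)
{
  int x = *(const int *) a, y = *(const int *) b;
  return (x > y) - (x < y);
}

/* DATA holds N already-sorted ints followed by M new, unsorted ints.
   Sort only the tail (M log M), then merge the two runs (N + M).
   Returns 0 on success and -1 if the temporary buffer cannot be
   allocated.  */
int
insert_then_merge (int *data, size_t n, size_t m)
{
  assert (data != NULL || n + m == 0);
  if (n + m == 0)
    return 0;

  qsort (data + n, m, sizeof *data, compare_ints);

  int *tmp = malloc ((n + m) * sizeof *tmp);
  if (tmp == NULL)
    return -1;

  size_t i = 0, j = n, k = 0;
  while (i < n && j < n + m)
    /* Take from the left run on ties to keep the merge stable.  */
    tmp[k++] = data[j] < data[i] ? data[j++] : data[i++];
  while (i < n)
    tmp[k++] = data[i++];
  while (j < n + m)
    tmp[k++] = data[j++];

  memcpy (data, tmp, (n + m) * sizeof *data);
  free (tmp);
  return 0;
}
```

Re-sorting the whole array instead would cost (N + M) log (N + M), which is what the merge-based approach avoids when M is small relative to N.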
libstdc++-v3/ChangeLog:
* include/std/flat_map (_Flat_map_impl::_M_insert): New bool
parameter __is_sorted defaulted to false. Reimplement using
views::zip and ranges::inplace_merge.
(_Flat_map_impl::insert): In the __sorted_t overload, pass
__is_sorted=true to _M_insert.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
REGNO_REG_CLASS (regno) [Macro]
A C expression whose value is a register class containing hard
register regno. In general there is more than one such class;
choose a class which is minimal, meaning that no smaller class
also contains the register.
riscv_regno_to_class[] currently maps every FP hard register to
RVC_FP_REGS, but RVC_FP_REGS only contains f8-f15. The entries for
f0-f7 and f16-f31 therefore violate the "containing hard register
regno" half of the contract: the returned class does not contain the
register at all.
The mismatch corrupts IRA's cost model. setup_allocno_cost_vector
indexes the per-hard-reg cost slot via REGNO_REG_CLASS:
After setup_regno_cost_classes_by_mode adds RVC_FP_REGS to the cost
classes, the cost for e.g. f16 is silently read from the RVC_FP_REGS
slot.
The new fp-reg-class.c testcase puts eight "cf"- and sixteen "f"-
constrained doubles live across a call. In the buggy state IRA
places the cf pseudos outside the cf class and LRA recovers with
sixteen fmv.d to fs* registers; with the fix IRA spills those values
honestly and the IRA "+++Costs" line reports a non-zero "mem"
component.
Fix it by giving each FP hard register its minimal class: FP_REGS for
f0-f7 and f16-f31, RVC_FP_REGS for f8-f15. As a companion change,
switch riscv_secondary_memory_needed from class-equality tests to
reg_class_subset_p so it still recognises the FP side regardless of
which subclass the table returns.
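The contract and the fix can be modelled with bitmasks. This is a sketch, not the riscv.cc code: the type, the constants and the function names are hypothetical, bit N of a set stands for FP register fN, and set_is_subset plays the role that reg_class_subset_p plays in the real change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: bit N set means FP register fN is in the class.  */
typedef uint32_t fp_reg_set;

static const fp_reg_set RVC_FP_SET = 0x0000ff00u;	/* f8-f15.  */
static const fp_reg_set FP_SET = 0xffffffffu;		/* f0-f31.  */

/* Smallest modelled class that contains FP register REGNO.  */
fp_reg_set
minimal_fp_class (unsigned regno)
{
  assert (regno < 32);
  return (regno >= 8 && regno <= 15) ? RVC_FP_SET : FP_SET;
}

bool
set_contains (fp_reg_set set, unsigned regno)
{
  assert (regno < 32);
  return (set >> regno) & 1u;
}

/* True if every register in A is also in B.  */
bool
set_is_subset (fp_reg_set a, fp_reg_set b)
{
  return (a & ~b) == 0;
}
```

With this mapping the returned class always contains the register, and a subset test against FP_SET recognises both classes as floating-point, whereas an equality test would miss RVC_FP_SET.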
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_regno_to_class): Use the minimal
class containing each FP hard register: FP_REGS for f0-f7 and
f16-f31, RVC_FP_REGS for f8-f15.
(riscv_secondary_memory_needed): Use reg_class_subset_p to
detect FP classes.
Zhongyao Chen [Thu, 28 May 2026 11:27:25 +0000 (19:27 +0800)]
RISC-V: Support VLS LMUL cost scaling
Make VLS (fixed-length) vector modes use the same LMUL cost scaling as
VLA modes. This sometimes makes the vectorizer pick smaller LMULs.
Here is how I updated the tests that failed in regression testing:
- dyn-lmul-conv-[1-2].c: The cost model now prefers smaller LMULs,
so update expectations.
- pr123414.c: This test relies on large LMULs to trigger a specific bug;
it can be fixed by adding -fno-vect-cost-model.
gcc/ChangeLog:
* config/riscv/riscv-vector-costs.cc (get_lmul_cost_scaling):
Enable scaling for all vector modes (VLA and VLS).
Tomasz Kamiński [Mon, 25 May 2026 13:15:09 +0000 (15:15 +0200)]
libstdc++: Optimize operator<< for piecewise distributions.
This avoids creating a temporary vector and uses the _M_int and _M_den
members of _M_param. Empty _M_int (default) values are handled by
printing values directly.
libstdc++-v3/ChangeLog:
* include/bits/random.h (piecewise_constant_distribution::param_type)
(piecewise_linear_distribution::param_type): Befriend operator<<.
* include/bits/random.tcc
(operator<<(basic_ostream&, piecewise_linear_distribution))
(operator<<(basic_ostream&, piecewise_constant_distribution)):
Use __x._M_param._M_int and __x._M_param._M_den instead of accessors.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Tomasz Kamiński [Mon, 25 May 2026 12:53:43 +0000 (14:53 +0200)]
libstdc++: Expand serialization test for piecewise distributions.
Due to the variability of the results, the tests are currently limited
to x86_64 architectures. The float/double tests are disabled for -m32
as I was getting unstable results.
libstdc++-v3/ChangeLog:
* testsuite/26_numerics/random/piecewise_constant_distribution/operators/serialize2.cc:
New test.
* testsuite/26_numerics/random/piecewise_linear_distribution/operators/serialize2.cc:
New test.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
This can be simplified into a single dup instruction going to an SVE
register directly from a scalar (or a smaller vector) value:
mov z0.s, s0
ret
To facilitate this, this patch adds a pattern that combine can use to
merge two vec_duplicate instructions (scalar -> AdvSIMD and AdvSIMD ->
SVE) into a single one (scalar -> SVE).
To demonstrate the effect of this patch, the vec-init-23.c test from
AdvSIMD was reused as a new SVE test (vec_init_5.c).
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md
(*aarch64_vec_duplicate_subvector<vconsv><vconq><mode>):
New pattern.
* config/aarch64/iterators.md (VCONSV): New mode attribute.
(vconsv): Likewise.
Artemiy Volkov [Thu, 26 Feb 2026 08:45:08 +0000 (08:45 +0000)]
aarch64: implement vec_concat support for sub-64-bit types
This patch improves handling of 2-element vec_concats in
aarch64_expand_vector_init_fallback (); where previously the aarch64_vec_concat
insn was emitted only for pairs of vectors, we now allow scalar operands
as well. Furthermore, if the two operands are the same, we can now emit a
vec_duplicate instead of a vec_concat, leading to better code generation.
This is backed by the new combine{z,_internal}{,_be} insn patterns, that
were each split between integral 16- and 32-bit modes (only involving GPRs
and memory), and the rest (requiring the "w" alternatives as well).
The effect of the changes is illustrated by the changes to vec-init-23.c,
introduced in the previous patch (and a handful of other vector-init
related tests).
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (*aarch64_combine_internal<mode>):
New insn pattern.
(*aarch64_combine_internal_be<mode>): Likewise.
(*aarch64_combinez<mode>): Likewise.
(*aarch64_combinez_be<mode>): Likewise.
(@aarch64_vec_concat<mode>): Support smaller vector and scalar modes.
* config/aarch64/aarch64.cc (aarch64_expand_vector_init_fallback):
Handle the case of two scalar elements.
* config/aarch64/iterators.md (SSUB64): New mode iterator.
(VSSUB64): Likewise.
(VSSUB32_I): Likewise.
(VSSUB64_F): Likewise.
(VS32_I_SUB64_F): Likewise.
(single_wx): Define attribute for sub-64-bit vector and scalar modes.
(bitsize): Likewise.
(VDBL): Likewise.
(single_dwx): New mode attribute.
Artemiy Volkov [Thu, 26 Feb 2026 09:01:30 +0000 (09:01 +0000)]
aarch64: initialize vectors from starting subsequence
Now that we have 2- and 4-element vector modes for all the sub-word scalar
modes, we can emit more efficient code when the elements of a vector
constructor can be generated from a common starting subsequence of length
power of two. To do this, first detect the shortest possible starting
subsequence by repeatedly folding the initial constructor element array
in half, as long as the left and the right halves are equal. Then,
after emitting the subsequence, duplicate it by generating a
vec_duplicate with the correct source mode.
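The detection step can be sketched as follows. This is a model on plain int16_t arrays, not the aarch64.cc code, which operates on RTL constructor elements; the function name is hypothetical and the element count is assumed to be a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the length of the shortest starting subsequence that, when
   duplicated, reproduces all N elements.  N is assumed to be a power
   of two.  The array is folded in half for as long as both halves
   compare equal.  */
size_t
shortest_starting_subsequence (const int16_t *elts, size_t n)
{
  assert (elts != NULL && n != 0 && (n & (n - 1)) == 0);
  while (n > 1
	 && memcmp (elts, elts + n / 2, (n / 2) * sizeof *elts) == 0)
    n /= 2;
  return n;
}
```

For a constructor of the form {x, y, z, w, x, y, z, w} with distinct values this returns 4, so only the first four elements need to be built before the duplication; for eight equal elements it returns 1.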
On the MD side, this requires implementing the vec_duplicate optab to
duplicate an arbitrary sub-128-bit value into a full 64- or a 128-bit
AdvSIMD register, as well as the vec_set insn for the VSUB64 modes (needed
as fallback for the divide-and-conquer approach). The latter uses a
properly scaled and shifted "bfi" for integer values, and a properly
indexed "ins" for FP elements.
This change allows us to get rid of long chains of inserts and compile
things like:
int16x8_t f (int16_t x, int16_t y, int16_t z, int16_t w)
{
return (int16x8_t) {x, y, z, w, x, y, z, w};
}
Artemiy Volkov [Mon, 18 May 2026 10:21:18 +0000 (10:21 +0000)]
aarch64: introduce partial AdvSIMD vector modes
In addition to V2HF that already exists, this patch adds 4 more partial
(16- and 32-bit) AdvSIMD vector modes: V4QI, V2QI, V2HI, and V2BF. For
now, these are intended only for duplication into full-sized (32-, 64-,
and 128-bit) registers. As a minimal closure required to bootstrap the
compiler, this also implements the "mov" expand and the "aarch64_simd_mov"
insn_and_split for the new modes (gathered under the VSUB64 iterator).
This patch also adds the new aarch64_advsimd_sub_dword_mode_p () helper to
facilitate detecting the new modes; that is then used (a) to disable
vec_perm_const vectorization for those modes, (b) in the "mov" expander
for those modes, and (c) to define the new "Da" constraint.
Some existing testcases were adjusted where needed. (The _Float16
testcase in sve/slp_1.c temporarily expects GPRs to be used for V2HF,
which is corrected to FPRs by the succeeding patch; and the half-float
complex tests now recognize some of the patterns, but check that V2BF
still can't be used for vectorization.)
gcc/ChangeLog:
* config/aarch64/aarch64-modes.def (VECTOR_MODE): Remove V2HF.
(VECTOR_MODES): Define V2QI, V4QI, V2HI, V2HF, V2BF.
* config/aarch64/aarch64-protos.h
(aarch64_advsimd_sub_dword_mode_p): Declare new predicate.
* config/aarch64/aarch64-simd.md (*aarch64_simd_mov<mode>): New
define_insn_and_split pattern.
(mov<mode>): Add sub-64-bit vector modes to the VALL_F16 expander.
Forego const vector expansion for those modes.
* config/aarch64/aarch64.cc (aarch64_classify_vector_mode):
Handle 16- and 32-bit vector modes.
(aarch64_advsimd_sub_dword_mode_p): Define new predicate.
(aarch64_vectorize_vec_perm_const): Refuse for partial vector modes.
* config/aarch64/constraints.md (Da): New constraint.
* config/aarch64/iterators.md (VSUB64): New iterator.
(VALL_F16_SUB64): Likewise.
(size): Define attribute for sub-64-bit vector modes.
(VSC): New mode attribute.
(vstype): Likewise.
Kewen Lin [Thu, 28 May 2026 11:22:57 +0000 (11:22 +0000)]
i386: Refine c86-4g fdiv scheduling model
Commit r17-258 introduced separate c86-4g fdiv units to avoid the
automaton explosion caused by modeling the whole divider latency on
normal FPU pipes. But the real hardware may keep the associated FPU
pipe occupied for some cycles at both the beginning and the end of
an fdiv or sqrt operation. Following Alexander's suggestion in [1],
this patch still keeps the long-latency part on the dedicated fdiv
unit but models only a bounded part of the FPU pipe occupancy. It
makes the first four cycles reserve both the selected FPU pipe and
the fdiv unit, then keeps only the fdiv unit for the remaining cycles.
Taking r17-258 as baseline, I tried K = 1,2,3,4 for
fpu,divider*N -> (fpu+divider)*K, divider*(N-K)
and measured the time for build/genautomata and the top 100 symbol
sizes of insn-automata.o (baseline normalized as 100) as below:
1) without any other changes:
            time     size
  baseline  100      100
  r17-203   340.0    629.3
  K1        100.3    100
  K2        105.5    112.5
  K3        112.8    129
  K4        119.4    141
2) Splitting fpu0/fpu2 and fpu1/fpu3 to paired automatons:
time size
baseline 100 100
r17-203 340.0 629.3
KS1 79.6 43.3
KS2 79.8 43.3
KS3 79.6 43.3
KS4 79.4 43.3
It turns out that if we want to model the FPU occupancy for some
beginning cycles, separating the involved fpu1/fpu3 from the
original fpu looks better. So this patch splits fpu0/fpu2 and
fpu1/fpu3 into two paired automata and this extra coupling does
not grow the main FPU automata significantly.
This patch also corrects some other modeling omissions like:
- Fix c86_4g_fp_op_idiv_load latency typo by one cycle.
- Merge the old c86_4g_m7 idiv DI/SI/HI reservations after
aligning their latency and divider unit occupancy (with
updated values), while keeping QI separate.
- Adjust reservation units in templates like
c86_4g_m7_avx_vpinsr_reg_load and c86_4g_m7_avx512_sseadd_xy
etc.
- Add missing reservation units and unit occupancy in templates
like c86_4g_m7_avx512_permi2_ymm and
c86_4g_m7_sse_sseiadd_hplus_load etc.
- Adjust reservation units and unit occupancy in templates like
c86_4g_m7_avx512_perm_zmm_imm, c86_4g_m7_avx512_expand and
c86_4g_m7_avx512_ssemul etc.
And also introduces some reusable reservation aliases to simplify
some modelings.
I tested build time for i686 bootstrapping in a docker container:
- r17-202: 2437s (before c86-4g support)
- r17-203: 7291s (c86-4g support)
- r17-258: 2646s (tweaking for build time)
- this: 2358s
It looks like this patch improves build time (it is even better than
r17-202, though that small gap may be due to jitter).
The symbol sizes are improved as below:
nm -CS -t d --defined-only gcc/insn-automata.o \
| sed 's/^[0-9]* 0*//' \
| sort -n | tail -20
with r17-258:
20068 r bdver1_fp_transitions
22354 r c86_4g_m7_ieu_min_issue_delay
26208 r slm_min_issue_delay
26580 t internal_min_issue_delay(int, DFA_chip*)
26869 t internal_state_transition(int, DFA_chip*)
27244 r bdver1_fp_min_issue_delay
28518 r glm_check
28518 r glm_transitions
33690 r geode_min_issue_delay
33728 r c86_4g_fp_transitions
45436 r znver4_fpu_min_issue_delay
46980 r bdver3_fp_min_issue_delay
49428 r glm_min_issue_delay
53730 r btver2_fp_min_issue_delay
53760 r znver1_fp_transitions
89414 r c86_4g_m7_ieu_transitions
93960 r bdver3_fp_transitions
181744 r znver4_fpu_transitions
326322 r c86_4g_m7_fpu_min_issue_delay
1305288 r c86_4g_m7_fpu_transitions
with this:
17872 r print_reservation(_IO_FILE*, rtx_insn*)::...
20068 r bdver1_fp_check
20068 r bdver1_fp_transitions
22016 r c86_4g_m7_fpu02_transitions
22354 r c86_4g_m7_ieu_min_issue_delay
26208 r slm_min_issue_delay
27244 r bdver1_fp_min_issue_delay
28199 t internal_min_issue_delay(int, DFA_chip*)
28362 t internal_state_transition(int, DFA_chip*)
28518 r glm_check
28518 r glm_transitions
33690 r geode_min_issue_delay
45436 r znver4_fpu_min_issue_delay
46980 r bdver3_fp_min_issue_delay
49428 r glm_min_issue_delay
53730 r btver2_fp_min_issue_delay
53760 r znver1_fp_transitions
89414 r c86_4g_m7_ieu_transitions
93960 r bdver3_fp_transitions
181744 r znver4_fpu_transitions
Based on random sampling of SPEC2017 benchmarks 525.x264_r and
521.wrf_r, I verified that the new modeling introduces no
significant compilation overhead. Testing with a single job on a
c86-4g-m7 machine revealed no impact on x264 and a tiny increase
for wrf (~0.3%).
Zhongyao Chen [Wed, 20 May 2026 09:30:22 +0000 (17:30 +0800)]
RISC-V: Add RISC-V RVV main-loop overhead comparison in cost model
Add an RVV-specific loop-overhead comparison in the RISC-V cost model and
use it after inside-loop cost comparison.
The RISC-V implementation prefers RVV modes that eliminate the main
loop, and otherwise compares their main-loop head overhead.
Local testing shows no regressions. This is likely because few testcases
have equal inside-loop cost, especially before VLS lmul cost scaling support.
I also ran regression tests with temporary VLS lmul cost scaling support.
Only 3 regressions were found:
- dyn-lmul-conv-1.c & dyn-lmul-conv-2.c: The cost model now prefers smaller
LMULs due to VLS lmul scaling, which is reasonable; the expectations just
need to be updated.
- pr123414.c: This test relies on large LMULs to trigger a specific bug,
so this is reasonable too; it can be fixed by adding -fno-vect-cost-model.
The VLS LMUL cost scaling patch will be updated after this is pushed.
gcc/ChangeLog:
* config/riscv/riscv-vector-costs.cc
(estimated_loop_iters): New function.
(compare_loop_overhead): New function.
(costs::better_main_loop_than_p): Compare RVV loop overhead after
inside-loop cost.
Alex Coplan [Wed, 27 May 2026 20:26:44 +0000 (21:26 +0100)]
aarch64: Make more use of UINTVAL
I noticed while reviewing some other code that we have existing code of
the form (unsigned HOST_WIDE_INT) INTVAL (X). Such expressions are (by
definition of UINTVAL) equivalent to UINTVAL (X), and the latter is both
more succinct and (IMO) more readable, so this patch replaces those
instances in the aarch64 backend accordingly.
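The equivalence can be shown with stand-in macros. Everything prefixed sketch_ or SKETCH_ below is hypothetical and only models the real INTVAL and UINTVAL macros, with long long standing in for HOST_WIDE_INT and a one-field struct standing in for a CONST_INT rtx.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for HOST_WIDE_INT and a CONST_INT rtx.  */
typedef long long sketch_hwi;
struct sketch_const_int { sketch_hwi value; };

#define SKETCH_INTVAL(X)  ((X)->value)
#define SKETCH_UINTVAL(X) ((unsigned long long) SKETCH_INTVAL (X))

/* Old spelling: explicit cast at the use site.  */
unsigned long long
low_byte_old (const struct sketch_const_int *x)
{
  assert (x != NULL);
  return ((unsigned long long) SKETCH_INTVAL (x)) & 0xff;
}

/* New spelling: the cast is folded into the accessor.  */
unsigned long long
low_byte_new (const struct sketch_const_int *x)
{
  assert (x != NULL);
  return SKETCH_UINTVAL (x) & 0xff;
}
```

Both functions expand to the same expression, so the replacement is a readability change only.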
There are also many occurrences of this outside of aarch64, I see:
Georg-Johann Lay [Thu, 28 May 2026 09:44:21 +0000 (11:44 +0200)]
AVR: Support [[len=<words>]] notes in inline asm to specify its size.
This patch adds support for [[len=<words>]] in (the comments of) inline
asm constructs. It serves several purposes:
- Cases where the expanded asm is longer than determined from the number
of physical and logical line breaks. Such cases can lead to errors
when a jump that uses a too optimistic jump offset is crossing an asm.
- Better code generation for jumps that are crossing an asm. The default
length of an asm is (1 + NL) * 2 words, where NL denotes the sum of
physical and logical line breaks. However, almost all AVR instructions
occupy only one 16-bit word.
The feature is implemented in ADJUST_INSN_LENGTH. The length of
an asm is the sum over all [[len=<words>]] notes, except when an
unrecognized construct is found or an error occurred. In the latter
case, the default insn length is used. These <words> are supported:
<words> = [0-9]+
Specifies a non-negative decimal integer.
<words> = %[0-9]+
<words> = %[<name>] # Already resolved to %[0-9]+ by the middle-end.
Refers to the respective asm operand, which must be CONST_INT.
<words> = lds
<words> = sts
Specifies the length of a LDS or STS instruction, i.e.
1 word if AVR_TINY, and 2 words otherwise.
<words> = %~
<words> = %~call
<words> = %~jmp
Specifies the length of a %~call resp. %~jmp instruction, i.e.
2 words if AVR_HAVE_JMP_CALL, and 1 word otherwise.
In order to observe the assigned lengths, see -fdump-rtl-shorten or the
";; ADDR = ..." insn addresses in the asm output with -mlog=insn_addresses.
The benefits of using magic comments are:
- The feature is backwards compatible, and the target code can use
the same asm syntax since only asm comments have to be adjusted.
No #ifdef feature test macros are needed. The only case where the
feature is not fully backwards compatible is when asm templates
already contain invalid "[[len=" notes for some reason. In that
case, -mno-asm-len-notes restores the old behavior.
- Since the asm size is the sum over all notes, the final size can
be stitched together from multiple annotations / parts of an asm
template, and there is no need to support operations like plus.
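The summing rule can be sketched in C. This is not the avr.cc implementation: the function name is hypothetical, only the decimal <words> form is handled, and returning -1 (meaning "use the default insn length") for a template that contains no note at all is an assumption of this sketch. Overflow of the sum is not checked.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Sum all decimal [[len=<words>]] notes in TMPL.  Returns -1 when a
   note is not of the decimal form, is not closed by "]]", or when
   TMPL contains no note; the caller would then fall back to the
   default length.  */
long
sum_len_notes (const char *tmpl)
{
  static const char prefix[] = "[[len=";
  long total = 0;
  int seen = 0;

  assert (tmpl != NULL);
  for (const char *p = strstr (tmpl, prefix); p; p = strstr (p, prefix))
    {
      p += strlen (prefix);
      if (!isdigit ((unsigned char) *p))
	return -1;		/* Operand, lds/sts, %~ forms: not modelled.  */

      long words = 0;
      while (isdigit ((unsigned char) *p))
	words = words * 10 + (*p++ - '0');

      if (strncmp (p, "]]", 2) != 0)
	return -1;		/* Malformed note.  */
      p += 2;

      total += words;
      seen = 1;
    }
  return seen ? total : -1;
}
```

Since the result is a plain sum, a template stitched together from several parts, each carrying its own note, gets the combined length without any arithmetic syntax inside the notes.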
gcc/
* config/avr/avr.cc (avr_read_number, avr_length_of_asm)
(avr_maybe_length_of_asm): New static functions.
(avr_adjust_insn_length): Call avr_maybe_length_of_asm on
unrecognized insns.
* config/avr/avr.opt (-masm-len-notes, -Wasm-len-notes): New
options.
* doc/invoke.texi (AVR Options): Add -masm-len-notes,
-Wasm-len-notes.
* doc/extend.texi (Size of an asm): Add @subsubheading
"Specifying the size of an asm on AVR".
libgcc/config/avr/libf7/
* libf7.h: Add "[[len=...]]" notes to all non-empty inline asm's.
* libf7.c: Ditto.
Jakub Jelinek [Thu, 28 May 2026 08:28:12 +0000 (10:28 +0200)]
i386: Fix up *add<mode>_1<nf_name> [PR125469]
The following testcase ICEs, because combine matches
(set (reg:DI 108) (plus:DI (reg:DI 104 [ s ]) (subreg:DI (reg:TI 103 [ _2 ]) 8)))
Now, because ix86_validate_address_register has:
  /* Don't allow SUBREGs that span more than a word. It can
     lead to spill failures when the register is one word out
     of a two word structure. */
  if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
    return NULL_RTX;
this isn't recognized as *leadi, but is recognized as *adddi_1_nf pattern
instead. Now, later on the RA turns it into:
(set (reg:DI 2 cx [108]) (plus:DI (reg:DI 0 ax [orig:104 s ] [104]) (reg:DI 5 di [ _2+8 ])))
which would be valid *leadi, but given that INSN_CODE is already set to the
*adddi_1_nf and that also satisfies it, nothing re-recognizes it as *leadi.
But in that case without TARGET_APX_NDD the pattern has return "#";
That is a bug, because there is no splitter to split that
(set (reg:DI 2 cx [108]) (plus:DI (reg:DI 0 ax [orig:104 s ] [104]) (reg:DI 5 di [ _2+8 ])))
into itself so that it is re-recognized as *leadi, so it just ICEs.
I think having a splitter to split to the same thing would be just weird, so
this just outputs lea insn directly.
2026-05-28 Jakub Jelinek <jakub@redhat.com>
PR target/125469
* config/i386/i386.md (*add<mode>_1<nf_name>): Don't return "#" for
the lea non-TARGET_APX_NDD case, instead emit a lea directly.