git.ipfire.org Git - thirdparty/gcc.git/log

fortran: Teach get_real_kind_from_node for Power 128 fp modes [PR112993]

Previously effective target fortran_real_c_float128 never
passes on Power regardless of the default 128 long double
is ibmlongdouble or ieeelongdouble.  It's due to that TF
mode is always used for kind 16 real, which has precision
127, while the node float128_type_node for c_float128 has
128 type precision, get_real_kind_from_node can't find a
matching as it only checks gfc_real_kinds[i].mode_precision
and type precision.

With changing TFmode/IFmode/KFmode to have the same mode
precision 128, now fortran_real_c_float12 can pass with
ieeelongdouble enabled by default and test cases guarded
with it get tested accordingly.  But with ibmlongdouble
enabled by default, since TFmode has precision 128 which
is the same as type precision 128 of float128_type_node,
get_real_kind_from_node considers kind for TFmode matches
float128_type_node, but it's wrong as at this time point
TFmode is with ibm extended format.  So this patch is to
teach get_real_kind_from_node to check one more field which
can be differentiable from the underlying real format, it
can avoid the unexpected matching when there more than one
modes have the same precisoin.

PR target/112993

gcc/fortran/ChangeLog:

* trans-types.cc (get_real_kind_from_node): Consider the case where
more than one modes have the same precision.

rs6000: Make all 128 bit scalar FP modes have 128 bit precision [PR112993]

On rs6000, there are three 128 bit scalar floating point
modes TFmode, IFmode and KFmode.  With some historical
reasons, we defines them with different mode precisions,
that is KFmode 126, TFmode 127 and IFmode 128.  But in
fact all of them should have the same mode precision 128,
this special setting has caused some issues like some
unexpected failures mentioned in [1] and also made us have
to introduce some workarounds, such as: the workaround in
build_common_tree_nodes for KFmode 126, the workaround in
range_compatible_p for same mode but different precision
issue.

This patch is to make these three 128 bit scalar floating
point modes TFmode, IFmode and KFmode have 128 bit mode
precision, and keep the order same as previous in order
to make machine independent parts of the compiler not try
to widen IFmode to TFmode.  Besides, build_common_tree_nodes
adopts the newly added hook mode_for_floating_type so we
don't need to worry about unexpected mode for long double
type node.

In function convert_mode_scalar, with the proposed change,
it adopts sext_optab for converting ieee128 format mode to
ibm128 format mode while trunc_optab for converting ibm128
format mode to ieee128 format mode.  Thus this patch removes
useless extend and trunc optab supports, supplements new
define_expands expandkftf2 and trunctfkf2 to align with
convert_mode_scalar implementation.  It also unnames two
define_insn_and_split to avoid conflicts and make them more
clear.  Considering the current implementation that there is
no chance to have KF <-> IF conversion (since either of them
would be TF already), it adds two dummy define_expands to
assert this.

[1] https://inbox.sourceware.org/gcc-patches/
    718677e7-614d-7977-312d-05a75e1fd5b4@linux.ibm.com/

PR target/112993

gcc/ChangeLog:

* config/rs6000/rs6000-modes.def (IFmode, KFmode, TFmode): Define
with FLOAT_MODE instead of FRACTIONAL_FLOAT_MODE, don't use special
precisions any more.
(rs6000-modes.h): Remove include.
* config/rs6000/rs6000-modes.h: Remove.
* config/rs6000/rs6000.h (rs6000-modes.h): Remove include.
* config/rs6000/t-rs6000: Remove rs6000-modes.h include.
* config/rs6000/rs6000.cc (rs6000_option_override_internal): Replace
all uses of FLOAT_PRECISION_TFmode with 128.
(rs6000_c_mode_for_floating_type): Likewise.
* config/rs6000/rs6000.md (define_expand extendiftf2): Remove.
(define_expand extendifkf2): Remove.
(define_expand extendtfkf2): Remove.
(define_expand trunckftf2): Remove.
(define_expand trunctfif2): Remove.
(define_expand extendtfif2): Add new assertion.
(define_expand expandkftf2): New.
(define_expand trunciftf2): Add new assertion.
(define_expand trunctfkf2): New.
(define_expand truncifkf2): Change with gcc_unreachable.
(define_expand expandkfif2): New.
(define_insn_and_split extendkftf2): Rename to  ...
(define_insn_and_split *extendkftf2): ... this.
(define_insn_and_split trunctfkf2): Rename to ...
(define_insn_and_split *extendtfkf2): ... this.

expr: Allow same precision modes conversion between {ibm_extended, ieee_quad}_format

With some historical reasons, rs6000 defines KFmode, TFmode
and IFmode to have different mode precisions, but it causes
some issues and needs some workarounds such as PR112993.
So we are going to make all rs6000 128 bit scalar FP modes
have 128 bit precision.  Be prepared for that, this patch
is to make function convert_mode_scalar allow same precision
FP modes conversion if their underlying formats are
ibm_extended_format and ieee_quad_format respectively, just
like the existing special treatment on arm_bfloat_half_format
<-> ieee_half_format.  It also factors out all the relevant
checks into a lambda function.  Besides, similar to ieee fp16
-> bfloat conversion, it adopts trunc_optab rather than
sext_optab for ibm128 to ieee128 conversion.

PR target/112993

gcc/ChangeLog:

* expr.cc (convert_mode_scalar): Allow same precision conversion
between scalar floating point modes if whose underlying format is
ibm_extended_format or ieee_quad_format, and refactor assertion
with new lambda function acceptable_same_precision_modes.  Use
trunc_optab rather than sext_optab for ibm128 to ieee128 conversion.
* optabs-libfuncs.cc (gen_trunc_conv_libfunc): Use trunc_optab rather
than sext_optab for ibm128 to ieee128 conversion.

libbacktrace: update xcoff.c for base_address changes

* xcoff.c (struct xcoff_fileline_data): Change base_address field
to struct libbacktrace_base_address.
(xcoff_initialize_syminfo): Change base_address to struct
libbacktrace_base_address. Use libbacktrace_add_base.
(xcoff_initialize_fileline): Likewise.
(xcoff_lookup_pc): Use libbacktrace_add_base.
(xcoff_add): Change base_address to struct
libbacktrace_base_address.
(xcoff_armem_add, xcoff_add_shared_libs): Likewise.
(backtrace_initialize): Likewise.
* Makefile.am (xcoff.lo): Remove unused target.
(xcoff_32.lo, xcoff_64.lo): New targets.
* Makefile.in: Regenerate.

rs6000: Error on CPUs and ABIs that don't support the ROP protection insns [PR114759]

We currently silently ignore the -mrop-protect option for old CPUs we don't
support with the ROP hash insns, but we throw an error for unsupported ABIs.
This patch treats unsupported CPUs and ABIs similarly by throwing an error
both both. This matches clang behavior and allows us to simplify our tests
in the code that generates our prologue and epilogue code.

2024-06-26 Peter Bergner <bergner@linux.ibm.com>

gcc/
PR target/114759
* config/rs6000/rs6000.cc (rs6000_option_override_internal): Disallow
CPUs and ABIs that do no support the ROP protection insns.
* config/rs6000/rs6000-logue.cc (rs6000_stack_info): Remove now
unneeded tests.
(rs6000_emit_prologue): Likewise.
Remove unneeded gcc_assert.
(rs6000_emit_epilogue): Likewise.
* config/rs6000/rs6000.md: Likewise.

gcc/testsuite/
PR target/114759
* gcc.target/powerpc/pr114759-3.c: New test.

rs6000: ROP - Emit hashst and hashchk insns on Power8 and later [PR114759]

We currently only emit the ROP-protect hash* insns for Power10, where the
insns were added to the architecture.  We want to emit them for earlier
cpus (where they operate as NOPs), so that if those older binaries are
ever executed on a Power10, then they'll be protected from ROP attacks.
Binutils accepts hashst and hashchk back to Power8, so change GCC to emit
them for Power8 and later.  This matches clang's behavior.

2024-06-19  Peter Bergner  <bergner@linux.ibm.com>

gcc/
PR target/114759
* config/rs6000/rs6000-logue.cc (rs6000_stack_info): Use TARGET_POWER8.
(rs6000_emit_prologue): Likewise.
* config/rs6000/rs6000.md (hashchk): Likewise.
(hashst): Likewise.
Fix whitespace.

gcc/testsuite/
PR target/114759
* gcc.target/powerpc/pr114759-2.c: New test.
* lib/target-supports.exp (rop_ok): Use
check_effective_target_has_arch_pwr8.

c++/modules: Propagate BINDING_VECTOR_*_DUPS_P on realloc [PR99242]

When importing modules, when a binding vector for a name runs out of
slots it gets reallocated with a larger size, and existing bindings are
copied across. However, the flags to indicate whether deduping needs to
occur did not: this causes ICEs, as it allows a duplicate binding to be
added which then violates assumptions later on.

PR c++/99242

gcc/cp/ChangeLog:

* name-lookup.cc (append_imported_binding_slot): Propagate dups
flags.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr99242_a.H: New test.
* g++.dg/modules/pr99242_b.H: New test.
* g++.dg/modules/pr99242_c.H: New test.
* g++.dg/modules/pr99242_d.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

Daily bump.

range-ops should return the requested boolean type.

The pointer based relation operator's fold_range () routines should
return a boolean range with the requested type, not the default type.

PR tree-optimization/115951
* range-op-ptr.cc (operator_equal::fold_range): Return a boolean
range with the requested type.
(operator_not_equal::fold_range): Likewise.
(operator_lt::fold_range): Likewise.
(operator_le::fold_range): Likewise.
(operator_gt::fold_range): Likewise.
(operator_ge::fold_range): Likewise.

c++/contracts: ICE in C++ Contracts with '-fno-exceptions' [PR 110159]

We currently only initialise terminate_fn if exceptions are enabled.
However, contract handling requires terminate_fn when building the
contract because a contract failure may result in std::terminate call
regardless of whether the exceptions are enabled. Refactored
init_exception_processing to extract the initialisation of
terminate_fn. New function init_terminate_fn added that initialises
terminate_fn if it hasn't already been initialised. Call to terminate_fn
added in cxx_init_decl_processing if contracts are enabled.

PR c++/110159

gcc/cp/ChangeLog:

* cp-tree.h (init_terminate_fn): Declaration of a new function.
* decl.cc (cxx_init_decl_processing): If contracts are enabled,
call init_terminate_fn.
* except.cc (init_exception_processing): Function refactored to
call init_terminate_fn.
(init_terminate_fn): Added new function that initializes
terminate_fn if it hasn't already been initialised.

gcc/testsuite/ChangeLog:

* g++.dg/contracts/pr110159.C: New test.

Signed-off-by: Nina Ranns <dinka.ranns@gmail.com>

AVR: testsuite - Attribute ipa implies noinline and noclone.

gcc/testsuite/
* gcc.target/avr/isr-test.h: Attribute ipa implies noinline and noclone.
* gcc.target/avr/pr114981-powif.c: Same.
* gcc.target/avr/pr114981-powil.c: Same.
* gcc.target/avr/pr71676-1.c: Same.
* gcc.target/avr/pr71676-2.c: Same.
* gcc.target/avr/pr71676-3.c: Same.
* gcc.target/avr/pr71676.c: Same.
* gcc.target/avr/torture/add-extend.c: Same.
* gcc.target/avr/torture/fix-types.h: Same.
* gcc.target/avr/torture/fuse-add.c: Same.
* gcc.target/avr/torture/get-mem.c: Same.
* gcc.target/avr/torture/insv-anyshift-hi.c: Same.
* gcc.target/avr/torture/insv-anyshift-si.c: Same.
* gcc.target/avr/torture/isr-02-call.c: Same.
* gcc.target/avr/torture/isr-03-fixed.c: Same.
* gcc.target/avr/torture/pr109650-1.c: Same.
* gcc.target/avr/torture/pr109650-2.c: Same.
* gcc.target/avr/torture/pr109907-1.c: Same.
* gcc.target/avr/torture/pr109907-2.c: Same.
* gcc.target/avr/torture/pr114132-2.c: Same.
* gcc.target/avr/torture/pr39633.c: Same.
* gcc.target/avr/torture/pr51782-1.c: Same.
* gcc.target/avr/torture/pr61055.c: Same.
* gcc.target/avr/torture/pr61443.c: Same.
* gcc.target/avr/torture/pr64331.c: Same.
* gcc.target/avr/torture/pr77326.c: Same.
* gcc.target/avr/torture/pr83729.c: Same.
* gcc.target/avr/torture/pr83801.c: Same.
* gcc.target/avr/torture/pr87376.c: Same.
* gcc.target/avr/torture/pr88236-pr115726.c: Same.
* gcc.target/avr/torture/pr92606.c: Same.
* gcc.target/avr/torture/pr98762.c: Same.
* gcc.target/avr/torture/sat-hr-plus-minus.c: Same.
* gcc.target/avr/torture/sat-k-plus-minus.c: Same.
* gcc.target/avr/torture/sat-llk-plus-minus.c: Same.
* gcc.target/avr/torture/sat-r-plus-minus.c: Same.
* gcc.target/avr/torture/sat-uhr-plus-minus.c: Same.
* gcc.target/avr/torture/sat-uk-plus-minus.c: Same.
* gcc.target/avr/torture/sat-ullk-plus-minus.c: Same.
* gcc.target/avr/torture/sat-ur-plus-minus.c: Same.
* gcc.target/avr/torture/set-mem.c: Same.
* gcc.target/avr/torture/sub-extend.c: Same.
* gcc.target/avr/torture/tiny-progmem.c: Same.

c++, coroutines, contracts: Handle coroutine and void functions [PR110871,PR110872,PR115434].

The current implementation of contracts emits the checks into function
bodies in three places; for pre-conditions at the start of the body,
for asserts in-line in the function body and for post-conditions as an
addition to return statements.

In general (at least with existing "2a" contract semantics) the in-line
contract asserts behave as expected.

However, the mechanism is not applicable to:

* Handling pre conditions in coroutines since, for those, the standard
  specifies a wrapping of the original function body by functionality
  implementing initial and final suspends (along with some housekeeping
  to route exceptions).  Thus for such transformed function bodies, the
  preconditions then get actioned after the initial suspend, which does
  not behave as intended.

  * Handling post conditions in functions that do not have return
    statements (which applies to coroutines and void functions).

In the following, we identify a potentially transformed function body
(in the case of coroutines, this is usually called the "ramp()" function).

The patch here re-implements the code insertion in one of the two
following ways (code for exposition only):

  * For functions with no post-conditions we wrap the potentially
    transformed function as follows:

  {
     handle_pre_condition_checking ();
     potentially_transformed_function_body ();
  }

  This implements the intent that the preconditions are processed after
  the function parameters are initialised but before any other actions.

  * For functions with post-conditions:

  if (preconditions_exist)
    handle_pre_condition_checking ();
  try
   {
     potentially_transformed_function_body ();
   }
  finally
   {
     handle_post_condition_checking ();
   }
  else [only if the function is not marked noexcept(true) ]
   {
     ;
   }

In this, post-conditions [that might apply to the return value etc.]
are evaluated on every non-exceptional edge out of the function.

At present, the model here is that exceptions thrown by the function
propagate upwards as if there were no contracts present.  If the desired
semantic becomes that an exception is counted as equivalent to a contract
violation - then we can add a second handler in place of the empty
statement.

This patch specifically does not address changes to code-gen and constexpr
handling that are contained in P2900.

PR c++/115434
PR c++/110871
PR c++/110872

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_constant_expression): Handle EH_ELSE_EXPR.
* contracts.cc (finish_contract_attribute): Remove excess line.
(build_contract_condition_function): Post condition handlers are
void now.
(emit_postconditions_cleanup): Remove.
(emit_postconditions): New.
(add_pre_condition_fn_call): New.
(add_post_condition_fn_call): New.
(apply_preconditions): New.
(apply_postconditions): New.
(maybe_apply_function_contracts): New.
(apply_postcondition_to_return): Remove.
* contracts.h (apply_postcondition_to_return): Remove.
(maybe_apply_function_contracts): Add.
* coroutines.cc (coro_build_actor_or_destroy_function): Do not
copy contracts to coroutine helpers.
* decl.cc (finish_function): Handle wrapping a possibly
transformed function body in contract checks.
* typeck.cc (check_return_expr): Remove handling of post
conditions on return expressions.

gcc/ChangeLog:

* gimplify.cc (struct gimplify_ctx): Add a flag to show we are
expending a handler.
(gimplify_expr): When we are expanding a handler, and the body
transforms might have re-written DECL_RESULT into a gimple var,
ensure that hander references to DECL_RESULT are also re-written
to refer to the gimple var.  When we are processing an EH_ELSE
expression, then add it if either of the cleanup slots is in
use.

gcc/testsuite/ChangeLog:

* g++.dg/contracts/pr115434.C: New test.
* g++.dg/coroutines/pr110871.C: New test.
* g++.dg/coroutines/pr110872.C: New test.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

AVR: testsuite - Add noipa function attribute to noclone functions.

Many functions under test have the noinline and noclone function
attributes attached so that no (constant) values are propagated
into the functions, so that we actually are testing what's supposed
to be tested. In order to enforce that, noipa may also be required
when inter-procedural analysis / optimizations are on.

gcc/testsuite/
* gcc.target/avr/isr-test.h: Add noipa function attribute
to noclone functions.
* gcc.target/avr/pr114981-powif.c: Same.
* gcc.target/avr/pr114981-powil.c: Same.
* gcc.target/avr/pr71676-1.c: Same.
* gcc.target/avr/pr71676-2.c: Same.
* gcc.target/avr/pr71676-3.c: Same.
* gcc.target/avr/pr71676.c: Same.
* gcc.target/avr/torture/fix-types.h: Same.
* gcc.target/avr/torture/fuse-add.c: Same.
* gcc.target/avr/torture/get-mem.c: Same.
* gcc.target/avr/torture/insv-anyshift-hi.c: Same.
* gcc.target/avr/torture/insv-anyshift-si.c: Same.
* gcc.target/avr/torture/isr-02-call.c: Same.
* gcc.target/avr/torture/isr-03-fixed.c: Same.
* gcc.target/avr/torture/pr109650-1.c: Same.
* gcc.target/avr/torture/pr109650-2.c: Same.
* gcc.target/avr/torture/pr109907-1.c: Same.
* gcc.target/avr/torture/pr109907-2.c: Same.
* gcc.target/avr/torture/pr114132-2.c: Same.
* gcc.target/avr/torture/pr39633.c: Same.
* gcc.target/avr/torture/pr51782-1.c: Same.
* gcc.target/avr/torture/pr61055.c: Same.
* gcc.target/avr/torture/pr61443.c: Same.
* gcc.target/avr/torture/pr64331.c: Same.
* gcc.target/avr/torture/pr77326.c: Same.
* gcc.target/avr/torture/pr83729.c: Same.
* gcc.target/avr/torture/pr83801.c: Same.
* gcc.target/avr/torture/pr87376.c: Same.
* gcc.target/avr/torture/pr88236-pr115726.c: Same.
* gcc.target/avr/torture/pr92606.c: Same.
* gcc.target/avr/torture/pr98762.c: Same.
* gcc.target/avr/torture/sat-hr-plus-minus.c: Same.
* gcc.target/avr/torture/sat-k-plus-minus.c: Same.
* gcc.target/avr/torture/sat-llk-plus-minus.c: Same.
* gcc.target/avr/torture/sat-r-plus-minus.c: Same.
* gcc.target/avr/torture/sat-uhr-plus-minus.c: Same.
* gcc.target/avr/torture/sat-uk-plus-minus.c: Same.
* gcc.target/avr/torture/sat-ullk-plus-minus.c: Same.
* gcc.target/avr/torture/sat-ur-plus-minus.c: Same.
* gcc.target/avr/torture/set-mem.c: Same.
* gcc.target/avr/torture/tiny-progmem.c: Same.

Fortran: Simplify len_trim with array ref and fix mapping bug[PR84868].

2024-07-16 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/84868
* simplify.cc (gfc_simplify_len_trim): If the argument is an
element of a parameter array, simplify all the elements and
build a new parameter array to hold the result, after checking
that it doesn't already exist.
* trans-expr.cc (gfc_get_interface_mapping_array) if a string
length is available, use it for the typespec.
(gfc_add_interface_mapping): Supply the se string length.

gcc/testsuite/
PR fortran/84868
* gfortran.dg/pr84868.f90: New test.

rtl-ssa: Fix removal of order_nodes [PR115929]

order_nodes are used to implement ordered comparisons between
two insns with the same program point number. remove_insn would
remove an order_node from its splay tree, but didn't remove it
from the insn. This caused confusion if the insn was later
reinserted somewhere else that also needed an order_node.

gcc/
PR rtl-optimization/115929
* rtl-ssa/insns.cc (function_info::remove_insn): Remove an
order_node from the instruction as well as from the splay tree.

gcc/testsuite/
PR rtl-optimization/115929
* gcc.dg/torture/pr115929-1.c: New test.

recog: restrict paradoxical mode punning in insn_propagation [PR115901]

In g:44fc801e97a8dc626a4806ff4124439003420b20 I'd extended
insn_propagation to handle simple cases of hard-reg mode punning.
One of the checks was that the new use mode occupied the same
number of registers as the original definition mode. However,
as PR115901 shows, we need to avoid increasing the size of any
registers in the punned "to" expression as well.

Specifically, the test includes a DImode move from GPR x0 to
a vector register, followed by a V2DI use of the vector register.
The simplification would then create a V2DI spanning x0 and x1,
manufacturing a new, unwanted use of x1.

Checking for that kind of thing directly seems too cumbersome,
and is not related to the original motivation (which was to improve
handling of shared vector zeros on aarch64). This patch therefore
restricts the paradoxical case to constants.

gcc/
PR rtl-optimization/115901
* recog.cc (insn_propagation::apply_to_rvalue_1): Restrict
paradoxical mode punning to cases where "to" is constant.

gcc/testsuite/
PR rtl-optimization/115901
* gcc.dg/torture/pr115901.c: New test.

rtl-ssa: Enforce earlyclobbers on hard-coded clobbers [PR115891]

The asm in the testcase has a memory operand and also clobbers ax.
The clobber means that ax cannot be used to hold inputs, which
extends to the address of the memory.

I think I had an implicit assumption that constrain_operands
would enforce this, but in hindsight, that clearly wasn't going
to be true. constrain_operands only looks at constraints, and
these clobbers are by definition outside the constraint system.
(And that's why they have to be handled conservatively, since there's
no way to distinguish the earlyclobber and non-earlyclobber cases.)

The semantics of hard-coded clobbers are generic enough that I think
they should be handled directly by rtl-ssa, rather than by consumers.
And in the context of rtl-ssa, the easiest way to check for a clash is
to walk the list of input registers, which we already have to hand.
It therefore seemed better not to push this down to a more generic
rtl helper.

The patch detects hard-coded clobbers in the same way as regrename:
by temporarily stubbing out the operands with pc_rtx.

gcc/
PR rtl-optimization/115891
* rtl-ssa/changes.cc (find_clobbered_access): New function.
(recog_level2): Use it to check for overlap between input
registers and hard-coded clobbers. Conditionally reset
recog_data.insn after changing the insn code.

gcc/testsuite/
PR rtl-optimization/115891
* gcc.target/i386/pr115891.c: New test.

AVR: Overhaul add and sub insns that extend one operand.

These are insns of the forms

  (set (regA:M)
       (plus:M (extend:M (regB:L))
               (regA:M)))
and

  (set (regA:M)
       (minus:M (regA:M)
                (extend:M (regB:L))))

where "extend" may be a sign-extend or zero-extend,
and the integer modes are  SImode >= M > L >= QImode.

The existing patterns are now represented in terms of insns
with mode iterators and a code iterator over any_extend,
and these new insn support all valid combinations of M and L
(which previously was not the case).

gcc/
* config/avr/avr.cc (avr_out_minus): Assimilate into...
(avr_out_plus_ext): ...this new function.
(avr_adjust_insn_length) [ADJUST_LEN_PLUS_EXT]: Handle case.
(avr_rtx_costs_1) [PLUS, MINUS]: Adjust RTX costs.
* config/avr/avr.md (adjust_len) <plus_ext>: Add new attribute value.
(*addpsi3_zero_extend.hi_split): Assimilate...
(*addpsi3_zero_extend.qi_split): Assimilate...
(*addsi3_zero_extend_split): Assimilate...
(*addsi3_zero_extend.hi_split): Assimilate...
(*addpsi3_sign_extend.hi_split): Assimilate...
(*addhi3.sign_extend1_split): Assimilate...
(*add<PSISI:mode>3.<code>.<QIPSI:mode>_split): ...into this
new insn-and-split.
(*addpsi3_zero_extend.hi): Assimilate...
(*addpsi3_zero_extend.qi): Assimilate...
(*addsi3_zero_extend): Assimilate...
(*addsi3_zero_extend.hi): Assimilate...
(*addpsi3_sign_extend.hi): Assimilate...
(*addhi3.sign_extend1): Assimilate...
(*add<PSISI:mode>3.<code>.<QIPSI:mode>): ...into this new insn.
(*subpsi3_sign_extend.hi_split): Assimilate...
(*subhi3.sign_extend2_split): Assimilate...
(*sub<HISI:mode>3.zero_extend.<QIPSI:mode>_split): Assimilate...
(*sub<HISI:mode>3.<code><QIPSI:mode>_split): ...into this new
insn-and-split.
(*subpsi3_sign_extend.hi): Assimilate...
(*subhi3.sign_extend2): Assimilate...
(*sub<HISI:mode>3.zero_extend.<QIPSI:mode>): Assimilate...
(*sub<HISI:mode>3.<code>.<QIPSI:mode>): ...into this new insn.
(*sub<HISI:mode>3.zero_extend.<QIPSI:mode>): Use avr_out_plus_ext
for asm out.
* config/avr/avr-protos.h (avr_out_minus): Remove.
(avr_out_plus_ext): New proto.
gcc/testsuite/
* gcc.target/avr/torture/add-extend.c: New test.
* gcc.target/avr/torture/sub-extend.c: New test.

PR modula2/115957 ICE on procedure local const declaration

An ICE would occur if a constant was declared using a variable term.
This fix catches variable terms in constant expressions and generates
an unrecoverable error.

gcc/m2/ChangeLog:

PR modula2/115957
* gm2-compiler/M2StackAddress.mod (PopAddress): Detect tail=NIL
and generate an internal error.
* gm2-compiler/PCBuild.bnf (InConstParameter): New variable.
(InConstBlock): New variable.
(ErrorString): Rewrite using MetaErrorStringT0.
(ErrorArrayAt): Rewrite using MetaErrorStringT0.
(WarnMissingToken): Use MetaErrorStringT0.
(CompilationUnit): Set seenError FALSE.
(init): Initialize InConstParameter and InConstBlock.
(ConstantDeclaration): Set InConstBlock.
(ConstSetOrQualidentOrFunction): Call CheckNotVar if not
InConstParameter and InConstBlock.
(ConstActualParameters): Set InConstParameter TRUE and restore
value at the end.
* gm2-compiler/PCSymBuild.def (CheckNotVar): New procedure.
Remove all unnecessary export qualified list.
* gm2-compiler/PCSymBuild.mod (CheckNotVar): New procedure.

gcc/testsuite/ChangeLog:

PR modula2/115957
* gm2/errors/fail/badconst.mod: New test.
* gm2/pim/fail/tinyadr.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

Lower zeroing array assignment to memset for allocatable arrays.

gcc/fortran/ChangeLog:
* trans-expr.cc (gfc_trans_zero_assign): Handle allocatable arrays.

gcc/testsuite/ChangeLog:
* gfortran.dg/array_memset_3.f90: New test.

Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>

tree-optimization/115841 - reduction epilogue placement issue

When emitting the compensation to the vectorized main loop for
a vector reduction value to be re-used in the vectorized epilogue
we fail to place it in the correct block when the main loop is
known to be entered (no loop_vinfo->main_loop_edge) but the
epilogue is not (a loop_vinfo->skip_this_loop_edge). The code
currently disregards this situation.

With the recent znver4 cost fix I couldn't trigger this situation
with the testcase but I adjusted it so it could eventually trigger
on other targets.

PR tree-optimization/115841
* tree-vect-loop.cc (vect_transform_cycle_phi): Correctly
place the partial vector reduction for the accumulator
re-use when the main loop cannot be skipped but the
epilogue can.

* gcc.dg/vect/pr115841.c: New testcase.

AVR: Allow more combinations of XOR / IOR with byte-shifts.

This patch takes some existing patterns that have QImode as one
input and uses a mode iterator to allow for more modes to match.
These insns are split after reload into *xorqi3 resp. *iorqi3 insn(s).

gcc/
* config/avr/avr-protos.h (avr_emit_xior_with_shift): New proto.
* config/avr/avr.cc (avr_emit_xior_with_shift): New function.
* config/avr/avr.md (any_lshift): New code iterator.
(*<xior:code><mode>.<any_lshift:code>): New insn-and-split.
(<code><HISI:mode><QIPSI:mode>.0): Replaces...
(*<code_stdname><mode>qi.byte0): ...this one.
(*<xior:code><HISI:mode><QIPSI:mode>.<any_lshift:code>): Replaces...
(*<code_stdname><mode>qi.byte1-3): ...this one.

libiberty/buildargv: handle input consisting of only white space

GDB makes use of the libiberty function buildargv for splitting the
inferior (program being debugged) argument string in the case where
the inferior is not being started under a shell.

I have recently been working to improve this area of GDB, and noticed
some unexpected behaviour to the libiberty function buildargv, when
the input is a string consisting only of white space.

What I observe is that if the input to buildargv is a string
containing only white space, then buildargv will return an argv list
containing a single empty argument, e.g.:

  char **argv = buildargv (" ");
  assert (*argv[0] == '\0');
  assert (argv[1] == NULL);

We get the same output from buildargv if the input is a single space,
or multiple spaces.  Other white space characters give the same
results.

This doesn't seem right to me, and in fact, there appears to be a work
around for this issue in expandargv where we have this code:

  /* If the file is empty or contains only whitespace, buildargv would
     return a single empty argument.  In this context we want no arguments,
     instead.  */
  if (only_whitespace (buffer))
    {
      file_argv = (char **) xmalloc (sizeof (char *));
      file_argv[0] = NULL;
    }
  else
    /* Parse the string.  */
    file_argv = buildargv (buffer);

I think that the correct behaviour in this situation is to return an
empty argv array, e.g.:

  char **argv = buildargv (" ");
  assert (argv[0] == NULL);

And it turns out that this is a trivial change to buildargv.  The diff
does look big, but this is because I've re-indented a block.  Check
with 'git diff -b' to see the minimal changes.  I've also removed the
work around from expandargv.

When testing this sort of thing I normally write the tests first, and
then fix the code.  In this case test-expandargv.c has sort-of been
used as a mechanism for testing the buildargv function (expandargv
does call buildargv most of the time), however, for this particular
issue the work around in expandargv (mentioned above) masked the
buildargv bug.

I did consider adding a new test-buildargv.c file, however, this would
have basically been a copy & paste of test-expandargv.c (with some
minor changes to call buildargv).  This would be fine now, but feels
like we would eventually end up with one file not being updated as
much as the other, and so test coverage would suffer.

Instead, I have added some explicit buildargv testing to the
test-expandargv.c file, this reuses the test input that is already
defined for expandargv.

Of course, once I removed the work around from expandargv then we now
do always call buildargv from expandargv, and so the bug I'm fixing
would impact both expandargv and buildargv, so maybe the new testing
is redundant?  I tend to think more testing is always better, so I've
left it in for now.

2024-07-16  Andrew Burgess  <aburgess@redhat.com>

libiberty/

* argv.c (buildargv): Treat input of only whitespace as an empty
argument list.
(expandargv): Remove work around for intput that is only
whitespace.
* testsuite/test-expandargv.c: Add new tests 10, 11, and 12.
Extend testing to call buildargv in more cases.

libiberty/buildargv: POSIX behaviour for backslash handling

GDB makes use of the libiberty function buildargv for splitting the
inferior (program being debugged) argument string in the case where
the inferior is not being started under a shell.

I have recently been working to improve this area of GDB, and have
tracked done some of the unexpected behaviour to the libiberty
function buildargv, and how it handles backslash escapes.

For reference, I've been mostly reading:

  https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html

The issues that I would like to fix are:

  1. Backslashes within single quotes should not be treated as an
  escape, thus: '\a' should split to \a, retaining the backslash.

  2. Backslashes within double quotes should only act as an escape if
  they are immediately before one of the characters $ (dollar),
  ` (backtick), " (double quote), ` (backslash), or \n (newline).  In
  all other cases a backslash should not be treated as an escape
  character.  Thus: "\a" should split to \a, but "\$" should split to
  $.

  3. A backslash-newline sequence should be treated as a line
  continuation, both the backslash and the newline should be removed.

I've updated libiberty and also added some tests.  All the existing
libiberty tests continue to pass, but I'm not sure if there is more
testing that should be done, buildargv is used within lto-wraper.cc,
so maybe there's some testing folk can suggest that I run?

2024-07-16  Andrew Burgess  <aburgess@redhat.com>

libiberty/

* argv.c (buildargv): Backslashes within single quotes are
literal, backslashes only escape POSIX defined special characters
within double quotes, and backslashed newlines should act as line
continuations.
* testsuite/test-expandargv.c: Add new tests 7, 8, and 9.

i386, testsuite: Fix non-Unicode character

gcc/testsuite/ChangeLog:

* gcc.target/i386/indirect-thunk-extern-1.c: Replace character with
invalid encoding with `?`.

s390: Fix unresolved iterators bhfgq and xdee

Code attribute bhfgq is missing a mapping for TF. This results in
unresolved iterators in assembler templates for *bswaptf.

With the TF mapping added the base mnemonics vlbr and vstbr are not
"used" anymore but only the extended mnemonics (vlbr<bhfgq> was
interpreted as vlbr; likewise for vstbr). Therefore, remove the base
mnemonics from the scheduling description, otherwise, genattrtab would
error about unknown mnemonics.

Similarly, we end up with unresolved iterators in assembler templates
for mulfprx23 since code attribute xdee is missing a mapping for FPRX2.

gcc/ChangeLog:

* config/s390/3931.md (vlbr, vstbr): Remove.
* config/s390/s390.md (xdee): Add FPRX2 mapping.
* config/s390/vector.md (bhfgq): Add TF mapping.

Fixup unaligned load/store cost for znver5

Currently unaligned YMM and ZMM load and store costs are cheaper than
aligned which causes the vectorizer to purposely mis-align accesses
by adding an alignment prologue. It looks like the unaligned costs
were simply copied from the bogus znver4 costs. The following makes
the unaligned costs equal to the aligned costs like in the fixed znver4
version.

* config/i386/x86-tune-costs.h (znver5_cost): Update unaligned
load and store cost from the aligned costs.

s390: Drop vcond{,u} expanders

Optabs vcond{,u} will be removed for GCC 15. Since regtest shows no
fallout, dropping the expanders, now.

gcc/ChangeLog:

PR target/114189
* config/s390/vector.md (V_HW2): Remove.
(vcond<V_HW:mode><V_HW2:mode>): Remove.
(vcondu<V_HW:mode><V_HW2:mode>): Remove.

s390: Enable vcond_mask for 128-bit ops

In preparation of dropping vcond{,u,eq} optabs
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654690.html
enable 128-bit operands for vcond_mask---including integer as well as
floating point.

This fixes partially PR115519 w.r.t. autovec-long-double-signaling-*.c
tests.

gcc/ChangeLog:

* config/s390/vector.md: Enable vcond_mask for 128-bit ops.

s390: Emulate vec_cmp{eq,gt,gtu} for 128-bit integers

Mode iterator V_HW enables V1TI for target VXE which means
vec_cmpv1tiv1ti becomes available which leads to an ICE since there is
no corresponding insn.

Fixed by emulating comparisons and enabling mode V1TI unconditionally
for V_HW. For the sake of symmetry, I also added TI mode to V_HW since
TF mode is already included. As a consequence the consumers of V_HW
vec_{splat,slb,sld,sldw,sldb,srdb,srab,srb,test_mask_int,test_mask}
also become available for 128-bit integers.

This fixes gcc.c-torture/execute/pr105613.c and gcc.dg/pr106063.c.

gcc/ChangeLog:

* config/s390/vector.md (V_HW): Enable V1TI unconditionally and
add TI.
(vec_cmpu<VIT_HW:mode><VIT_HW:mode>): Add 128-bit integer
variants.
(*vec_cmpeq<mode><mode>_nocc_emu): Emulate operation.
(*vec_cmpgt<mode><mode>_nocc_emu): Emulate operation.
(*vec_cmpgtu<mode><mode>_nocc_emu): Emulate operation.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/vec-cmp-emu-1.c: New test.
* gcc.target/s390/vector/vec-cmp-emu-2.c: New test.
* gcc.target/s390/vector/vec-cmp-emu-3.c: New test.

tree-optimization/115843 - fix wrong-code with fully-masked loop and peeling

When AVX512 uses a fully masked loop and peeling we fail to create the
correct initial loop mask when the mask is composed of multiple
components in some cases. The following fixes this by properly applying
the bias for the component to the shift amount.

PR tree-optimization/115843
* tree-vect-loop-manip.cc
(vect_set_loop_condition_partial_vectors_avx512): Properly
bias the shift of the initial mask for alignment peeling.

* gcc.dg/vect/pr115843.c: New testcase.

Fixup unaligned load/store cost for znver4

Currently unaligned YMM and ZMM load and store costs are cheaper than
aligned which causes the vectorizer to purposely mis-align accesses
by adding an alignment prologue.  It looks like the unaligned costs
were simply left untouched from znver3 where they equate the aligned
costs when tweaking aligned costs for znver4.  The following makes
the unaligned costs equal to the aligned costs.

This avoids the miscompile seen in PR115843 but it's of course not
a real fix for the issue uncovered there.  But it makes it qualify
as a regression fix.

PR tree-optimization/115843
* config/i386/x86-tune-costs.h (znver4_cost): Update unaligned
load and store cost from the aligned costs.

PR tree-optimization/114661: Generalize MULT_EXPR recognition in match.pd.

This patch resolves PR tree-optimization/114661, by generalizing the set
of expressions that we canonicalize to multiplication.  This extends the
optimization(s) contributed (by me) back in July 2021.
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575999.html

The existing transformation folds (X*C1)^(X<<C2) into X*C3 when
allowed.  A subtlety is that for non-wrapping integer types, we
actually fold this into (int)((unsigned)X*C3) so that we don't
introduce an undefined overflow that wasn't in the original.
Unfortunately, this transformation confuses itself, as the type-cast
multiplication isn't recognized when further combining bit operations.
Fixed here by allowing optional useless type conversions in transforms
to turn (int)((unsigned)X*C1)^(X<<C2) into (int)((unsigned)X*C3) so
that match.pd and EVRP can continue to construct multiplications.

For the example given in the PR:

unsigned mul(unsigned char c) {
    if (c > 3) __builtin_unreachable();
    return c << 18 | c << 15 |
           c << 12 | c << 9 |
           c << 6 | c << 3 | c;
}

GCC on x86_64 with -O2 previously generated:

mul: movzbl  %dil, %edi
        leal    (%rdi,%rdi,8), %edx
        leal    0(,%rdx,8), %eax
        movl    %edx, %ecx
        sall    $15, %edx
        orl     %edi, %eax
        sall    $9, %ecx
        orl     %ecx, %eax
        orl     %edx, %eax
        ret

with this patch we now generate:

mul: movzbl  %dil, %eax
        imull   $299593, %eax, %eax
        ret

2024-07-16  Roger Sayle  <roger@nextmovesoftware.com>
    Richard Biener  <rguenther@suse.de>

gcc/ChangeLog
PR tree-optimization/114661
* match.pd ((X*C1)|(X*C2) to X*(C1+C2)): Allow optional useless
type conversions around multiplications, such as those inserted
by this transformation.

gcc/testsuite/ChangeLog
PR tree-optimization/114661
* gcc.dg/pr114661.c: New test case.

i386: extend trunc{128}2{16,32,64}'s scope.

Based on actual usage, trunc{128}2{16,32,64} use some instructions from
sse/sse3, so extend their scope to extend the scope of optimization.

gcc/ChangeLog:

PR target/107432
* config/i386/sse.md
(PMOV_SRC_MODE_3_AVX2): Add TARGET_AVX2 for V4DI and V8SI.
(PMOV_SRC_MODE_4): Add TARGET_AVX2 for V4DI.
(trunc<mode><pmov_dst_3_lower>2): Change constraint from TARGET_AVX2 to
TARGET_SSSE3.
(trunc<mode><pmov_dst_4_lower>2): Ditto.
(truncv2div2si2): Change constraint from TARGET_AVX2 to TARGET_SSE.

gcc/testsuite/ChangeLog:

PR target/107432
* gcc.target/i386/pr107432-10.c: New test.

libbacktrace: support FDPIC

Based on patch by Max Filippov.

* internal.h: If FDPIC, #include <link.h> and/or <sys/link.h>.
(libbacktrace_using_fdpic): Define.
(struct libbacktrace_base_address): Define.
(libbacktrace_add_base): Define.
(backtrace_dwarf_add): Change base_address to struct
libbacktrace_base_address.
* dwarf.c (struct dwarf_data): Change base_address to struct
libbacktrace_base_address.
(add_ranges, find_address_ranges, build_ddress_map): Likewise.
(build_dwarf_data, build_dwarf_add): Likewise.
(add_low_high_range): Change base_address to struct
libbacktrace_base_address.  Use libbacktrace_add_base.
(add_ranges_from_ranges, add_ranges_from_rnglists): Likewise.
(add_line): Use libbacktrace_add_base.
* elf.c (elf_initialize_syminfo): Change base_address to struct
libbacktrace_base_address.  Use libbacktrace_add_base.
(elf_add): Change base_address to struct
libbacktrace_base_address.
(phdr_callback): Likewise.  Initialize base_address.m.
(backtrace_initialize): If using FDPIC, don't call elf_add with
main executable; always use dl_iterate_phdr.
* macho.c (macho_add_symtab): Change base_address to struct
libbacktrace_base_address.  Use libbacktrace_add_base.
(macho_syminfo): Change base_address to struct
libbacktrace_base_address.
(macho_add_fat, macho_add_dsym, macho_add): Likewise.
(backtrace_initialize): Likewise.  Initialize base_address.m.
* pecoff.c (coff_initialize_syminfo): Change base_address to
struct libbacktrace_base_address.  Use libbacktrace_add_base.
(coff_add): Change base_address to struct
libbacktrace_base_address.  Initialize base_address.m.

Daily bump.

Fix liveness computation for shift/rotate counts in ext-dce

So as I've noted before I believe the control flow in ext-dce.cc is horribly
messy.  While investigating a fix for 115877 I came across another problem
related to control flow handling.

Specifically, if we have an binary op which implies the 2nd operand is fully
live, then we'd actually fail to mark that operand as live.

We essentially broke out of the loop which was supposed to be safe.  But Y was
a REG and if Y is a REG or CONST_INT we skip sub-rtxs and thus failed to
process that operand (the shift count) at all.

Rather than muck around with control flow, we can just set all the bits as live
in DST_MASK and let normal processing continue.  With all the bits live IN
DST_MASK all the bits implied by the mode of the argument will also be live.

No testcase.

Bootstrapped and regression tested on x86.  Pushing to the trunk.

gcc/
* ext-dce.cc (ext_dce_process_uses): Simplify control flow and fix
liveness computation for shift/rotate counts.

Fix sign/carry bit handling in ext-dce.

My change to fix a ubsan issue broke handling propagation of the carry/sign bit
down through a right shift.  Thanks to Andreas for the analysis and proposed
fix and Sergei for the testcase.

        PR rtl-optimization/115876
        PR rtl-optimization/115916
gcc/
* ext-dce.cc (carry_backpropagate): Make return type unsigned as well.
Cast to signed for right shift to preserve sign bit.

gcc/testsuite/

* g++.dg/torture/pr115916.C: New test.

Co-author: Andreas Schwab <schwab@linux-m68k.org>
Co-author: Sergei Trofimovich <slyfox at gentoo dot org>

c++: alias template with dependent attributes [PR115897]

Here we're prematurely stripping the dependent alias template-id A<T> to
its defining-type-id T when used as a template argument, which in turn
causes us to essentially ignore A's vector_size attribute in the outer
template-id.

This has always been a problem for class template-ids it seems, and after
r14-2170 variable template-ids are affected as well.

This patch marks alias templates that have a dependent attribute as
complex (as with e.g. constrained alias templates) so that we don't look
through them prematurely.

PR c++/115897

gcc/cp/ChangeLog:

* pt.cc (complex_alias_template_p): Return true for an alias
template with attributes.
(get_underlying_template): Don't look through an alias template
with attributes.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/alias-decl-77.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>

Revert "RISC-V: Attribute parser: Use alloca() instead of new + std::unique_ptr"

This reverts commit 5040c273484d7123a40a99cdeb434cecbd17a2e9.

RISC-V: Allow adding enabled extension via target arch attributes

The set of enabled extensions can be extended via target arch function
attributes by listing each extension with a '+' prefix and a comma as
list separator.  E.g.:
  __attribute__((target("arch=+zba,+zbb"))) void foo();

The programmer intends to ensure that one or more extensions
are enabled when building the code.  This is independent of the arch
string that is passed at build time via the -march= option.

Therefore, it is reasonable to allow enabling extensions via target arch
attributes, which have already been enabled via the -march= string.

The subset list code already supports such duplication for implied
extensions.  This patch adds an interface so the subset list
parser can be switched into a mode where duplication is allowed.

This commit fixes the following regressed test cases:
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-39.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-42.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-43.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-44.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-45.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-46.c

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_subset_list::add):
Allow adding enabled extension if m_allow_adding_dup is set.
* config/riscv/riscv-subset.h: Add m_allow_adding_dup and setter.
* config/riscv/riscv-target-attr.cc (riscv_target_attr_parser::parse_arch):
Allow adding enabled extensions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr115554.c: Change expected fail to expected pass.
* gcc.target/riscv/target-attr-16.c: New test.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

RISC-V: Rewrite target attribute handling

The target-arch attribute handling in RISC-V is only a few months old,
but already saw a rewrite (9941f0295a14), which addressed an important
issue.  This rewrite introduced a hash table in the backend, which is
used to keep track of target-arch attributes of all functions.
The index of this hash table is the pointer to the function declaration
object (fndecl).  However, objects like these don't have the lifetime
that is assumed here, which resulted in observing two fndecl objects
with the same address for different objects (triggering the assertion
in riscv_func_target_put() -- see also PR115562).

This patch removes the hash table approach in favor of storing target
specific options using the DECL_FUNCTION_SPECIFIC_TARGET() macro, which
is also used by other backends and is specifically designed for this
purpose (https://gcc.gnu.org/onlinedocs/gccint/Function-Properties.html).

To have an accessible field in the target options, we need to
adjust riscv.opt and introduce the field riscv_arch_string
(for the already existing option '-march=').

Using this macro allows to remove much code from riscv-common.cc, which
controls access to the objects 'func_target_table' and 'current_subset_list'.

One thing to mention is, that we had two subset lists:
current_subset_list and cmdline_subset_list, with the latter being
introduced recently for target attribute handling.
This patch reduces them back to one (cmdline_subset_list) which
contains the list of extensions that have been enabled by the command
line arguments.

Note that the patch keeps the existing behavior of rejecting
duplications of extensions when added via the '+' operator in a function
target attribute.  E.g. "-march=rv64gc_zbb" and "arch=+zbb" will trigger
an error (see pr115554.c).  However, at the same time this patch breaks
the acceptance of adding implied extensions, which causes the following
six regressions (with the error "extension 'EXT' appear more than one time"):
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-39.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-42.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-43.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-44.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-45.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-46.c

New tests were added to document the behavior and to ensure it won't
regress.  This patch did not show any regressions for rv32/rv64
and fixes the ICEs from PR115554 and PR115562.

PR target/115554
PR target/115562

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (struct riscv_func_target_info):
Remove.
(struct riscv_func_target_hasher): Likewise.
(riscv_func_decl_hash): Likewise.
(riscv_func_target_hasher::hash): Likewise.
(riscv_func_target_hasher::equal): Likewise.
(riscv_current_subset_list): Likewise.
(riscv_cmdline_subset_list): Remove obsolete space.
(riscv_func_target_table_lazy_init): Remove.
(riscv_func_target_get): Likewise.
(riscv_func_target_put): Likewise.
(riscv_func_target_remove_and_destory): Likewise.
(riscv_arch_str): Generate from cmdline_subset_list.
(riscv_set_arch_by_subset_list): Don't set current_subset_list.
(riscv_parse_arch_string): Remove current_subset_list.
* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins):
Get subset list via riscv_cmdline_subset_list().
* config/riscv/riscv-subset.h (riscv_current_subset_list):
Remove prototype.
(riscv_func_target_get): Likewise.
(riscv_func_target_put): Likewise.
(riscv_func_target_remove_and_destory): Likewise.
* config/riscv/riscv-target-attr.cc (riscv_target_attr_parser::parse_arch):
Build base arch string from existing target options, if any.
(riscv_target_attr_parser::update_settings): Store new arch
string in target options.
(riscv_process_one_target_attr): Whitespace fix.
(riscv_process_target_attr): Drop opts argument.
(riscv_option_valid_attribute_p): Properly save, change and restore
target options.
* config/riscv/riscv.cc (get_arch_str): New function.
(riscv_declare_function_name): Get arch string for option-arch
directive from function's target options.
* config/riscv/riscv.opt: Add riscv_arch_string variable to
march option.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/target-attr-01.c: Add test for option-arch directive.
* gcc.target/riscv/target-attr-02.c: Likewise.
* gcc.target/riscv/target-attr-03.c: Likewise.
* gcc.target/riscv/target-attr-04.c: Likewise.
* gcc.target/riscv/target-attr-05.c: Fix formatting.
* gcc.target/riscv/target-attr-06.c: Likewise.
* gcc.target/riscv/target-attr-07.c: Likewise.
* gcc.target/riscv/pr115554.c: New test.
* gcc.target/riscv/pr115562.c: New test.
* gcc.target/riscv/target-attr-08.c: New test.
* gcc.target/riscv/target-attr-09.c: New test.
* gcc.target/riscv/target-attr-10.c: New test.
* gcc.target/riscv/target-attr-11.c: New test.
* gcc.target/riscv/target-attr-12.c: New test.
* gcc.target/riscv/target-attr-13.c: New test.
* gcc.target/riscv/target-attr-14.c: New test.
* gcc.target/riscv/target-attr-15.c: New test.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

RISC-V: Attribute parser: Use alloca() instead of new + std::unique_ptr

Allocating an object on the heap with new, wrapping it in a
std::unique_ptr and finally getting the buffer via buf.get()
is a correct way to allocate a buffer that is automatically
freed on return. However, a simple invocation of alloca()
does the same with less overhead.

gcc/ChangeLog:

* config/riscv/riscv-target-attr.cc (riscv_target_attr_parser::parse_arch):
Replace new + std::unique_ptr by alloca().
(riscv_process_one_target_attr): Likewise.
(riscv_process_target_attr): Likewise.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

[i386] adjust flag_omit_frame_pointer in a single function [PR113719]

The first two patches for PR113719 have each regressed
gcc.dg/ipa/iinline-attr.c on a different target.  The reason for this
instability is that there are competing flag_omit_frame_pointer
overriders on x86:

- ix86_recompute_optlev_based_flags computes and sets a
  -f[no-]omit-frame-pointer default depending on
  USE_IX86_FRAME_POINTER and, in 32-bit mode, optimize_size

- ix86_option_override_internal enables flag_omit_frame_pointer for
  -momit-leaf-frame-pointer to take effect

ix86_option_override[_internal] calls
ix86_recompute_optlev_based_flags before setting
flag_omit_frame_pointer.  It is called during global process_options.

But ix86_recompute_optlev_based_flags is also called by
parse_optimize_options, during attribute processing, and at that
point, ix86_option_override is not called, so the final overrider for
global options is not applied to the optimize attributes.  If they
differ, the testcase fails.

In order to fix this, we need to process all overriders of this option
whenever we process any of them.  Since this setting is affected by
optimization options, it makes sense to compute it in
parse_optimize_options, rather than in process_options.

for  gcc/ChangeLog

PR target/113719
* config/i386/i386-options.cc (ix86_option_override_internal):
Move flag_omit_frame_pointer final overrider...
(ix86_recompute_optlev_based_flags): ... here.

RISC-V: Fix testcase for vector .SAT_SUB in zip benchmark

The following testcase was not properly testing anything due to an
uninitialized variable. As a result, the loop was not iterating through
the testing data, but instead on undefined values which could cause an
unexpected abort.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h:
initialize variable

Signed-off-by: Edwin Lu <ewlu@rivosinc.com>

AVR: avr-md - Simplify GET_MODE and GET_MODE_BITSIZE.

gcc/
* config/avr/avr.md: Simplify mode usage.
(GET_MODE_SIZE (<MODE>mode)): Use <SIZE> instead.
(GET_MODE_BITSIZE (<MODE>mode) - 1): Use <MSB> instead.
(GET_MODE_MASK (QImode)): Use 0xff instead.
* config/avr/avr-fixed.md: Same.

varasm: Add support for emitting binary data with the new gas .base64 directive

Nick has implemented a new .base64 directive in gas (to be shipped in
the upcoming binutils 2.43; big thanks for that).
See https://sourceware.org/bugzilla/show_bug.cgi?id=31964

The following patch adjusts default_elf_asm_output_ascii (i.e.
ASM_OUTPUT_ASCII elfos.h implementation) to use it if it detects binary
data and gas supports it.

Without this patch, we emit stuff like:
        .string "\177ELF\002\001\001\003"
        .string ""
        .string ""
        .string ""
        .string ""
        .string ""
        .string ""
        .string ""
        .string "\002"
        .string ">"
...
        .string "\324\001\236 0FS\202\002E\n0@\203\004\005&\202\021\337)\021\203C\020A\300\220I\004\t\b\206(\234\0132l\004b\300\bK\006\220$0\303\020P$\233\211\002D\f"
etc., with this patch more compact
        .base64 "f0VMRgIBAQMAAAAAAAAAAAIAPgABAAAAABf3AAAAAABAAAAAAAAAAACneB0AAAAAAAAAAEAAOAAOAEAALAArAAYAAAAEAAAAQAAAAAAAAABAAEAAAAAAAEAAQAAAAAAAEAMAAAAAAAAQAwAAAAAAAAgAAAAAAAAAAwAAAAQAAABQAwAAAAAAAFADQAAAAAAAUANAAAAAAAAcAAAAAAAAABwAAAAAAAAAAQAAAAAAAAABAAAABAAAAAAAAAAAAAAAAABAAAAAAAAAAEAAAAAAADBwOQAAAAAAMHA5AAAAAAAAEAAAAAAAAAEAAAAFAAAAAIA5AAAAAAAAgHkAAAAA"
        .base64 "AACAeQAAAAAAxSSgAgAAAADFJKACAAAAAAAQAAAAAAAAAQAAAAQAAAAAsNkCAAAAAACwGQMAAAAAALAZAwAAAADMtc0AAAAAAMy1zQAAAAAAABAAAAAAAAABAAAABgAAAGhmpwMAAAAAaHbnAwAAAABoducDAAAAAOAMAQAAAAAA4MEeAAAAAAAAEAAAAAAAAAIAAAAGAAAAkH2nAwAAAACQjecDAAAAAJCN5wMAAAAAQAIAAAAAAABAAgAAAAAAAAgAAAAAAAAABAAAAAQAAABwAwAAAAAAAHADQAAAAAAAcANAAAAAAABAAAAAAAAAAEAAAAAAAAAACAAAAAAA"
        .base64 "AAAEAAAABAAAALADAAAAAAAAsANAAAAAAACwA0AAAAAAACAAAAAAAAAAIAAAAAAAAAAEAAAAAAAAAAcAAAAEAAAAaGanAwAAAABoducDAAAAAGh25wMAAAAAAAAAAAAAAAAQAAAAAAAAAAgAAAAAAAAAU+V0ZAQAAABwAwAAAAAAAHADQAAAAAAAcANAAAAAAABAAAAAAAAAAEAAAAAAAAAACAAAAAAAAABQ5XRkBAAAAAw/WAMAAAAADD+YAwAAAAAMP5gDAAAAAPy7CgAAAAAA/LsKAAAAAAAEAAAAAAAAAFHldGQGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
        .base64 "AAAAAAAAAAAAAAAAAAAAAAAAABAAAAAAAAAAUuV0ZAQAAABoZqcDAAAAAGh25wMAAAAAaHbnAwAAAACYGQAAAAAAAJgZAAAAAAAAAQAAAAAAAAAvbGliNjQvbGQtbGludXgteDg2LTY0LnNvLjIAAAAAAAQAAAAwAAAABQAAAEdOVQACgADABAAAAAEAAAAAAAAAAQABwAQAAAAJAAAAAAAAAAIAAcAEAAAAAwAAAAAAAAAEAAAAEAAAAAEAAABHTlUAAAAAAAMAAAACAAAAAAAAAAOAAACsqAAAgS0AAOJWAAAjNwAAXjAAAAAAAAAAAAAAF1gAAHsxAABBBwAA"
        .base64 "G0kAALGmAACwoAAAAAAAAAAAAACQhAAAAAAAAOw1AACNYgAAAAAAAFQoAAAAAAAAx3UAALZAAAAAAAAAiIUAALGeAABBlAAAWEsAAPmRAACmOgAAAAAAADh3AAAAAAAAlCAAAAAAAABymgAAaosAAMIjAAAKMQAAMkIAADU0AAAAAAAA5ZwAAAAAAAAAAAAAAAAAAFIdAAAIGQAAAAAAAMFbAAAoTQAAGDcAAIRgAAA6HgAAlxwAAAAAAADOlgAAAAAAAEhPAAARiwAAMGgAAOVtAADMFgAAAAAAAAAAAACrjgAAYl4AACZVAAA/HgAAAAAAAAAAAABqPwAAAAAA"
The patch attempts to juggle between readability and compactness, so
if it detects some hunk of the initializer that would be shorter to be
emitted as .string/.ascii directive, it does so, but if it previously
used .base64 directive it switches mode only if there is a 16+ char
ASCII-ish string.

On my #embed testcase from yesterday
unsigned char a[] = {
#embed "cc1plus"
};
without this patch it emits 2.4GB of assembly, while with this
patch 649M.
Compile times (trunk, so yes,rtl,extra checking) are:
time ./xgcc -B ./ -S -std=c23 -O2 embed-11.c

real    0m13.647s
user    0m7.157s
sys     0m2.597s
time ./xgcc -B ./ -c -std=c23 -O2 embed-11.c

real    0m28.649s
user    0m26.653s
sys     0m1.958s
without the patch and
time ./xgcc -B ./ -S -std=c23 -O2 embed-11.c

real    0m4.283s
user    0m2.288s
sys     0m0.859s
time ./xgcc -B ./ -c -std=c23 -O2 embed-11.c

real    0m6.888s
user    0m5.876s
sys     0m1.002s
with the patch, so that feels like significant improvement.
The resulting embed-11.o is identical between the two ways of expressing
the mostly binary data in the assembly.  But note that there are portions
like:
        .base64 "nAAAAAAAAAAvZRcAIgAOAFAzMwEAAAAABgAAAAAAAACEQBgAEgAOAFBHcAIAAAAA7AAAAAAAAAAAX19nbXB6X2dldF9zaQBtcGZyX3NldF9zaV8yZXhwAG1wZnJfY29zaABtcGZyX3RhbmgAbXBmcl9zZXRfbmFuAG1wZnJfc3ViAG1wZnJfdGFuAG1wZnJfc3RydG9mcgBfX2dtcHpfc3ViX3VpAF9fZ21wX2dldF9tZW1vcnlfZnVuY3Rpb25zAF9fZ21wel9zZXRfdWkAbXBmcl9wb3cAX19nbXB6X3N1YgBfX2dtcHpfZml0c19zbG9uZ19wAG1wZnJfYXRh"
        .base64 "bjIAX19nbXB6X2RpdmV4YWN0AG1wZnJfc2V0X2VtaW4AX19nbXB6X3NldABfX2dtcHpfbXVsAG1wZnJfY2xlYXIAbXBmcl9sb2cAbXBmcl9hdGFuaABfX2dtcHpfc3dhcABtcGZyX2FzaW5oAG1wZnJfYXNpbgBtcGZyX2NsZWFycwBfX2dtcHpfbXVsXzJleHAAX19nbXB6X2FkZG11bABtcGZyX3NpbmgAX19nbXB6X2FkZF91aQBfX2dtcHFfY2xlYXIAX19nbW9uX3N0YXJ0X18AbXBmcl9hY29zAG1wZnJfc2V0X2VtYXgAbXBmcl9jb3MAbXBmcl9zaW4A"
        .string "__gmpz_ui_pow_ui"
        .string "mpfr_get_str"
        .string "mpfr_acosh"
        .string "mpfr_sub_ui"
        .string "__gmpq_set_ui"
        .string "mpfr_set_inf"
...
        .string "GLIBC_2.14"
        .string "GLIBC_2.11"
        .base64 "AAABAAIAAQADAAMAAwADAAMAAwAEAAUABgADAAEAAQADAAMABwABAAEAAwADAAMAAwAIAAEAAwADAAEAAwABAAMAAwABAAMAAQADAAMAAwADAAMAAwADAAYAAwADAAEAAQAIAAMAAwADAAMAAwABAAMAAQADAAMAAQABAAEAAwAIAAEAAwADAAEAAwABAAMAAQADAAEABgADAAMAAQAHAAMAAwADAAMAAwABAAMAAQABAAMAAwADAAkAAQABAAEAAwAKAAEAAwADAAMAAQABAAMAAwALAAEAAwADAAEAAQADAAMAAwABAAMAAwABAAEAAwADAAMABwABAAMAAwAB"
        .base64 "AAEAAwADAAEAAwABAAMAAQADAAMAAwADAAEAAQABAAEAAwADAAMAAQABAAEAAQABAAEAAQADAAMAAwADAAMAAQABAAwAAwADAA0AAwADAAMAAwADAAEAAQADAAMAAQABAAMAAwADAAEAAwADAAEAAwAIAAMAAwADAAMABgABAA4ACwAGAAEAAQADAAEAAQADAAEAAwABAAMAAwABAAEAAwABAAMAAwABAAEAAwADAAMAAwABAAMAAQABAAEAAQABAAMADwABAAMAAQADAAMAAwABAAEAAQAIAAEADAADAAMAAQABAAMAAwADAAEAAQABAAEAAQADAAEAAwADAAEA"
        .base64 "AwABAAMAAQADAAMAAQABAAEAAwADAAMAAwADAAMAAQADAAMACAAQAA8AAQABAAEAAQABAAEAAQABAAEAAQABAAEAAQABAAEAAQABAAEAAQABAAEAAQABAAEAAQABAAEAAQABAAEAAQABAAEAAQABAAEAAQABAAEAAQABAAEAAQABAAEAAQABAAEAAQA="
so it isn't all just totally unreadable stuff.

2024-07-15  Jakub Jelinek  <jakub@redhat.com>

* configure.ac (HAVE_GAS_BASE64): New check.
* config/elfos.h (BASE64_ASM_OP): Define if HAVE_GAS_BASE64 is
defined.
* varasm.cc (assemble_string): Bump maximum from 2000 to 16384 if
BASE64_ASM_OP is defined.
(default_elf_asm_output_limited_string): Emit opening '"' together
with STRING_ASM_OP.
(default_elf_asm_output_ascii): Use BASE64_ASM_OP if defined and
beneficial.  Remove UB when last_null is NULL.
* configure: Regenerate.
* config.in: Regenerate.

Fix SSA_NAME leak due to def_stmt is removed before use_stmt.

-  _5 = __atomic_fetch_or_8 (&set_work_pending_p, 1, 0);
-  # DEBUG old => (long int) _5
+  _6 = .ATOMIC_BIT_TEST_AND_SET (&set_work_pending_p, 0, 1, 0, __atomic_fetch_or_8);
+  # DEBUG old => NULL
   # DEBUG BEGIN_STMT
-  # DEBUG D#2 => _5 & 1
+  # DEBUG D#2 => NULL
...
-  _10 = ~_5;
-  _8 = (_Bool) _10;
-  # DEBUG ret => _8
+  _8 = _6 == 0;
+  # DEBUG ret => (_Bool) _10

confirmed.  convert_atomic_bit_not does this, it checks for single_use
and removes the def, failing to release the name (which would fix this up
IIRC).

Note the function removes stmts in "wrong" order (before uses of LHS
are removed), so it requires larger surgery.  And it leaks SSA names.

gcc/ChangeLog:

PR target/115872
* tree-ssa-ccp.cc (convert_atomic_bit_not): Remove use_stmt after use_nop_stmt is removed.
(optimize_atomic_bit_test_and): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr115872.c: New test.

[APX NF] Add a pass to convert legacy insn to NF insns

For APX ccmp, current infrastructure will always generate cstore for
the ccmp flag user, like

cmpe    %rcx, %r8
ccmpnel %rax, %rbx
seta %dil
add %rcx, %r9
add     %r9, %rdx
testb   %dil, %dil
je .L2

For such case, the legacy add clobbers FLAGS_REG so there should have
extra cstore to avoid the flag be reset before using it. If the
instructions between flag producer and user are NF insns, the setcc/
test sequence is not required.

Add a pass to convert legacy flag clobber insns to their NF counterpart.
The convertion only happens when
1. APX_NF enabled.
2. For a BB, cstore was find, and there are insns between such cstore
and next explicit set insn to FLAGS_REG (test or cmp).
3. All the insns found should have NF counterpart.

The pass was added after rtl-ifcvt which eliminates some branch when
profitable, which could cause some flag-clobbering insn put between
cstore and jcc.

gcc/ChangeLog:

* config/i386/i386.md (has_nf): New define_attr, add to all
nf related patterns.
* config/i386/i386-features.cc (apx_nf_convert): New function
to convert Non-NF insns to their NF counterparts.
(class pass_apx_nf_convert): New pass class.
(make_pass_apx_nf_convert): New.
* config/i386/i386-passes.def: Add pass_apx_nf_convert after
rtl_ifcvt.
* config/i386/i386-protos.h (make_pass_apx_nf_convert): Declare.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-nf-2.c: New test.

arm: Fix the expected output of the test pr111235.c [PR115894]

With r15-1619-g3b9b8d6cfdf593, pr111235.c fails due to different
registers used in ldrexd instruction. The key part of this test is that
the compiler generates LDREXD. The registers used for that are pretty
much irrelevant as they are not matched with any other operations within
the test. This patch changes the test to test only for the mnemonic and
not for any of the operands.

2024-07-15 Surya Kumari Jangala <jskumari@linux.ibm.com>

gcc/testsuite:
PR testsuite/115894
* gcc.target/arm/pr111235.c: Update expected output.

RISC-V: Implement locality for __builtin_prefetch

The patch add the Zihintntl instructions in the prefetch pattern.
Zicbop has prefetch instructions. Zihintntl has NTL instructions.
Insert NTL instructions before prefetch instruction, if target
has Zihintntl extension.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_print_operand): Add 'L' letter
to print zihintntl instructions string.
* config/riscv/riscv.md (prefetch): Add zihintntl instructions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/prefetch-zicbop.c: New test.
* gcc.target/riscv/prefetch-zihintntl.c: New test.

aarch64: Fix the expected output of the test cpy_1.c [PR115892]

The fix at r15-1619-g3b9b8d6cfdf593 results in a rearrangement of
instructions generated for cpy_1.c. This patch fixes the expected output.

2024-07-12 Surya Kumari Jangala <jskumari@linux.ibm.com>

gcc/testsuite:
PR testsuite/115892
* gcc.target/aarch64/sve/acle/general/cpy_1.c: Update expected
output.

CRIS: Adjust gcc.dg/tree-ssa/loop-1.c

With r15-1619-g3b9b8d6cfdf593, there's a XPASS and a FAIL
for this test-case for cris-elf. Looking at the generated
code, _foo is indeed no longer saved in a register for CRIS.
While that looks like a regression, coremark results are the
same around this revision, so simply adjust the test-case:
remove the target-specific exceptions for cris-*-*.

* gcc.dg/tree-ssa/loop-1.c: Remove target-specific test
and xfail to adjust for recent changes in register allocation.

RISC-V: Add md files for vector BFloat16

V3: Add Bfloat16 vector insn in generic-vector-ooo.md
v2: Rebase
Accroding to the BFloat16 spec, some vector iterators and new pattern
are added in md files.

Signed-off-by: Feng Wang <wangfeng@eswincomputing.com>
gcc/ChangeLog:

* config/riscv/generic-vector-ooo.md: Add def_insn_reservation for vector BFloat16.
* config/riscv/riscv.md: Add new insn name for vector BFloat16.
* config/riscv/vector-iterators.md: Add some iterators for vector BFloat16.
* config/riscv/vector.md: Add some attribute for vector BFloat16.
* config/riscv/vector-bfloat16.md: New file. Add insn pattern vector BFloat16.

RISC-V: Add Zvfbfmin and Zvfbfwma intrinsic

v3: Modify warning message in riscv.cc
v2: Rebase
Accroding to the intrinsic doc, the 'Zvfbfmin' and 'Zvfbfwma' intrinsic
functions are added by this patch.

Signed-off-by: Feng Wang <wangfeng@eswincomputing.com>
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc (class vfncvtbf16_f):
Add 'Zvfbfmin' intrinsic in bases.
(class vfwcvtbf16_f): Ditto.
(class vfwmaccbf16): Add 'Zvfbfwma' intrinsic in bases.
(BASE): Add BASE macro for 'Zvfbfmin' and 'Zvfbfwma'.
* config/riscv/riscv-vector-builtins-bases.h: Add declaration for 'Zvfbfmin' and 'Zvfbfwma'.
* config/riscv/riscv-vector-builtins-functions.def (REQUIRED_EXTENSIONS):
Add builtins def for 'Zvfbfmin' and 'Zvfbfwma'.
(vfncvtbf16_f): Ditto.
(vfncvtbf16_f_frm): Ditto.
(vfwcvtbf16_f): Ditto.
(vfwmaccbf16): Ditto.
(vfwmaccbf16_frm): Ditto.
* config/riscv/riscv-vector-builtins-shapes.cc (supports_vectype_p):
Add vector intrinsic build judgment for BFloat16.
(build_all): Ditto.
(BASE_NAME_MAX_LEN): Adjust max length.
* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_F32_OPS):
Add new operand type for BFloat16.
(vfloat32mf2_t): Ditto.
(vfloat32m1_t): Ditto.
(vfloat32m2_t): Ditto.
(vfloat32m4_t): Ditto.
(vfloat32m8_t): Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_F32_OPS): Ditto.
(validate_instance_type_required_extensions):
Add required_ext checking for 'Zvfbfmin' and 'Zvfbfwma'.
* config/riscv/riscv-vector-builtins.h (enum required_ext):
Add required_ext declaration for 'Zvfbfmin' and 'Zvfbfwma'.
(reqired_ext_to_isa_name): Ditto.
(required_extensions_specified): Ditto.
(struct function_group_info): Add match case for 'Zvfbfmin' and 'Zvfbfwma'.
* config/riscv/riscv.cc (riscv_validate_vector_type):
Add required_ext checking for 'Zvfbfmin' and 'Zvfbfwma'.

AVX512BF16: Do not allow permutation with vcvtne2ps2bf16 [PR115889]

According to the instruction spec of AVX512BF16, the convert from float
to BF16 is not a simple truncation. It has special handling for
denormal/nan, even for normal float it will add an extra bias according
to the least significant bit for bf number. This means we cannot use the
vcvtne2ps2bf16 for any bf16 vector shuffle.
The optimization introduced in r15-1368 adds a specific split to convert
HImode permutation with this instruction, so remove it and treat the
BFmode permutation same as HFmode.

gcc/ChangeLog:

PR target/115889
* config/i386/predicates.md (vcvtne2ps2bf_parallel): Remove.
* config/i386/sse.md (hi_cvt_bf): Remove.
(HI_CVT_BF): Likewise.
(vpermt2_sepcial_bf16_shuffle_<mode>):Likewise.

gcc/testsuite/ChangeLog:

PR target/115889
* gcc.target/i386/vpermt2-special-bf16-shufflue.c: Adjust output
scan.

RISC-V: Add vector type of BFloat16 format

v3: Rebase
v2: Rebase
The vector type of BFloat16 format is added in this patch,
subsequent extensions to zvfbfmin and zvfwma need to be based
on this patch.

Signed-off-by: Feng Wang <wangfeng@eswincomputing.com>
gcc/ChangeLog:

* config/riscv/genrvv-type-indexer.cc (bfloat16_type):
Generate bf16 vector_type and scalar_type in DEF_RVV_TYPE_INDEX.
(bfloat16_wide_type): Ditto.
(same_ratio_eew_bf16_type): Ditto.
(main): Ditto.
* config/riscv/riscv-modes.def (ADJUST_BYTESIZE):
Add vector type for BFloat16.
(RVV_WHOLE_MODES): Add vector type for BFloat16.
(RVV_FRACT_MODE): Ditto.
(RVV_NF4_MODES): Ditto.
(RVV_NF8_MODES): Ditto.
(RVV_NF2_MODES): Ditto.
* config/riscv/riscv-vector-builtins-types.def (vbfloat16mf4_t):
Add builtin vector type for BFloat16.
(vbfloat16mf2_t): Add builtin vector type for BFloat16.
(vbfloat16m1_t): Ditto.
(vbfloat16m2_t): Ditto.
(vbfloat16m4_t): Ditto.
(vbfloat16m8_t): Ditto.
(vbfloat16mf4x2_t): Ditto.
(vbfloat16mf4x3_t): Ditto.
(vbfloat16mf4x4_t): Ditto.
(vbfloat16mf4x5_t): Ditto.
(vbfloat16mf4x6_t): Ditto.
(vbfloat16mf4x7_t): Ditto.
(vbfloat16mf4x8_t): Ditto.
(vbfloat16mf2x2_t): Ditto.
(vbfloat16mf2x3_t): Ditto.
(vbfloat16mf2x4_t): Ditto.
(vbfloat16mf2x5_t): Ditto.
(vbfloat16mf2x6_t): Ditto.
(vbfloat16mf2x7_t): Ditto.
(vbfloat16mf2x8_t): Ditto.
(vbfloat16m1x2_t): Ditto.
(vbfloat16m1x3_t): Ditto.
(vbfloat16m1x4_t): Ditto.
(vbfloat16m1x5_t): Ditto.
(vbfloat16m1x6_t): Ditto.
(vbfloat16m1x7_t): Ditto.
(vbfloat16m1x8_t): Ditto.
(vbfloat16m2x2_t): Ditto.
(vbfloat16m2x3_t): Ditto.
(vbfloat16m2x4_t): Ditto.
(vbfloat16m4x2_t): Ditto.
* config/riscv/riscv-vector-builtins.cc (check_required_extensions):
Add required_ext checking for BFloat16.
* config/riscv/riscv-vector-builtins.def (vbfloat16mf4_t):
Add vector_type for BFloat16 in builtins.def.
(vbfloat16mf4x2_t): Ditto.
(vbfloat16mf4x3_t): Ditto.
(vbfloat16mf4x4_t): Ditto.
(vbfloat16mf4x5_t): Ditto.
(vbfloat16mf4x6_t): Ditto.
(vbfloat16mf4x7_t): Ditto.
(vbfloat16mf4x8_t): Ditto.
(vbfloat16mf2_t): Ditto.
(vbfloat16mf2x2_t): Ditto.
(vbfloat16mf2x3_t): Ditto.
(vbfloat16mf2x4_t): Ditto.
(vbfloat16mf2x5_t): Ditto.
(vbfloat16mf2x6_t): Ditto.
(vbfloat16mf2x7_t): Ditto.
(vbfloat16mf2x8_t): Ditto.
(vbfloat16m1_t): Ditto.
(vbfloat16m1x2_t): Ditto.
(vbfloat16m1x3_t): Ditto.
(vbfloat16m1x4_t): Ditto.
(vbfloat16m1x5_t): Ditto.
(vbfloat16m1x6_t): Ditto.
(vbfloat16m1x7_t): Ditto.
(vbfloat16m1x8_t): Ditto.
(vbfloat16m2_t): Ditto.
(vbfloat16m2x2_t): Ditto.
(vbfloat16m2x3_t): Ditto.
(vbfloat16m2x4_t): Ditto.
(vbfloat16m4_t): Ditto.
(vbfloat16m4x2_t): Ditto.
(vbfloat16m8_t): Ditto.
(double_trunc_bfloat_scalar): Add scalar_type def for BFloat16.
(double_trunc_bfloat_vector): Add vector_type def for BFloat16.
* config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_ELEN_BF_16):
Add required defination of BFloat16 ext.
* config/riscv/riscv-vector-switch.def (ENTRY):
Add vector_type information for BFloat16.
(TUPLE_ENTRY): Add tuple vector_type information for BFloat16.

Daily bump.

i386: Tweak i386-expand.cc to restore bootstrap on RHEL.

This is a minor change to restore bootstrap on systems using gcc 4.8
as a host compiler.  The fatal error is:

In file included from gcc/gcc/coretypes.h:471:0,
                 from gcc/gcc/config/i386/i386-expand.cc:23:
gcc/gcc/config/i386/i386-expand.cc: In function 'void ix86_expand_fp_absneg_operator(rtx_code, machine_mode, rtx_def**)':
./insn-modes.h:315:75: error: temporary of non-literal type 'scalar_float_mode' in a constant expression
#define HFmode (scalar_float_mode ((scalar_float_mode::from_int) E_HFmode))
                                                                           ^
gcc/gcc/config/i386/i386-expand.cc:2179:8: note: in expansion of macro 'HFmode'
   case HFmode:
        ^

The solution is to use the E_?Fmode enumeration constants as case values
in switch statements.

2024-07-14  Roger Sayle  <roger@nextmovesoftware.com>

* config/i386/i386-expand.cc (ix86_expand_fp_absneg_operator):
Use E_?Fmode enumeration constants in switch statement.
(ix86_expand_copysign): Likewise.
(ix86_expand_xorsign): Likewise.

c, objc: Add -Wunterminated-string-initialization

Warn about the following:

    char  s[3] = "foo";

Initializing a char array with a string literal of the same length as
the size of the array is usually a mistake.  Rarely is the case where
one wants to create a non-terminated character sequence from a string
literal.

In some cases, for writing faster code, one may want to use arrays
instead of pointers, since that removes the need for storing an array of
pointers apart from the strings themselves.

    char  *log_levels[]   = { "info", "warning", "err" };
vs.
    char  log_levels[][7] = { "info", "warning", "err" };

This forces the programmer to specify a size, which might change if a
new entry is later added.  Having no way to enforce null termination is
very dangerous, however, so it is useful to have a warning for this, so
that the compiler can make sure that the programmer didn't make any
mistakes.  This warning catches the bug above, so that the programmer
will be able to fix it and write:

    char  log_levels[][8] = { "info", "warning", "err" };

This warning already existed as part of -Wc++-compat, but this patch
allows enabling it separately.  It is also included in -Wextra, since
it may not always be desired (when unterminated character sequences are
wanted), but it's likely to be desired in most cases.

Since Wc++-compat now includes this warning, the test has to be modified
to expect the text of the new warning too, in <gcc.dg/Wcxx-compat-14.c>.

Link: https://lists.gnu.org/archive/html/groff/2022-11/msg00059.html
Link: https://lists.gnu.org/archive/html/groff/2022-11/msg00063.html
Link: https://inbox.sourceware.org/gcc/36da94eb-1cac-5ae8-7fea-ec66160cf413@gmail.com/T/
PR c/115185

gcc/c-family/ChangeLog:

* c.opt: Add -Wunterminated-string-initialization.

gcc/c/ChangeLog:

* c-typeck.cc (digest_init): Separate warnings about character
arrays being initialized as unterminated character sequences
with string literals, from -Wc++-compat, into a new warning,
-Wunterminated-string-initialization.

gcc/ChangeLog:

* doc/invoke.texi: Document the new
-Wunterminated-string-initialization.

gcc/testsuite/ChangeLog:

* gcc.dg/Wcxx-compat-14.c: Adapt the test to match the new text
of the warning, which doesn't say anything about C++ anymore.
* gcc.dg/Wunterminated-string-initialization.c: New test.

Acked-by: Doug McIlroy <douglas.mcilroy@dartmouth.edu>
Acked-by: Mike Stump <mikestump@comcast.net>
Reviewed-by: Sandra Loosemore <sloosemore@baylibre.com>
Reviewed-by: Martin Uecker <uecker@tugraz.at>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
Reviewed-by: Marek Polacek <polacek@redhat.com>

CRIS: Fix up last comment.

* config/cris/cris.cc (cris_option_override_after_change): Fix up
comment regarding disabling late_combine.

CRIS: Disable late-combine by default, related PR115883

With late-combine, performance for coremark compiled for cris-elf
regresses 2.6% by performance and by size 0.4%, measured at
r15-2005-g13757e50ff0b, when compiled with "-O2 -march=v10".

Earlier, at r15-1880-gce34fcc572a0, numbers were by performance 3.2%
and by size 0.4%, even with the proposed patch to PR115883 (TL;DR: a
presumed bug in LRA or combine exposed by late-combine).  Without that
patch, about the same performance results (at that revision).
Similarly around the late-combine commit (r15-1579-g792f97b44ffc5e).

I briefly looked at the performance regression for coremark at
r15-2005-g13757e50ff0b (with/without this patch) as far as seeing that
the stack-frame grew larger (maxing out on hard registers and needing
one more slot) for at least two of the top three* functions that
regressed the most in terms of cycles per call:
matrix_mul_matrix_bitextract (in coremark, 17% slower) and __subdf3
(in libgcc, 6.7% slower).  That makes sense when considering that
late-combine "naturally" stretches register life-times.  But, looking
at late_combine::combine_into_uses and late_combine::optimizable_set,
nothing stood out to me.  I guess there's improvement opportunities in
late_combine::check_register_pressure.

(*) I opted not to look at _dtoa_r (in newlib) mostly because it's
boring and always shows up when something in gcc goes sideways.  (It
maxes out on hard registers and is big.  End of story.)

Note that the change of default is done in the
TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE worker, not in the
TARGET_OPTION_OVERRIDE worker for reasons stated in the
comment.

* config/cris/cris.cc (cris_option_override_after_change): New
function.  Disable late-combine by default.
(cris_option_override): Call the new function.

Daily bump.

Document return value in write_cv_integer

gcc/
* dwarf2codeview.cc (write_lf_modifier): Expand upon comment.

Make sure CodeView symbols are aligned

CodeView symbols have to be multiples of four bytes; add an alignment
directive to write_data_symbol to ensure this.

Note that these can be zeroes, so we can rely on GAS to do this for us;
it's only types that need f3, f2, f1 values.

gcc/
* dwarf2codeview.cc (write_data_symbol): Add alignment directive.

Avoid magic numbers when writing CodeView padding

Adds names for the padding magic numbers to enum cv_leaf_type.

gcc/
* dwarf2codeview.cc (enum cv_leaf_type): Add padding constants.
(write_cv_padding): Use names for padding constants.

Add CodeView enum cv_sym_type

Make everything more gdb-friendly by using an enum for symbol constants
rather than #defines.

gcc/
* dwarf2codeview.cc (S_LDATA32, S_GDATA32, S_COMPILE3): Undefine.
(enum cv_sym_type): Define.
(struct codeview_symbol): Use enum cv_sym_type.
(write_codeview_symbols): Add default to switch.

Add CodeView enum cv_leaf_type

Make everything more gdb-friendly by using an enum for type constants
rather than #defines.

gcc/
* dwarf2codeview.cc (enum cv_leaf_type): Define.
(struct codeview_subtype): Use enum cv_leaf_type.
(struct codeview_custom_type): Use enum cv_leaf_type.
(write_lf_fieldlist): Add default to switch.
(write_custom_types): Add default to switch.
* dwarf2codeview.h (LF_MODIFIER, LF_POINTER): Undefine.
(LF_PROCEDURE, LF_ARGLIST, LF_FIELDLIST, LF_BITFIELD): Likewise.
(LF_INDEX, LF_ENUMERATE, LF_ARRAY, LF_CLASS): Likewise.
(LF_STRUCTURE, LF_UNION, LF_ENUM, LF_MEMBER, LF_CHAR): Likewise.
(LF_SHORT, LF_USHORT, LF_LONG, LF_ULONG, LF_QUADWORD): Likewise.
(LF_UQUADWORD): Likewise.

fortran: Correctly evaluate scalar MASK arguments of MINLOC/MAXLOC

Add the preliminary code that the generated expression for MASK may depend
on when generating the inline code to evaluate MINLOC or MAXLOC with a
scalar MASK.

The generated code was only keeping the generated expression but not the
preliminary code, which was sufficient for simple cases such as data
references or simple (scalar) function calls, but was bogus with more
complicated ones.

gcc/fortran/ChangeLog:

* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Add the
preliminary code generated for MASK to the preliminary code of
MINLOC/MAXLOC.

gcc/testsuite/ChangeLog:

* gfortran.dg/minmaxloc_17.f90: New test.

Add gcc.gnu.org account names to MAINTAINERS

As discussed in the thread starting at:

  https://gcc.gnu.org/pipermail/gcc/2024-June/244199.html

it would be useful to have the @gcc.gnu.org bugzilla account names
in MAINTAINERS.  This is because:

(a) Not every non-@gcc.gnu.org email listed in MAINTAINERS is registered
    as a bugzilla user.

(b) Only @gcc.gnu.org accounts tend to have full rights to modify tickets.

(c) A maintainer's name and email address aren't always enough to guess
    the bugzilla account name.

(d) The users list on bugzilla has many blank entries for "real name".

However, including @gcc.gnu.org to the account name might encourage
people to use it for ordinary email, rather than just for bugzilla.
This patch goes for the compromise of using the unqualified account
name, with some text near the top of the file to explain its usage.

There isn't room in the area maintainer sections for a new column,
so it seemed better to have the account name only in the Write
After Approval section.  It's then necessary to list all maintainers
there, even if they have more specific roles as well.

Also, there were some entries that didn't line up with the
prevailing columns (they had one tab too many or one tab too few).
It seemed easier to check for and report this, and other things,
if the file used spaces rather than tabs.

There was one instance of an email address without the trailing ">".
The updates to check-MAINTAINERS.py includes a test for that.

The account names in the file were taken from a trawl of the
gcc-cvs archives, with a very small number of manual edits for
ambiguities.  There are a handful of names that I couldn't find;
the new column has "-" for those.  The names were then filtered
against the bugzilla @gcc.gnu.org user list, with those not
present again being blanked out with "-".

ChangeLog:
* MAINTAINERS: Replace tabs with spaces.  Add a bugzilla account
name column to the Write After Approval section.  Line up the
email column and fix an entry that was missing the trailing ">".

contrib/ChangeLog:
* check-MAINTAINERS.py (sort_by_surname): Replace with...
(get_surname): ...this.
(has_tab, is_empty): Delete.
(check_group): Take a list of column positions as argument.
Check that lines conform to these column numbers.  Check that the
final column is an email in angle brackets.  Record surnames on
the fly.
(top level): Reject tabs.  Use paragraph counts to identify which
groups of lines should be checked.  Report missing sections.

diagnostics: add highlight-a vs highlight-b in colorization and pp_markup

Since r6-4582-g8a64515099e645 (which added class rich_location), ranges
of quoted source code have been colorized using the following rules:
- the primary range used the same color of the kind of the diagnostic
i.e. "error" vs "warning" etc (defaulting to bold red and bold magenta
respectively)
- secondary ranges alternate between "range1" and "range2" (defaulting
to green and blue respectively)

This works for cases with large numbers of highlighted ranges, but is
suboptimal for common cases.

The following patch adds a pair of color names: "highlight-a" and
"highlight-b", and uses them whenever it makes sense to highlight and
contrast two different things in the source code (e.g. a type mismatch).
These are used by diagnostic-show-locus.cc for highlighting quoted
source.  In addition the patch adds colorization to fragments within the
corresponding diagnostic messages themselves, using consistent
colorization between the message and the quoted source code for the two
different things being contrasted.

For example, consider:

demo.c: In function ‘test_bad_format_string_args’:
../../src/demo.c:25:18: warning: format ‘%i’ expects argument of
  type ‘int’, but argument 2 has type ‘const char *’ [-Wformat=]
   25 |   printf("hello %i", msg);
      |                 ~^   ~~~
      |                  |   |
      |                  int const char *
      |                 %s

Previously, the types within the message in quotes would be in bold but
not colorized, and the labelled ranges of quoted source code would use
bold magenta for the "int" and non-bold green for the "const char *".

With this patch:
- the "%i" and "int" in the message and the "int" in the quoted source
  are all colored bold green
- the "const char *" in the message and in the quoted source are both
  colored bold blue
so that the consistent use of contrasting color draws the reader's eyes
to the relationships between the diagnostic message and the source.

I've tried this with gnome-terminal with many themes, including a
variety of light versus dark backgrounds, solarized versus non-solarized
themes, etc, and it was readable in all.

My initial version of the patch used the existing %r and %R facilities
within pretty-print.cc for the messages, but this turned out to be very
uncomfortable, leading to error-prone format strings such as:

  error_at (richloc,
            "invalid operands to binary %s (have %<%r%T%R%> and %<%r%T%R%>)",
            opname,
            "highlight-a", type0,
            "highlight-b", type1);

To avoid requiring monstrosities such as the above, the patch adds a new
"%e" format code to pretty-print.cc, which expects a pp_element *, where
pp_element is a new abstract base class (actually a pp_markup::element),
along with various useful subclasses.  This lets the above be written
as:

  pp_markup::element_quoted_type element_0 (type0, highlight_colors::lhs);
  pp_markup::element_quoted_type element_1 (type1, highlight_colors::rhs);
  error_at (richloc,
            "invalid operands to binary %s (have %e and %e)",
            opname, &element_0, &element_1);

which I feel is maintainable and clear to translators; the use of %e and
pp_element * captures the type-unsafe part of the variadic call, and the
subclasses allow for type-safety (so e.g. an element_quoted_type expects
a type and a highlighting color).  This approach allows for some nice
simplifications within c-format.cc.

The patch also extends -Wformat to "teach" it about the new %e and
pp_element *.  Doing so requires c-format.cc to be able to determine
if a T * is a pp_element * (i.e. if T is a subclass).  To do so I added
a new comp_types callback for comparing types, where the C++ frontend
supplies a suitable implementation (and %e will always be wrong for C).

I've manually tested this on many diagnostics with both C and C++ and it
seems a subtle but significant improvement in readability.

I've added a new option -fno-diagnostics-show-highlight-colors in case
people prefer the old behavior.

gcc/c-family/ChangeLog:
* c-common.cc: Include "tree-pretty-print-markup.h".
(binary_op_error): Use pp_markup::element_quoted_type and %e.
(check_function_arguments): Add "comp_types" param and pass it to
check_function_format.
* c-common.h (check_function_arguments): Add "comp_types" param.
(check_function_format): Likewise.
* c-format.cc: Include "tree-pretty-print-markup.h".
(local_pp_element_ptr_node): New.
(PP_FORMAT_CHAR_TABLE): Add entry for %e.
(struct format_check_context): Add "m_comp_types" field.
(check_function_format): Add "comp_types" param and pass it to
check_format_info.
(check_format_info): Likewise, passing it to format_ctx's ctor.
(check_format_arg): Extract m_comp_types from format_ctx and
pass it to check_format_info_main.
(check_format_info_main): Add "comp_types" param and pass it to
arg_parser's ctor.
(class argument_parser): Add "m_comp_types" field.
(argument_parser::check_argument_type): Pass m_comp_types to
check_format_types.
(handle_subclass_of_pp_element_p): New.
(check_format_types): Add "comp_types" param, and use it to
call handle_subclass_of_pp_element_p.
(class element_format_substring): New.
(class element_expected_type_with_indirection): New.
(format_type_warning): Use element_expected_type_with_indirection
to unify the if (wanted_type_name) branches, reducing from four
emit_warning calls to two.  Simplify these further using %e.
Doing so also gives suitable colorization of the text within the
diagnostics.
(init_dynamic_diag_info): Initialize local_pp_element_ptr_node.
(selftest::test_type_mismatch_range_labels): Add nullptr for new
param of gcc_rich_location label overload.
* c-format.h (T_PP_ELEMENT_PTR): New.
* c-type-mismatch.cc: Include "diagnostic-highlight-colors.h".
(binary_op_rich_location::binary_op_rich_location): Use
highlight_colors::lhs and highlight_colors::rhs for the ranges.
* c-type-mismatch.h (class binary_op_rich_location): Add comment
about highlight_colors.

gcc/c/ChangeLog:
* c-objc-common.cc: Include "tree-pretty-print-markup.h".
(print_type): Add optional "highlight_color" param and use it
to show highlight colors in "aka" text.
(pp_markup::element_quoted_type::print_type): New.
* c-typeck.cc: Include "tree-pretty-print-markup.h".
(comp_parm_types): New.
(build_function_call_vec): Pass it to check_function_arguments.
(inform_for_arg): Use %e and highlight colors to contrast actual
versus expected.
(convert_for_assignment): Use highlight_colors::actual for the
rhs_label.
(build_binary_op): Use highlight_colors::lhs and highlight_colors::rhs
for the ranges.

gcc/ChangeLog:
* common.opt (fdiagnostics-show-highlight-colors): New option.
* common.opt.urls: Regenerate.
* coretypes.h (pp_markup::element): New forward decl.
(pp_element): New typedef.
* diagnostic-color.cc (gcc_color_defaults): Add "highlight-a"
and "highlight-b".
* diagnostic-format-json.cc (diagnostic_output_format_init_json):
Disable highlight colors.
* diagnostic-format-sarif.cc (diagnostic_output_format_init_sarif):
Likewise.
* diagnostic-highlight-colors.h: New file.
* diagnostic-path.cc (struct event_range): Pass nullptr for
highlight color of m_rich_loc.
* diagnostic-show-locus.cc (colorizer::set_range): Handle ranges
with m_highlight_color.
(colorizer::STATE_NAMED_COLOR): New.
(colorizer::m_richloc): New field.
(colorizer::colorizer): Add richloc param for initializing
m_richloc.
(colorizer::set_named_color): New.
(colorizer::begin_state): Add case STATE_NAMED_COLOR.
(layout::layout): Pass richloc to m_colorizer's ctor.
(selftest::test_one_liner_labels): Pass nullptr for new param of
gcc_rich_location ctor for labels.
(selftest::test_one_liner_labels_utf8): Likewise.
* diagnostic.h (diagnostic_context::set_show_highlight_colors):
New.
* doc/invoke.texi: Add option -fdiagnostics-show-highlight-colors
and highlight-a and highlight-b color caps.
* doc/ux.texi
(Use color consistently when highlighting mismatches): New
subsection.
* gcc-rich-location.cc (gcc_rich_location::add_expr): Add
"highlight_color" param.
(gcc_rich_location::maybe_add_expr): Likewise.
* gcc-rich-location.h (gcc_rich_location::gcc_rich_location):
Split out into a pair of ctors, where if a range_label is supplied
the caller must also supply a highlight color.
(gcc_rich_location::add_expr): Add "highlight_color" param.
(gcc_rich_location::maybe_add_expr): Likewise.
* gcc.cc (driver_handle_option): Handle
OPT_fdiagnostics_show_highlight_colors.
* lto-wrapper.cc (merge_and_complain): Likewise.
(append_compiler_options): Likewise.
(append_diag_options): Likewise.
(run_gcc): Likewise.
* opts-common.cc (decode_cmdline_options_to_array): Add comment
about -fno-diagnostics-show-highlight-colors.
* opts-global.cc (init_options_once): Preserve
pp_show_highlight_colors in case the global_dc's printer is
recreated.
* opts.cc (common_handle_option): Handle
OPT_fdiagnostics_show_highlight_colors.
(gen_command_line_string): Likewise.
* pretty-print-markup.h: New file.
* pretty-print.cc: Include "pretty-print-markup.h" and
"diagnostic-highlight-colors.h".
(pretty_printer::format): Handle %e.
(pretty_printer::pretty_printer): Handle new field
m_show_highlight_colors.
(pp_string_n): New.
(pp_markup::context::begin_quote): New.
(pp_markup::context::end_quote): New.
(pp_markup::context::begin_color): New.
(pp_markup::context::end_color): New.
(highlight_colors::expected): New.
(highlight_colors::actual): New.
(highlight_colors::lhs): New.
(highlight_colors::rhs): New.
(class selftest::test_element): New.
(selftest::test_pp_format): Add tests of %e.
(selftest::test_urlification): Likewise.
* pretty-print.h (pp_markup::context): New forward decl.
(class chunk_info): Add friend class pp_markup::context.
(class pretty_printer): Add friend pp_show_highlight_colors.
(pretty_printer::m_show_highlight_colors): New field.
(pp_show_highlight_colors): New inline function.
(pp_string_n): New decl.
* substring-locations.cc: Include "diagnostic-highlight-colors.h".
(format_string_diagnostic_t::highlight_color_format_string): New.
(format_string_diagnostic_t::highlight_color_param): New.
(format_string_diagnostic_t::emit_warning_n_va): Use highlight
colors.
* substring-locations.h
(format_string_diagnostic_t::highlight_color_format_string): New.
(format_string_diagnostic_t::highlight_color_param): New.
* toplev.cc (general_init): Initialize global_dc's
show_highlight_colors.
* tree-pretty-print-markup.h: New file.

gcc/cp/ChangeLog:
* call.cc: Include "tree-pretty-print-markup.h".
(implicit_conversion_error): Use highlight_colors::percent_h for
the labelled range.
(op_error_string): Split out into...
(concat_op_error_string): ...this.
(binop_error_string): New.
(op_error): Use %e, binop_error_string, highlight_colors::lhs,
and highlight_colors::rhs.
(maybe_inform_about_fndecl_for_bogus_argument_init): Add
"highlight_color" param; use it for the richloc.
(convert_like_internal): Use highlight_colors::percent_h for the
labelled_range, and highlight_colors::percent_i for the call to
maybe_inform_about_fndecl_for_bogus_argument_init.
(build_over_call): Pass cp_comp_parm_types for new "comp_types"
param of check_function_arguments.
(complain_about_bad_argument): Use highlight_colors::percent_h for
the labelled_range, and highlight_colors::percent_i for the call
to maybe_inform_about_fndecl_for_bogus_argument_init.
* cp-tree.h (maybe_inform_about_fndecl_for_bogus_argument_init):
Add optional highlight_color param.
(cp_comp_parm_types): New decl.
(highlight_colors::const percent_h): New decl.
(highlight_colors::const percent_i): New decl.
* error.cc: Include "tree-pretty-print-markup.h".
(highlight_colors::const percent_h): New defn.
(highlight_colors::const percent_i): New defn.
(type_to_string): Add param "highlight_color" and use it.
(print_nonequal_arg): Likewise.
(print_template_differences): Add params "highlight_color_a" and
"highlight_color_b".
(type_to_string_with_compare): Add params "this_highlight_color"
and "peer_highlight_color".
(print_template_tree_comparison): Add params "highlight_color_a"
and "highlight_color_b".
(cxx_format_postprocessor::handle):
Use highlight_colors::percent_h and highlight_colors::percent_i.
(pp_markup::element_quoted_type::print_type): New.
(range_label_for_type_mismatch::get_text): Pass nullptr for new
params of type_to_string_with_compare.
* typeck.cc (cp_comp_parm_types): New.
(cp_build_function_call_vec): Pass it to check_function_arguments.
(convert_for_assignment): Use highlight_colors::percent_h for the
labelled_range.

gcc/testsuite/ChangeLog:
* g++.dg/diagnostic/bad-binary-ops-highlight-colors.C: New test.
* g++.dg/diagnostic/bad-binary-ops-no-highlight-colors.C: New test.
* g++.dg/plugin/plugin.exp (plugin_test_list): Add
show-template-tree-color-no-highlight-colors.C to
show_template_tree_color_plugin.c.
* g++.dg/plugin/show-template-tree-color-labels.C: Update expected
output to reflect use of highlight-a and highlight-b to contrast
mismatches.
* g++.dg/plugin/show-template-tree-color-no-elide-type.C:
Likewise.
* g++.dg/plugin/show-template-tree-color-no-highlight-colors.C:
New test.
* g++.dg/plugin/show-template-tree-color.C: Update expected output
to reflect use of highlight-a and highlight-b to contrast
mismatches.
* g++.dg/warn/Wformat-gcc_diag-1.C: New test.
* g++.dg/warn/Wformat-gcc_diag-2.C: New test.
* g++.dg/warn/Wformat-gcc_diag-3.C: New test.
* gcc.dg/bad-binary-ops-highlight-colors.c: New test.
* gcc.dg/format/colors.c: New test.
* gcc.dg/plugin/diagnostic_plugin_show_trees.c (show_tree): Pass
nullptr for new param of gcc_rich_location::add_expr.

libcpp/ChangeLog:
* include/rich-location.h (location_range::m_highlight_color): New
field.
(rich_location::rich_location): Add optional label_highlight_color
param.
(rich_location::set_highlight_color): New decl.
(rich_location::add_range): Add optional label_highlight_color
param.
(rich_location::set_range): Likewise.
* line-map.cc (rich_location::rich_location): Add
"label_highlight_color" param and pass it to add_range.
(rich_location::set_highlight_color): New.
(rich_location::add_range): Add "label_highlight_color" param.
(rich_location::set_range): Add "highlight_color" param.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

tree-optimization/115868 - ICE with .MASK_CALL in simdclone

The following adjusts mask recording which didn't take into account
that we can merge call arguments from two vectors like

  _50 = {vect_d_1.253_41, vect_d_1.254_43};
  _51 = VIEW_CONVERT_EXPR<unsigned char>(mask__19.257_49);
  _52 = (unsigned int) _51;
  _53 = _Z3bazd.simdclone.7 (_50, _52);
  _54 = BIT_FIELD_REF <_53, 256, 0>;
  _55 = BIT_FIELD_REF <_53, 256, 256>;

The testcase g++.dg/vect/pr68762-2.cc exercises this on x86_64 with
partial vector usage enabled and AVX512 support.

PR tree-optimization/115868
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Correctly
compute the number of mask copies required for vect_record_loop_mask.

Daily bump.

doc: Update GNU Modula 2 mailing list links

gcc:
* doc/gm2.texi (Community): Update lists.nongnu.org and
lists.gnu.org links.

[PR rtl-optimization/115876] Fix one of two ubsan reported issues in new ext-dce.cc code

David Binderman did a bootstrap build with ubsan enabled which triggered a few
errors in the new ext-dce.cc code. This fixes the trivial case of shifting
negative values.

Bootstrapped and regression tested on x86.

Pushing to the trunk.

gcc/
PR rtl-optimization/115876
* ext-dce.cc (carry_backpropagate): Make mask and mmask unsigned.

doc: remove @opindex for fconcepts-ts

We're getting complaints from the CI system about this removed option.
I suspect I should have removed the @opindex and @itemx for it. This
patch does that.

gcc/ChangeLog:

* doc/invoke.texi: Remove @opindex and @itemx for -fconcepts-ts.

Fix Xcode 16 build break with NULL != nullptr

As of Xcode 16 beta 2 with the macOS 15 SDK, each re-inclusion of the
stddef.h header causes the NULL macro in C++ to be re-defined to an
integral constant (__null). This makes the workaround in d59a576b8
("Redefine NULL to nullptr") ineffective, as other headers that are
typically included after system.h (such as obstack.h) do include
stddef.h too.

This can be seen by running the sample below through `clang++ -E`

    #include <stddef.h>
    #define NULL nullptr
    #include <stddef.h>
    NULL

The relevant libc++ change is here:
https://github.com/llvm/llvm-project/commit/2950283dddab03c183c1be2d7de9d4999cc86131

Filed as FB14261859 to Apple and added a comment about it on LLVM PR
86843.

This fixes the cases in --enable-languages=c,c++,objc,obj-c++,rust build
where NULL being an integral constant instead of a null pointer literal
(therefore no longer implicitly converting to a pointer when used as a
template function's argument) caused issues.

    gcc/value-pointer-equiv.cc:65:43: error: no viable conversion from `pair<typename __unwrap_ref_decay<long>::type, typename __unwrap_ref_decay<long>::type>' to 'const pair<tree, tree>'

    65 |   const std::pair <tree, tree> m_marker = std::make_pair (NULL, NULL);
       |                                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~

As noted in the previous commit though, the proper solution would be to
phase out the usages of NULL in GCC's C++ source code.

gcc/analyzer/ChangeLog:

* diagnostic-manager.cc (saved_diagnostic::saved_diagnostic):
Change NULL to nullptr.
(struct null_assignment_sm_context): Likewise.
* infinite-loop.cc: Likewise.
* infinite-recursion.cc: Likewise.
* varargs.cc (va_list_state_machine::on_leak): Likewise.

gcc/rust/ChangeLog:

* metadata/rust-imports.cc (Import::try_package_in_directory):
Change NULL to nullptr.

gcc/ChangeLog:

* value-pointer-equiv.cc: Change NULL to nullptr.

Signed-off-by: Daniel Bertalan <dani@danielbertalan.dev>

rtl-ssa: Fix prev_any_insn [PR115785]

Bit of a brown paper bag issue, but: due to the representation
of the insn chain, insn_info::prev_any_insn would sometimes skip
over instructions. This led to an invalid update in the PR when
adding and removing instructions.

I think one of the reasons I failed to spot this when checking
the code is that m_prev_insn_or_last_debug_insn is misnamed:
it's the previous instruction *of the same type* or the last
debug instruction in a group. The patch therefore renames it to
m_prev_sametype_or_last_debug_insn (with the term prev_sametype
already being used in some accessors).

The reason this didn't show up earlier is that (a) prev_any_insn
is rarely used directly, (b) no instructions were lost from the
def-use chains, and (c) only consecutive debug instructions were
skipped when walking the insn chain.

The chaining scheme makes prev_any_insn more complicated than
next_any_insn, prev_nondebug_insn and next_nondebug_insn, but the
object code produced is still relatively simple.

gcc/
PR rtl-optimization/115785
* rtl-ssa/insns.h (insn_info::prev_insn_or_last_debug_insn)
(insn_info::next_nondebug_or_debug_insn): Remove typedefs.
(insn_info::m_prev_insn_or_last_debug_insn): Rename to...
(insn_info::m_prev_sametype_or_last_debug_insn): ...this.
* rtl-ssa/internals.inl (insn_info::insn_info): Update after
above renaming.
(insn_info::copy_prev_from): Likewise.
(insn_info::set_prev_sametype_insn): Likewise.
(insn_info::set_last_debug_insn): Likewise.
(insn_info::clear_insn_links): Likewise.
(insn_info::has_insn_links): Likewise.
* rtl-ssa/member-fns.inl (insn_info::prev_nondebug_insn): Likewise.
(insn_info::prev_any_insn): Fix moves from non-debug to debug insns.

gcc/testsuite/
PR rtl-optimization/115785
* g++.dg/torture/pr115785.C: New test.

modula2: bootstrap fix for string and vector headers.

This patch fixes the include of headers (<string> and <vector>) which
are included after GCC's system.h has been included. It defines
INCLUDE_STRING before including "system.h". This allows gcc to
bootstrap with Apple clang 15.

gcc/m2/ChangeLog:

* gm2-gcc/m2linemap.cc (INCLUDE_STRING): Define before
include of gcc-consolidation.h.
* gm2spec.cc (INCLUDE_STRING): Define before include of
system.h.
(INCLUDE_VECTOR): Ditto.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

[RISC-V] Avoid unnecessary sign extension after memcmp

Similar to the str[n]cmp work, this adjusts the block compare expansion to do
its work in X mode with an appropriate lowpart extraction of the results at the
end of the sequence.

This has gone through my tester on rv32 and rv64, but that's it. Waiting on
pre-commit testing before moving forward.

gcc/

* config/riscv/riscv-string.cc (emit_memcmp_scalar_load_and_compare):
Set RESULT directly rather than using a temporary.
(emit_memcmp_scalar_result_calculation): Similarly.
(riscv_expand_block_compare_scalar): Use CONST0_RTX rather than
generating new RTL.
* config/riscv/riscv.md (cmpmemsi): Pass an X mode temporary to the
expansion routines. If necessary extract low part of the word to store
in final result location.

c++/modules: Add testcase for fixed issue with usings [PR115798]

This issue was fixed by r15-2003-gd6bf4b1c932211, but seems worth adding
to the testsuite.

PR c++/115798

gcc/testsuite/ChangeLog:

* g++.dg/modules/using-26_a.C: New test.
* g++.dg/modules/using-26_b.C: New test.
* g++.dg/modules/using-26_c.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

c++/modules: Handle redefinitions of using-decls

This fixes an ICE exposed by supporting exported non-function
using-decls. Sometimes when preparing to define a class, xref_tag will
find a using-decl belonging to a different namespace, which triggers the
checking_assert in modules handling.

Ideally I feel that 'lookup_and_check_tag' should be told whether we're
about to define the type and handle erroring on redefinitions itself to
avoid this issue (and provide better diagnostics by acknowledging the
using-declaration), but this is complicated with the current
fragmentation of definition checking. So for this patch we just fixup
the assertion and ensure that pushdecl properly errors on the
conflicting declaration later.

gcc/cp/ChangeLog:

* decl.cc (xref_tag): Move assertion into condition.
* name-lookup.cc (check_module_override): Check for conflicting
types and using-decls.

gcc/testsuite/ChangeLog:

* g++.dg/modules/using-19_a.C: New test.
* g++.dg/modules/using-19_b.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

c++: Introduce USING_DECLs for non-function usings [PR114683]

With modules, a non-function using-declaration is not completely
interchangable with the declaration that it refers to; in particular,
such a using-declaration may be exported without revealing the name of
the entity it refers to.

This patch fixes this by building USING_DECLs for all using-declarations
that bind a non-function from a different scope.  These new decls can
than have purviewness and exportingness attached to them without
affecting the decl that they refer to.

We do this for all such usings, not just usings that may be revealed in
a module; this way we can verify the change in representation against
the (more comprehensive) non-modules testsuites, and in a future patch
we can use the locations of these using-decls to enhance relevant
diagnostics.

Another possible approach would be to reuse OVERLOADs for this, as is
already done within add_binding_entity for modules.  I didn't do this
because lots of code (as well as the names of the accessors) makes
assumptions that OVERLOADs refer to function overload sets, and so
splitting this up reduced semantic burden and made it easier to avoid
unintentional changes.  This did mean that we need to move out the
definitions of ovl_iterator::{purview,exporting}_p, because the
structures for module decls are declared later on in cp-tree.h.

Building USING_DECLs changed a couple of code paths when adjusting
bindings; in particular, pushdecl recognises global using-declarations
as usings now, and so checks fall through to update_binding.  To not
regress g++.dg/lookup/linkage2.C the checks for 'extern' declarations no
longer were sufficient (they don't handle 'extern "C"'); but
duplicate_decls performed all the relevant checks anyway.

Otherwise in general we strip using-decls from all lookup_* functions
where necessary.  Over time for diagnostics purposes it would probably
be good to slowly revert this (especially e.g. lookup_elaborated_type
causes some diagnostic quality regressions here) but this patch doesn't
do so to minimise churn.

This patch also tries not to build USING_DECLs when just redeclaring an
existing declaration, and instead reveals that declaration in-place.
This requires reworking some logic handling CONST_DECLs in module
streaming, since a non-using CONST_DECL may now be exported indepenently
of its containing enum.

'add_binding_entity' needs to explicitly write the names of unscoped
enumerators so that lazy loading will trigger when the name is found by
name lookup; it does this by pretending that the enum declarations are
always usings so that it doesn't double-write definitions.  By also
checking if the enumerator was marked purview/exported we can use that
to override a non-purview/non-exported TYPE_DECL and ensure it's made
visible regardless.

When reading we should get the exported flag on the enumeration
constant, and so should properly create a binding for it.  We don't need
to do anything to handle importedness as that checking is skipped for
EK_USINGs.

Some other places assume that module information for a CONST_DECL
inherits module information from its containing type.  This includes:

- get_originating_module_decl, for determining if the name was imported
  or has module attachment; I don't /think/ this change should affect
  that, so I'm leaving this untouched.

- binding_cmp, for sorting by exportedness; since now an enumerator
  could be exported without the containing decl being exported, we need
  to handle this here too.

PR c++/114683

gcc/cp/ChangeLog:

* cp-tree.h (class ovl_iterator): Move definitions of purview_p
and exporting_p to name-lookup.cc.
* module.cc (depset::hash::add_binding_entity): Strip
using-decls.  Remove workarounds.  Handle CONST_DECLs with
different purview/exported from their enum.
(enum ct_bind_flags): Remove unnecessary cbf_wrapped flag.
(module_state::write_cluster): Likewise.
(module_state::read_cluster): Build USING_DECL for non-function
usings.
(binding_cmp): Handle CONST_DECLs with different purview and/or
exported from their enum.
(set_instantiating_module): Support CONST_DECLs.
* name-lookup.cc (get_fixed_binding_slot): Strip USING_DECLs.
(name_lookup::process_binding): Strip USING_DECLs.
(name_lookup::process_module_binding): Remove workaround.
(update_binding): Strip USING_DECLs, remove incorrect check for
non-extern variables.
(ovl_iterator::purview_p): Support USING_DECLs.
(ovl_iterator::exporting_p): Support USING_DECLs.
(walk_module_binding): Handle stat hack type.
(do_nonmember_using_decl): Strip USING_DECLs when comparing;
build USING_DECLs for non-function usings in different scope
rather than binding directly.
(get_namespace_binding): Strip USING_DECLs.
(lookup_name): Strip USING_DECLs.
(lookup_elaborated_type): Strip USING_DECLs.
* decl.cc (poplevel): Still support -Wunused for using-decls.
(lookup_and_check_tag): Remove unnecessary strip_using_decl.
* parser.cc (cp_parser_template_name): Likewise.
(cp_parser_nonclass_name): Likewise.
(cp_parser_class_name): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/lookup/using29.C: Update errors.
* g++.dg/lookup/using53.C: Update errors, add XFAILs.
* g++.dg/modules/using-22_b.C: Remove xfails.
* g++.dg/warn/Wunused-var-18.C: Update error, add check.
* g++.dg/lookup/using68.C: New test.
* g++.dg/modules/using-24_a.C: New test.
* g++.dg/modules/using-24_b.C: New test.
* g++.dg/modules/using-25_a.C: New test.
* g++.dg/modules/using-25_b.C: New test.
* g++.dg/modules/using-enum-4_a.C: New test.
* g++.dg/modules/using-enum-4_b.C: New test.
* g++.dg/modules/using-enum-4_c.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>

s390: Fully exploit vgm, vgbm, vrepi

Currently instructions vgm and vrepi are utilized only for constant
vectors where the element mode equals the element mode of the
corresponding instruction.  This patch lifts this restriction by making
use of those instructions for constant vectors even if element modes
do not coincide.  For example, the constant vector

  (v2di){0x7ffffffe7ffffffe, 0x7ffffffe7ffffffe}

can be loaded via vgmf %v0,1,30.  Similar, the constant vector

  (v4si){0xaaaaaaaa, 0xaaaaaaaa, 0xaaaaaaaa, 0xaaaaaaaa}

can be loaded via vrepiq %v0,-86.

Analog, if the element mode of a constant vector is smaller than the
element mode of a corresponding instruction, we still may make use of
those instructions.  For example, the constant vector

  (v4si){0x7fff, 0xfffe0000, 0x7fff, 0xfffe0000}

can be loaded via vgmg %v0,17,46.  Similar, the constant vector

  (v4si){-1, -16643, -1, -16643}

can be loaded via vrepig %v0,-16643.

Additionally this patch enables vgm, vgbm, vrepi for partial vectors,
i.e., vectors of size less than 16 bytes.  Basically this is done by
treating a vector as a full vector resulting in replicating constants
into the ignored bits whereas vgbm sets those to zero.

Furthermore, there is no restriction to integer vectors anymore, i.e.,
supporting scalars of mode up to and including TI and TF and also
floating-point vectors.

Here are some numbers how often instructions are emitted for SPEC 2017:

        w/o patch     w/ patch
vgbm          140          365
vgm         17508        24452
vrepi        1360         2775

I expect most (maybe even all) to save us a load from the literal pool.

gcc/ChangeLog:

* config/s390/2964.md: Remove extended mnemonics for vgm.
* config/s390/3906.md: Remove extended mnemonics for vgm.
* config/s390/3931.md: Remove extended mnemonics for vgm.
* config/s390/8561.md: Remove extended mnemonics for vgm.
* config/s390/constraints.md (jKK): Remove constraint.
(jzz): Add constraint.
* config/s390/s390-protos.h (s390_contiguous_bitmask_vector_p):
Add prototype.
(s390_constant_via_vgm_p): Add prototype.
(s390_constant_via_vrepi_p): Add prototype.
* config/s390/s390.cc (s390_contiguous_bitmask_vector_p): New
function.
(s390_constant_via_vgm_vrepi_helper): New function.
(s390_constant_via_vgm_p): New function.
(s390_constant_via_vgbm_p): For the sake of symmetry rename
s390_bytemask_vector_p into s390_constant_via_vgbm_p.
(s390_bytemask_vector_p): Deal with non-integer and partial
vectors.
(s390_constant_via_vrepi_p): New function.
(s390_legitimate_constant_p): Allow partial vectors.
(legitimate_reload_constant_p): Fix indentation.
(legitimate_reload_vector_constant_p): Restrict to constraints
j00, jm1, jxx, jyy, jzz only, i.e., allow partial vectors.
(s390_expand_vec_init): Also make use of vrepi if possible.
(print_operand): Add q,p,r for vgm,vrepi,vgbm, respectively.
Remove e,s,t for constant vectors.
* config/s390/s390.md (movti): Add variants utilizing
vgbm,vgm,vrepi.
* config/s390/vector.md (mov<mode><tf_vr>): Adapt variants
for vgbm,vgm,vrepi for the new scheme.
(mov<mode>): Adapt variants for vgbm,vgm for the new
scheme and add vrepi variant for modes V_8,V_16,V_32,V_64.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/vec-copysign.c: Change to non-extended
mnemonic.
* gcc.target/s390/vector/vec-genmask-1.c: Change to non-extended
mnemonic.
* gcc.target/s390/vector/vec-init-1.c: Change to non-extended
mnemonic.
* gcc.target/s390/vector/vec-vrepi-1.c: Change to non-extended
mnemonic.
* gcc.target/s390/zvector/autovec-double-quiet-uneq.c: Change to
non-extended mnemonic.
* gcc.target/s390/zvector/autovec-float-quiet-uneq.c: Change to
non-extended mnemonic.
* gcc.target/s390/zvector/vec-genmask-1.c: Change to
non-extended mnemonic.
* gcc.target/s390/zvector/vec-splat-1.c: Change to non-extended
mnemonic.
* gcc.target/s390/zvector/vec-splat-2.c: Change to non-extended
mnemonic.
* gcc.target/s390/vector/vgbm-double-1.c: New test.
* gcc.target/s390/vector/vgbm-float-1.c: New test.
* gcc.target/s390/vector/vgbm-int128-1.c: New test.
* gcc.target/s390/vector/vgbm-integer-1.c: New test.
* gcc.target/s390/vector/vgbm-longdouble-1.c: New test.
* gcc.target/s390/vector/vgm-df-1.c: New test.
* gcc.target/s390/vector/vgm-di-1.c: New test.
* gcc.target/s390/vector/vgm-hi-1.c: New test.
* gcc.target/s390/vector/vgm-int128-1.c: New test.
* gcc.target/s390/vector/vgm-longdouble-1.c: New test.
* gcc.target/s390/vector/vgm-qi-1.c: New test.
* gcc.target/s390/vector/vgm-sf-1.c: New test.
* gcc.target/s390/vector/vgm-si-1.c: New test.
* gcc.target/s390/vector/vgm-tf-1.c: New test.
* gcc.target/s390/vector/vgm-ti-1.c: New test.
* gcc.target/s390/vector/vrepi-df-1.c: New test.
* gcc.target/s390/vector/vrepi-di-1.c: New test.
* gcc.target/s390/vector/vrepi-hi-1.c: New test.
* gcc.target/s390/vector/vrepi-int128-1.c: New test.
* gcc.target/s390/vector/vrepi-qi-1.c: New test.
* gcc.target/s390/vector/vrepi-sf-1.c: New test.
* gcc.target/s390/vector/vrepi-si-1.c: New test.
* gcc.target/s390/vector/vrepi-tf-1.c: New test.
* gcc.target/s390/vector/vrepi-ti-1.c: New test.

s390: Fix output template for movv1qi

Although for instructions MVI and MVIY it does not make a difference
whether the immediate is interpreted as signed or unsigned, GAS expects
unsigned immediates for instruction format SI_URD.

gcc/ChangeLog:

* config/s390/vector.md (mov<mode>): Fix output template for
movv1qi.

i386: Some AVX512 ternlog expansion refinements.

This patch replaces the calls to force_reg in ix86_expand_ternlog_binop
and ix86_expand_ternlog with gen_reg_rtx and emit_move_insn.
This patch also cleans up whitespace, consistently uses CONST_VECTOR_P
instead of GET_CODE and tweaks checks for ix86_ternlog_leaf_p (for
example where vpandn may take a memory operand).

2024-07-12  Roger Sayle  <roger@nextmovesoftware.com>
    Hongtao Liu  <hongtao.liu@intel.com>

gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_broadcast_from_constant):
Use CONST_VECTOR_P instead of comparison against GET_CODE.
(ix86_gen_bcst_mem): Likewise.
(ix86_ternlog_leaf_p): Likewise.
(ix86_ternlog_operand_p): ix86_ternlog_leaf_p is always true for
vector_all_ones_operand.
(ix86_expand_ternlog_bin_op): Use CONST_VECTOR_P instead of
equality comparison against GET_CODE.  Replace call to force_reg
with gen_reg_rtx and emit_move_insn (for VEC_DUPLICATE broadcast).
Check for !register_operand instead of memory_operand.
Support CONST_VECTORs by calling force_const_mem.
(ix86_expand_ternlog): Fix indentation whitespace.
Allow ix86_ternlog_leaf_p as ix86_expand_ternlog_andnot's second
operand. Use CONST_VECTOR_P instead of equality against GET_CODE.
Use gen_reg_rtx and emit_move_insn for ~a, ~b and ~c cases.

s390: Align *cjump_64 and *icjump_64

During machine reorg we optimize backward jumps and transform insns as
e.g.

(jump_insn 118 117 119 (set (pc)
        (if_then_else (ne (reg:CCRAW 33 %cc)
                (const_int 8 [0x8]))
            (label_ref 134)
            (pc))) "dec_math_1.f90":204:8 discrim 1 2161 {*cjump_64}
     (expr_list:REG_DEAD (reg:CCRAW 33 %cc)
        (int_list:REG_BR_PROB 719407028 (nil)))
-> 134)

into

(jump_insn 118 117 432 (set (pc)
        (if_then_else (ne (reg:CCRAW 33 %cc)
                (const_int 8 [0x8]))
            (pc)
            (label_ref 433))) "dec_math_1.f90":204:8 discrim 1 -1
     (expr_list:REG_DEAD (reg:CCRAW 33 %cc)
        (int_list:REG_BR_PROB 719407028 (nil)))
-> 433)

The latter is not recognized anymore since *icjump_64 only matches
CC_REGNUM against zero.  Fixed by aligning *cjump_64 and *icjump_64.

gcc/ChangeLog:

* config/s390/s390.md (*icjump_64): Allow raw CC comparisons,
i.e., any constant integer between 0 and 15 for CC comparisons.

aarch64: Avoid alloca in target attribute parsing

The handling of the target attribute used alloca to allocate
a copy of unverified user input, which could exhaust the stack
if the input is too long. This patch converts it to auto_vecs
instead.

I wondered about converting it to use std::string, which we
already use elsewhere, but that would be more invasive and
controversial.

gcc/
* config/aarch64/aarch64.cc (aarch64_process_one_target_attr)
(aarch64_process_target_attr): Avoid alloca.

[libstdc++] [testsuite] require dfprt on some tests

On a target that doesn't enable decimal float components in libgcc
(because the libc doens't define all required FE_* macros), but whose
compiler supports _Decimal* types, the effective target requirement
dfp passes, but several tests won't link because the runtime support
they depend on is missing. State their dfprt requirement.

for libstdc++-v3/ChangeLog

* testsuite/decimal/binary-arith.cc: Require dfprt.
* testsuite/decimal/comparison.cc: Likewise.
* testsuite/decimal/compound-assignment.cc: Likewise.
* testsuite/decimal/compound-assignment-memfunc.cc: Likewise.
* testsuite/decimal/make-decimal.cc: Likewise.
* testsuite/decimal/pr54036-1.cc: Likewise.
* testsuite/decimal/pr54036-2.cc: Likewise.
* testsuite/decimal/pr54036-3.cc: Likewise.
* testsuite/decimal/unary-arith.cc: Likewise.

[alpha] adjust MEM alignment for block move [PR115459]

Before issuing loads or stores for a block move, adjust the MEM
alignments if analysis of the addresses enabled the inference of
stricter alignment. This ensures that the MEMs are sufficiently
aligned for the corresponding insns, which avoids trouble in case of
e.g. substitutions into SUBREGs.

for gcc/ChangeLog

PR target/115459
* config/alpha/alpha.cc (alpha_expand_block_move): Adjust
MEMs to match inferred alignment.

RISC-V: NO_WARNING preferred else value for RVV

PR target/115840.

In riscv_preferred_else_value, we create an uninitialized tmp var
for else value, instead of the 0 (as default_preferred_else_value)
or the pre-exists VAR (as aarch64 does), so that we can use agnostic
policy.

The problem is that `warn_uninit` will emit a warning:
'({anonymous})' may be used uninitialized

Let's mark this tmp var as NO_WARNING.

This problem is found when I try to build glibc with V extension.

gcc

PR target/115840
* config/riscv/riscv.cc(riscv_preferred_else_value): Mark
tmp_var as NO_WARNING.

gcc/testsuite
* gcc.dg/vect/pr115840.c: New testcase.

fortran: Factor the evaluation of MINLOC/MAXLOC's BACK argument

Move the evaluation of the BACK argument out of the loop in the inline code
generated for MINLOC or MAXLOC.  For that, add a new (scalar) element
associated with BACK to the scalarization loop chain, evaluate the argument
with the context of that element, and let the scalarizer do its job.

The problem was not only a missed optimisation, but also a wrong code
one in the cases where the expression associated with BACK is not free of
side-effects, making multiple evaluations observable.

The new tests check the evaluation count of the BACK argument, and try to
cover all the variations (integral or floating-point type, constant or
unknown shape, absent or scalar or array MASK) supported by the inline
implementation of the functions.  Care has been taken to not check the case
of a constant .FALSE. MASK, for which the evaluation of BACK can be elided.

gcc/fortran/ChangeLog:

* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Create a new
scalar scalarization chain element if BACK is present.  Add it to
the loop.  Set the scalarization chain before evaluating the
argument.

gcc/testsuite/ChangeLog:

* gfortran.dg/maxloc_5.f90: New test.
* gfortran.dg/minloc_5.f90: New test.

RISC-V: Disable misaligned vector access in hook riscv_slow_unaligned_access[PR115862]

The reason is that in the following code, icode = movmisalignv8si has
already been rejected by TARGET_VECTOR_MISALIGN_SUPPORTED, but it is
allowed by targetm.slow_unaligned_access,which is contradictory.

(((icode = optab_handler (movmisalign_optab, mode))
!= CODE_FOR_nothing)
|| targetm.slow_unaligned_access (mode, align))

misaligned vector access should be enabled by -mno-vector-strict-align option.

PR target/115862

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_slow_unaligned_access): Disable vector misalign.

Signed-off-by: Li Xu <xuli1@eswincomputing.com>

RISC-V: Add SiFive extensions, xsfvcp and xsfcease

We have already upstreamed these extensions into binutils, and now we need GCC
to recognize these extensions and pass them to binutils as well. We also plan
to upstream intrinsics in the near future. :)

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_implied_info): Add xsfvcp.
(riscv_ext_version_table): Add xsfvcp, xsfcease.
(riscv_ext_flag_table): Ditto.
* config/riscv/riscv.opt (riscv_sifive_subext): New.
(XSFVCP): New.
(XSFCEASE): New.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-sf-1.c: New.
* gcc.target/riscv/predef-sf-2.c: New.

rs6000: Remove vcond{,u} expanders

As PR114189 shows, middle-end will obsolete vcond, vcondu
and vcondeq optabs soon. This patch is to remove all
vcond{,u} expanders in rs6000 port and adjust the function
rs6000_emit_vector_cond_expr which is called by those
expanders as static.

PR target/115659

gcc/ChangeLog:

* config/rs6000/rs6000-protos.h (rs6000_emit_vector_cond_expr): Remove.
* config/rs6000/rs6000.cc (rs6000_emit_vector_cond_expr): Add static
qualifier as it is only called by rs6000_emit_swsqrt now.
* config/rs6000/vector.md (vcond<VEC_F:mode><VEC_F:mode>): Remove.
(vcond<VEC_I:mode><VEC_I:mode>): Remove.
(vcondv4sfv4si): Likewise.
(vcondv4siv4sf): Likewise.
(vcondv2dfv2di): Likewise.
(vcondv2div2df): Likewise.
(vcondu<VEC_I:mode><VEC_I:mode>): Likewise.
(vconduv4sfv4si): Likewise.
(vconduv2dfv2di): Likewise.

tree-optimization/115867 - ICE with simdcall vectorization in masked loop

When only a loop mask is to be supplied for the inbranch arg to a
simd function we fail to handle integer mode masks correctly. We
need to guess the number of elements represented by it. This assumes
that excess arguments are all for masks, I wasn't able to create
a simdclone with more than one integer mode mask argument.

The gcc.dg/vect/vect-simd-clone-20.c exercises this with -mavx512vl

PR tree-optimization/115867
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Properly
guess the number of mask elements for integer mode masks.

[committed] Fix m68k bootstrap segfault with late-combine

So the m68k port has failed to bootstrap since the introduction of
late-combine.  My suspicion has been this is a backend problem.  Sure enough
after bisecting things down (thank goodness for the debug counter!) I'm happy
to report m68k (after this patch) has moved into its stage3 build for the first
time in a month.

Basically late-combine propagated an address calculation to its use points,
generating this insn (dwarf2out.c, I forget what function):

> (insn 653 652 655 (parallel [
>             (set (mem/j:DI (plus:SI (plus:SI (reg/f:SI 9 %a1 [orig:64 _67 ] [64])
>                             (reg:SI 0 %d0 [321]))
>                         (const_int 20 [0x14])) [0 slot_204->dw_attr_val.v.val_unsigned+0 S8 A16])
>                 (sign_extend:DI (mem/c:SI (plus:SI (reg/f:SI 14 %a6)
>                             (const_int -28 [0xffffffffffffffe4])) [870 %sfp+-28 S4 A16])))
>             (clobber (reg:SI 0 %d0))
>         ]) "../../../gcc/gcc/dwarf2out.cc":24961:23 93 {extendsidi2}
>      (expr_list:REG_DEAD (reg/f:SI 9 %a1 [orig:64 _67 ] [64])
>         (expr_list:REG_DEAD (reg:SI 0 %d0 [321])
>             (expr_list:REG_UNUSED (reg:SI 0 %d0)
>                 (nil)))))
Note how the output uses d0 in the address calculation and the clobber uses d0.

It matches this insn in the md file:

> (define_insn "extendsidi2"
>   [(set (match_operand:DI 0 "nonimmediate_operand" "=d,o,o,<")
>         (sign_extend:DI
>          (match_operand:SI 1 "nonimmediate_src_operand" "rm,rm,r<Q>,rm")))
>    (clobber (match_scratch:SI 2 "=X,&d,&d,&d"))]
>   ""
> {
>   if (which_alternative == 0)
>     /* Handle alternative 0.  */
>     {
>       if (TARGET_68020 || TARGET_COLDFIRE)
>         return "move%.l %1,%R0\;smi %0\;extb%.l %0";
>       else
>         return "move%.l %1,%R0\;smi %0\;ext%.w %0\;ext%.l %0";
>     }
>
>   /* Handle alternatives 1, 2 and 3.  We don't need to adjust address by 4
>      in alternative 3 because autodecrement will do that for us.  */
>   operands[3] = adjust_address (operands[0], SImode,
>                                 which_alternative == 3 ? 0 : 4);
>   operands[0] = adjust_address (operands[0], SImode, 0);
>
>   if (TARGET_68020 || TARGET_COLDFIRE)
>     return "move%.l %1,%3\;smi %2\;extb%.l %2\;move%.l %2,%0";
>   else
>     return "move%.l %1,%3\;smi %2\;ext%.w %2\;ext%.l %2\;move%.l %2,%0";
> }
>   [(set_attr "ok_for_coldfire" "yes,no,yes,yes")])
Note the smi/ext instruction pair in the case for alternatives 1..3.  Those
clobber the scratch register before we're done consuming inputs.  The scratch
register really needs to be marked as an earlyclobber.

That fixes the bootstrap problem, but a cursory review of m68k.md is not
encouraging.  I will not be surprised at all if there's more of this kind of
problem lurking.

But happy to at least have m68k bootstrapping again.   It's failing the
comparison test, but definitely progress.

* config/m68k/m68k.md (extendsidi2): Add missing early clobbers.

libbacktrace: avoid infinite recursion

We could get an infinite recursion in an odd case in which a
.gnu_debugdata section was added to a debug file, and mini_debuginfo
was put into the debug file, and the debug file was put into a
/usr/lib/debug directory to be found by build ID. This combination
doesn't really make sense but we shouldn't get an infinite recursion.

* elf.c (elf_add): Don't use .gnu_debugdata if we are already
reading a debuginfo file.
* Makefile.am (m2test_*): New test targets.
(CHECK_PROGRAMS): Add m2test.
(MAKETESTS): Add m2test_minidebug2.
(%_minidebug2): New pattern.
(CLEANFILES): Remove minidebug2 files.
* Makefile.in: Regenerate.

LoongArch: Remove unreachable codes.

gcc/ChangeLog:

* config/loongarch/loongarch.cc
(loongarch_split_move): Delete.
(loongarch_hard_regno_mode_ok_uncached): Likewise.
* config/loongarch/loongarch.md
(move_doubleword_fpr<mode>): Likewise.
(load_low<mode>): Likewise.
(load_high<mode>): Likewise.
(store_word<mode>): Likewise.
(movgr2frh<mode>): Likewise.
(movfrh2gr<mode>): Likewise.

LoongArch: TFmode is not allowed to be stored in the float register.

PR target/115752

gcc/ChangeLog:

* config/loongarch/loongarch.cc
(loongarch_hard_regno_mode_ok_uncached): Replace
UNITS_PER_FPVALUE with UNITS_PER_HWFPVALUE.
* config/loongarch/loongarch.h (UNITS_PER_FPVALUE): Delete.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/pr115752.c: New test.