Sriraman Tallam [Tue, 6 Nov 2012 02:35:17 +0000 (02:35 +0000)]
Function Multiversioning
========================
Sriraman Tallam, tmsriram@google.com
This is an overview of the patch that adds support for specifying function
versions. It is only enabled for the i386 target.
Example:
int foo (); /* Default version */
int foo () __attribute__ ((target ("avx,popcnt"))); /* Specialized for avx and popcnt */
int foo () __attribute__ ((target ("arch=core2,ssse3"))); /* Specialized for core2 and ssse3 */
int main ()
{
  int (*p)() = &foo;
  return foo () + (*p)();
}
int foo ()
{
  return 0;
}
int __attribute__ ((target ("avx,popcnt")))
foo ()
{
  return 0;
}
int __attribute__ ((target ("arch=core2,ssse3")))
foo ()
{
  return 0;
}
The above example defines foo three times, but all three definitions are
different versions of the same function. The calls to foo in main, both direct
and via a pointer, are calls to the multi-versioned function foo, which is
dispatched to the right version at run time.
Front-end changes:
The front-end changes are calls, at appropriate places, to target hooks that:
* determine whether two function decls with the same signature are versions;
* determine the assembler name of a function version;
* generate the dispatcher function for a set of function versions;
* compare two versions to see whether one has a higher priority than the other.
All of the target-specific implementation is in config/i386/i386.c.
What does the patch do?
* Tracking decls that correspond to function versions of function
name, say "foo":
When the front-end sees more than one decl for "foo", it calls a target hook to
determine whether they are versions. To prevent duplicate-definition errors
between versions of "foo", the "decls_match" function in cp/decl.c is made to
return false when two decls are deemed versions by the target. As a result, all
function versions of "foo" are added to the overload list of "foo".
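Below is a minimal sketch of these two pieces, under heavy simplification. FnDecl, function_versions and this decls_match are hypothetical stand-ins: the real code operates on GCC trees in cp/decl.c and config/i386/i386.c, and its exact rules may differ. In the sketch, two decls are treated as versions when three conditions hold: they share a signature, at least one of them has a target attribute, and their sorted option lists differ.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for a C++ function decl.
struct FnDecl {
  std::string name;       // e.g. "foo"
  std::string signature;  // e.g. "int()"
  std::string target;     // argument of __attribute__((target(...))); "" if absent
};

// Split "avx,popcnt" into its options and sort them, so that
// "popcnt,avx" and "avx,popcnt" compare equal.
static std::vector<std::string> sorted_options(const std::string& target) {
  std::vector<std::string> opts;
  std::stringstream in(target);
  std::string opt;
  while (std::getline(in, opt, ','))
    if (!opt.empty()) opts.push_back(opt);
  std::sort(opts.begin(), opts.end());
  return opts;
}

// Sketch of the "are these two decls versions?" hook.
bool function_versions(const FnDecl& a, const FnDecl& b) {
  if (a.signature != b.signature) return false;
  if (a.target.empty() && b.target.empty()) return false;  // plain redeclaration
  return sorted_options(a.target) != sorted_options(b.target);
}

// Sketch of the decls_match change: versions never "match", so each one
// stays a separate entry in the overload list.
bool decls_match(const FnDecl& a, const FnDecl& b) {
  return a.name == b.name && a.signature == b.signature &&
         !function_versions(a, b);
}
```

With this sketch, the default foo and the avx,popcnt foo do not match, so both survive as overloads. Redeclaring foo with "popcnt,avx" still matches the "avx,popcnt" decl.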
* Change the assembler names of the function versions.
For i386, the target changes the assembler names of the function versions by
appending, after a ".", the sorted list of arguments of the "target" attribute
to the assembler name of "foo". For example, the assembler name of
"void foo () __attribute__ ((target ("sse4")))" becomes _Z3foov.sse4. The
target hook mangle_decl_assembler_name is used for this.
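The renaming step can be sketched as a string transformation. Only the single-option case (sse4 giving _Z3foov.sse4) comes from the description above. The following are all assumptions of this sketch:
* joining several options with '_';
* rewriting '=' as '_';
* leaving a decl without options unchanged;
* the hypothetical function name version_assembler_name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Append the sorted target options to a base assembler name.
// Separator and '=' handling are assumptions, not taken from the patch.
std::string version_assembler_name(const std::string& base,
                                   const std::string& target) {
  std::vector<std::string> opts;
  std::stringstream in(target);
  std::string opt;
  while (std::getline(in, opt, ',')) {
    if (opt.empty()) continue;
    std::replace(opt.begin(), opt.end(), '=', '_');  // "arch=core2" -> "arch_core2"
    opts.push_back(opt);
  }
  if (opts.empty()) return base;  // in this sketch, no options means no suffix
  std::sort(opts.begin(), opts.end());
  std::string name = base + ".";
  for (std::size_t i = 0; i < opts.size(); ++i) {
    if (i != 0) name += '_';
    name += opts[i];
  }
  return name;
}
```

Sorting makes the name independent of the order in which the options were written in the attribute.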
* Overload resolution:
Function "build_over_call" in cp/call.c sees a call to function
"foo", which is multi-versioned. The overload resolution happens in
function "joust" in "cp/call.c". Here, the call to "foo" has all
possible versions of "foo" as candidates. All the candidates of "foo" are
stored in the cgraph side data structure. Each version of foo is chained in a
doubly-linked list with the default function as the first element. This allows
any pass to access all the semantically identical versions. A call to a
multi-versioned function will be replaced by a call to a dispatcher function,
determined by a target hook, to execute the right function version at run-time.
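Below is a simplified sketch of that bookkeeping, assuming plain pointers instead of cgraph nodes. VersionInfo, record_versions and all_versions_of are illustrative names, not the cgraph API added by the patch. Because the default version heads the chain, a pass holding any one version can reach all of them.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// One node per function version; the default version is the head.
struct VersionInfo {
  std::string assembler_name;
  VersionInfo* prev = nullptr;
  VersionInfo* next = nullptr;
  explicit VersionInfo(std::string n) : assembler_name(std::move(n)) {}
};

// Chain the specialized versions after the default one; return the head.
VersionInfo* record_versions(VersionInfo* default_version,
                             const std::vector<VersionInfo*>& others) {
  VersionInfo* tail = default_version;
  for (VersionInfo* v : others) {
    tail->next = v;
    v->prev = tail;
    v->next = nullptr;
    tail = v;
  }
  return default_version;
}

// Starting from any version, walk back to the default, then forward
// over every semantically identical version.
std::vector<std::string> all_versions_of(VersionInfo* any) {
  while (any->prev) any = any->prev;
  std::vector<std::string> names;
  for (VersionInfo* v = any; v; v = v->next) names.push_back(v->assembler_name);
  return names;
}
```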
Optimization to directly call a version when possible:
Also, in joust, where overload resolution happens, resolution of a
multiversioned function is made to return the most specialized version. This is
the version that is checked first during dispatching, and it is determined by
the target. If the caller can inline this function version, a direct call is
made to it rather than going through the dispatcher. When a direct call cannot
be made, a call to the dispatcher function is created.
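The direct-call decision can be approximated with ISA bitmasks. This is a hedged simplification. It assumes that "the caller can inline this version" reduces to one condition: the caller is compiled for every ISA extension the version needs. Real inlining checks consider more than ISA bits. The feature bits and choose_call are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical ISA feature bits for this sketch.
enum : std::uint32_t {
  ISA_SSSE3  = 1u << 0,
  ISA_POPCNT = 1u << 1,
  ISA_AVX    = 1u << 2,
};

enum class CallKind { Direct, ViaDispatcher };

// If the caller is already compiled for everything the most specialized
// version needs, that version runs wherever the caller runs, so it can
// be called directly; otherwise go through the dispatcher.
CallKind choose_call(std::uint32_t caller_isa, std::uint32_t best_version_isa) {
  bool caller_covers_version = (best_version_isa & ~caller_isa) == 0;
  return caller_covers_version ? CallKind::Direct : CallKind::ViaDispatcher;
}
```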
* Creating the dispatcher body.
The dispatcher body, called the resolver, is generated only when there is a call
to a multiversioned function dispatcher or when the address of a multiversioned
function is taken. It is generated during cgraph_analyze_function by another
target hook.
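Conceptually, the resolver picks one version and returns its address. The sketch below makes these assumptions:
* The CPU features arrive as a bitmask parameter. A generated resolver queries the running CPU itself, for example through GCC's __builtin_cpu_supports.
* The feature bits are hypothetical, including a CPU_CORE2 bit that stands in for arch=core2.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature bits standing in for what the running CPU reports.
enum : std::uint32_t {
  F_SSSE3   = 1u << 0,
  F_POPCNT  = 1u << 1,
  F_AVX     = 1u << 2,
  CPU_CORE2 = 1u << 3,  // "the processor is a Core 2"
};

// The three versions of foo, made distinguishable by their return values.
int foo_default()     { return 0; }
int foo_core2_ssse3() { return 1; }
int foo_avx_popcnt()  { return 2; }

using FooFn = int (*)();

struct Candidate {
  std::uint32_t required;  // features this version needs
  FooFn fn;
};

// Most specialized first; the default requires nothing and comes last.
const Candidate kFooVersions[] = {
  {F_AVX | F_POPCNT, foo_avx_popcnt},
  {CPU_CORE2 | F_SSSE3, foo_core2_ssse3},
  {0, foo_default},
};

// Return the first version whose requirements the CPU satisfies.
FooFn resolve_foo(std::uint32_t cpu_features) {
  for (const Candidate& c : kFooVersions)
    if ((c.required & ~cpu_features) == 0) return c.fn;
  return foo_default;  // not reached while the default is listed
}
```

Direct calls and calls through a pointer, as in main above, would both end up in whichever function the resolver returned.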
* Dispatch ordering.
The order in which the function versions are checked during dispatch is based
on a priority value assigned to the ISA each version targets. More specialized
versions are checked first. This mitigates the ambiguity that can arise when
more than one function version is valid for execution on a particular platform.
This is not a perfect solution; in the future, the user should be allowed to
assign a dispatching priority value to each version.
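One way to realize this ordering is to give each version the priority of its most specialized option, then sort in descending order. The priority numbers below are invented for illustration; the real values are assigned by the target. option_priority and sort_for_dispatch are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Invented priorities: larger means more specialized.
int option_priority(const std::string& opt) {
  if (opt == "ssse3") return 1;
  if (opt == "arch=core2") return 2;
  if (opt == "popcnt") return 3;
  if (opt == "avx") return 4;
  return 0;  // unknown options add nothing
}

struct Version {
  std::string name;
  std::vector<std::string> options;  // empty for the default version
};

// A version is as specialized as its most specialized option.
int version_priority(const Version& v) {
  int p = 0;
  for (const std::string& opt : v.options) p = std::max(p, option_priority(opt));
  return p;
}

// Highest priority first; the default (priority 0) ends up last.
void sort_for_dispatch(std::vector<Version>& versions) {
  std::stable_sort(versions.begin(), versions.end(),
                   [](const Version& a, const Version& b) {
                     return version_priority(a) > version_priority(b);
                   });
}
```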
Function MV in the Intel compiler:
The Intel compiler supports function multiversioning with a syntax similar to
the one proposed here. Here is an example of how to generate multiple function
versions with the Intel compiler.
/* Create a stub function to specify the various versions of the function that
   will be created, using declspec attribute cpu_dispatch.  */
__declspec (cpu_dispatch (core_i7_sse4_2, atom, generic))
void foo () {};
/* The generic or the default version.  */
__declspec (cpu_specific(generic))
void foo ()
{
  printf ("This is generic");
}
A new function version is generated by defining a new function with the same
signature but with a different cpu_specific declspec attribute string. The
set of cpu_specific strings that are allowed is the following:
Comparison with the GCC MV implementation in this patch:
* Version creation syntax:
The implementation in this patch also has a similar syntax to specify function
versions. The first stub function is not needed. Here is the code to generate
the function versions with this patch:
* doc/tm.texi.in (TARGET_OPTION_FUNCTION_VERSIONS): New hook
description.
(TARGET_COMPARE_VERSION_PRIORITY): New hook description.
(TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New hook description.
(TARGET_GENERATE_VERSION_DISPATCHER_BODY): New hook description.
* doc/tm.texi: Regenerate.
* target.def (compare_version_priority): New target hook.
(generate_version_dispatcher_body): New target hook.
(get_function_versions_dispatcher): New target hook.
(function_versions): New target hook.
* cgraph.c (cgraph_fnver_htab): New htab.
(cgraph_fn_ver_htab_hash): New function.
(cgraph_fn_ver_htab_eq): New function.
(version_info_node): New pointer.
(insert_new_cgraph_node_version): New function.
(get_cgraph_node_version): New function.
(delete_function_version): New function.
(record_function_versions): New function.
* cgraph.h (cgraph_node): New bitfield dispatcher_function.
(cgraph_function_version_info): New struct.
(get_cgraph_node_version): New function.
(insert_new_cgraph_node_version): New function.
(record_function_versions): New function.
(delete_function_version): New function.
(init_lowered_empty_function): Expose function.
* tree.h (DECL_FUNCTION_VERSIONED): New macro.
(tree_function_decl): New bit-field versioned_function.
* cgraphunit.c (cgraph_analyze_function): Generate body of multiversion
function dispatcher.
(cgraph_analyze_functions): Analyze dispatcher function.
(init_lowered_empty_function): Make non-static. New parameter in_ssa.
(assemble_thunk): Add parameter to call to init_lowered_empty_function.
* config/i386/i386.c (add_condition_to_bb): New function.
(get_builtin_code_for_version): New function.
(ix86_compare_version_priority): New function.
(feature_compare): New function.
(dispatch_function_versions): New function.
(ix86_function_versions): New function.
(attr_strcmp): New function.
(ix86_mangle_function_version_assembler_name): New function.
(ix86_mangle_decl_assembler_name): New function.
(make_name): New function.
(make_dispatcher_decl): New function.
(is_function_default_version): New function.
(ix86_get_function_versions_dispatcher): New function.
(make_attribute): New function.
(make_resolver_func): New function.
(ix86_generate_version_dispatcher_body): New function.
(fold_builtin_cpu): Return integer for cpu builtins.
(TARGET_MANGLE_DECL_ASSEMBLER_NAME): New macro.
(TARGET_COMPARE_VERSION_PRIORITY): New macro.
(TARGET_GENERATE_VERSION_DISPATCHER_BODY): New macro.
(TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New macro.
(TARGET_OPTION_FUNCTION_VERSIONS): New macro.
* class.c (add_method): Change assembler names of function versions.
(mark_versions_used): New static function.
(resolve_address_of_overloaded_function): Create dispatcher decl and
return address of dispatcher instead.
* decl.c (decls_match): Make decls unmatched for versioned
functions.
(duplicate_decls): Remove ambiguity for versioned functions.
Delete versioned function data for merged decls.
* decl2.c (check_classfn): Check attributes of versioned functions
for match.
* call.c (get_function_version_dispatcher): New function.
(mark_versions_used): New static function.
(build_over_call): Make calls to multiversioned functions
to call the dispatcher.
(joust): For calls to multi-versioned functions, make the most
specialized function version win.
* testsuite/g++.dg/mv1.C: New test.
* testsuite/g++.dg/mv2.C: New test.
* testsuite/g++.dg/mv3.C: New test.
* testsuite/g++.dg/mv4.C: New test.
* testsuite/g++.dg/mv5.C: New test.
* testsuite/g++.dg/mv6.C: New test.
Eric Botcazou [Mon, 5 Nov 2012 21:39:02 +0000 (21:39 +0000)]
re PR tree-optimization/54986 (segfault on constant initialized to object address at -O)
PR tree-optimization/54986
* gimple-fold.c (canonicalize_constructor_val): Strip again all no-op
conversions on entry but add them back on exit if needed.
François Dumont [Mon, 5 Nov 2012 20:58:35 +0000 (20:58 +0000)]
throw_allocator.h (__throw_value_base): Add move semantic, not throwing.
2012-10-05 François Dumont <fdumont@gcc.gnu.org>
* include/ext/throw_allocator.h (__throw_value_base): Add move
semantic, not throwing.
(__throw_value_limit): Likewise.
(__throw_value_random): Likewise.
* testsuite/util/exception/safety.h: Add validation of C++11
methods emplace/emplace_front/emplace_back/emplace_hint.
* testsuite/util/testsuite_container_traits.h: Signal emplace
support on deque, forward_list, list and vector.
* testsuite/23_containers/deque/requirements/exception/
propagation_consistent.cc: Remove dg-do run fail.
re PR target/55204 (ICE: in extract_insn, at recog.c:2140 (unrecognizable insn) with -O --param loop-invariant-max-bbs-in-loop=0)
gcc/
PR target/55204
* config/i386/i386.c (ix86_address_subreg_operand): Remove stack
pointer check.
(print_reg): Use true_regnum rather than REGNO.
(ix86_print_operand_address): Remove SUBREG handling.
Joern Rennecke [Mon, 5 Nov 2012 15:18:10 +0000 (15:18 +0000)]
md.texi (Defining Attributes): Document that we are defining HAVE_ATTR_name macros as 1 for defined attributes...
* doc/md.texi (Defining Attributes): Document that we are defining
HAVE_ATTR_name macros as 1 for defined attributes, and as 0
for undefined special attributes.
* final.c (asm_insn_count, align_fuzz): Always define.
(insn_current_reference_address): Likewise.
(init_insn_lengths): Use if (HAVE_ATTR_length) instead of
#ifdef HAVE_ATTR_length.
(get_attr_length_1, shorten_branches, final): Likewise.
(final_scan_insn, output_asm_name): Likewise.
* genattr.c (gen_attr): Define HAVE_ATTR_name macros for
defined attributes as 1.
Remove ancient get_attr_alternative compatibility code.
For special purpose attributes not provided, define HAVE_ATTR_name
as 0.
In case no length attribute is given, provide stub definitions
for insn_*_length* functions, and also include insn-addr.h.
In case no enabled attribute is given, provide stub definition.
* genattrtab.c (write_length_unit_log): Always write a definition.
* hooks.c (hook_int_rtx_1, hook_int_rtx_unreachable): New functions.
* hooks.h (hook_int_rtx_1, hook_int_rtx_unreachable): Declare.
* lra-int.h (struct lra_insn_recog_data): Make member
alternative_enabled_p unconditional.
* lra.c (free_insn_recog_data): Use if (HAVE_ATTR_length) instead of
#ifdef HAVE_ATTR_length.
(lra_set_insn_recog_data): Likewise. Make initialization of
alternative_enabled_p unconditional.
(lra_update_insn_recog_data): Use #if instead of #ifdef for
HAVE_ATTR_enabled.
* recog.c [!HAVE_ATTR_enabled] (get_attr_enabled): Don't define.
(extract_insn): Check HAVE_ATTR_enabled.
(gate_handle_split_before_regstack): Use #if instead of
#if defined for HAVE_ATTR_length.
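Below is a small illustration of why the #ifdef tests had to become #if or plain if tests. Once HAVE_ATTR_name is always defined, a value of 0 still makes #ifdef see the macro as defined. The macro value chosen here and length_dependent_work are hypothetical.

```cpp
#include <cassert>

// As for a target without a length attribute: the macro exists, with value 0.
#define HAVE_ATTR_length 0

// Hypothetical helper standing in for code that needs insn lengths.
int length_dependent_work() {
  // A plain C++ if: both branches are parsed and type-checked in every
  // configuration, and the dead branch is optimized away.
  if (HAVE_ATTR_length)
    return 1;
  return 0;
}

// #ifdef only asks whether the macro exists, so it is now always true.
#ifdef HAVE_ATTR_length
const bool ifdef_branch_taken = true;
#else
const bool ifdef_branch_taken = false;
#endif

// #if tests the macro's value, which is what the code actually wants.
#if HAVE_ATTR_length
const bool if_branch_taken = true;
#else
const bool if_branch_taken = false;
#endif
```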
Jan Hubicka [Mon, 5 Nov 2012 14:00:46 +0000 (15:00 +0100)]
ipa-inline.c (compute_uninlined_call_time, [...]): New functions.
* ipa-inline.c (compute_uninlined_call_time,
compute_inlined_call_time): New functions.
(RELATIVE_TIME_BENEFIT_RANGE): New macro.
(relative_time_benefit): Rewrite.
(edge_badness): Rewrite path with guessed profile and estimated profile.
* ipa-inline.h (INLINE_HINT_declared_inline, INLINE_HINT_cross_module):
New hints.
(struct inline_summary): Add GROWTH field.
* ipa-inline-analysis.c (dump_inline_hints): Update.
(reset_inline_summary): Update.
(dump_inline_summary): Update.
(will_be_nonconstant_predicate): Cleanup to use gimple_store_p and
gimple_assign_load_p predicates.
(estimate_node_size_and_time): Drop INLINE_HINT_declared_inline hint.
(simple_edge_hints): New function.
(do_estimate_edge_time): Return time of invocation of callee rather
than the time scaled by edge frequency; update hints code.
(do_estimate_edge_hints): Update.
(do_estimate_growth): Cleanup.
* tree-ssa-loop-niter.c (find_loop_niter): Remove just_once_each_iteration_p.
(maybe_lower_iteration_bound): Initialize not_executed_last_iteration to NULL.
* tree-ssa-loop-ivcanon.c (canonicalize_loop_induction_variables): Skip
just_once_each_iteration_p; record estimated bound when loop has only one
likely exit; test just_once_each_iteration_p before IV canon itself.
Jakub Jelinek [Mon, 5 Nov 2012 07:58:48 +0000 (08:58 +0100)]
re PR debug/54402 (var-tracking does not scale)
PR debug/54402
* var-tracking.c (fp_setter): Return false if there is REG_CFA_RESTORE
hfp note.
(vt_initialize): Look for fp_setter in any bb, not just successor of
entry bb.
Oleg Endo [Mon, 5 Nov 2012 01:06:18 +0000 (01:06 +0000)]
sh.h (TARGET_CACHE32, [...]): Delete macro.
* config/sh/sh.h (TARGET_CACHE32, TARGET_HARVARD): Delete macro.
(TARGET_SUPERSCALAR): Add TARGET_SH2A.
(CACHE_LOG): Use TARGET_HARD_SH4 and TARGET_SH5 instead of
TARGET_CACHE32.
(TRAMPOLINE_ALIGNMENT): Use TARGET_HARD_SH4 and TARGET_SH5 instead of
TARGET_HARVARD.
* config/sh/sh.c (sh_trampoline_init): Likewise.
Jack Howarth [Sat, 3 Nov 2012 21:39:06 +0000 (21:39 +0000)]
Add check_effective_target_masm_intel
PR target/54255
* lib/target-supports.exp (check_effective_target_masm_intel): New
proc.
* gcc.target/i386/asm-dialect-1.c: Use dg-require-effective-target
masm_intel.
H.J. Lu [Sat, 3 Nov 2012 21:36:48 +0000 (14:36 -0700)]
Add check_effective_target_maybe_x32
* lib/target-supports.exp (check_effective_target_maybe_x32): New
proc.
* gcc.target/i386/pr54457.c: Use dg-require-effective-target
maybe_x32.
* gcc.target/i386/pr53249.c: Likewise.
Co-Authored-By: Jack Howarth <howarth@bromo.med.uc.edu>
From-SVN: r193126
Oleg Endo [Sat, 3 Nov 2012 12:01:01 +0000 (12:01 +0000)]
re PR target/51244 ([SH] Inefficient conditional branch and code around T bit)
PR target/51244
* config/sh/sh.md (*cbranch_t): Allow splitting after reload.
Allow going beyond current basic block before reload when looking for
the reg set insn.
* config/sh/sh.c (sh_find_set_of_reg): Don't stop at labels.
Andrew Pinski [Fri, 2 Nov 2012 23:32:32 +0000 (23:32 +0000)]
re PR rtl-optimization/54524 (Spurious add on sum of bitshifts (forward-propagate issue))
2012-11-02 Andrew Pinski <apinski@cavium.com>
PR rtl-optimization/54524
* simplify-rtx.c (simplify_relational_operation_1): Don't simplify
(LTU/GEU (PLUS a 0) 0) into (GEU/LTU a 0) since they are not equivalent.
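The underlying arithmetic can be checked exhaustively on 8-bit unsigned values. For a nonzero constant C, a + C wraps around exactly when a >= -C modulo 2^8. That is, (LTU (PLUS a C) C) equals (GEU a -C). For C = 0, however, -C is 0 again: (LTU (PLUS a 0) 0) is always false, while (GEU a 0) is always true. The 8-bit width and the helper names are choices of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Models (LTU (PLUS a C) C): did a + c wrap around?
bool ltu_plus(std::uint8_t a, std::uint8_t c) {
  return static_cast<std::uint8_t>(a + c) < c;
}

// Models (GEU a (NEG C)), the rewritten form.
bool geu_neg(std::uint8_t a, std::uint8_t c) {
  return a >= static_cast<std::uint8_t>(-c);
}

// True if the rewrite agrees with the original for every 8-bit a.
bool rewrite_valid_for(std::uint8_t c) {
  for (int a = 0; a < 256; ++a) {
    std::uint8_t av = static_cast<std::uint8_t>(a);
    if (ltu_plus(av, c) != geu_neg(av, c)) return false;
  }
  return true;
}
```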
Diego Novillo [Fri, 2 Nov 2012 19:43:25 +0000 (15:43 -0400)]
Add a new option --clean_build to validate_failures.py
This is useful when you have two builds of the same compiler: one with
your changes and a clean build at the same revision.
Instead of using a manifest file, --clean_build compares the
results it gathers from the patched build against those it gathers from
the clean build.
Jan Hubicka [Fri, 2 Nov 2012 19:35:44 +0000 (20:35 +0100)]
tree-ssa-loop-niter.c (double_int_cmp, [...]): New functions.
* tree-ssa-loop-niter.c (double_int_cmp, bound_index,
discover_iteration_bound_by_body_walk): New functions.
(discover_iteration_bound_by_body_walk): Use it.
Tobias Burnus [Fri, 2 Nov 2012 16:59:30 +0000 (17:59 +0100)]
fmaq.c (fmaq): Merge from GLIBC.
2012-11-01 Tobias Burnus <burnus@net-b.de>
Joseph Myers <joseph@codesourcery.com>
* math/fmaq.c (fmaq): Merge from GLIBC. Handle cases
with small x * y using scaling, not as x * y + z.
* math/lgammaq.c (lgammaq): Fix signgam handling.
Co-Authored-By: Joseph Myers <joseph@codesourcery.com>
From-SVN: r193099
Jan Hubicka [Fri, 2 Nov 2012 16:34:52 +0000 (17:34 +0100)]
re PR tree-optimization/55079 (false positive -Warray-bounds (also seen at -O3 bootstrap))
PR middle-end/55079
* tree-ssa-loop-niter.c (number_of_iterations_exit): Update
MAX field if NITER was folded to constant.
(record_estimate): Sanity check.
* tree-ssa-loop-ivcanon.c (remove_exits_and_undefined_stmts): New
function.
(remove_redundant_iv_test): New function.
(loops_to_unloop, loops_to_unloop_nunroll): New static vars.
(unloop_loops): Break out from ...
(try_unroll_loop_completely): ... here; Pass in MAXITER; use
remove_exits_and_undefined_stmts; do not unloop.
(canonicalize_loop_induction_variables): Compute MAXITER;
use remove_redundant_iv_test; remove loop_close_ssa_invalidated
and irred_invalidated arguments.
(canonicalize_induction_variables): Compute fresh bound estimates;
unloop; walk from innermost.
(tree_unroll_loops_completely): Likewise.
* gcc.dg/tree-ssa/cunroll-10.c: New testcase.
* gcc.dg/tree-ssa/cunroll-9.c: New testcase.
* include/bits/forward_list.h (forward_list(size_type)): Add missing
allocator parameter.
(_Fwd_list_node_base): Use NSDMI and define constructor as defaulted.
(_Fwd_list_node::_M_value): Replace with uninitialized storage.
(_Fwd_list_node::_M_valptr()): Define functions to access storage.
(_Fwd_list_iterator, _Fwd_list_const_iterator): Use _M_valptr.
(_Fwd_list_base::_M_create_node): Only use allocator to construct the
element not the node.
* include/bits/forward_list.tcc (_Fwd_list_base::_M_erase_after): Only
use allocator to destroy the element not the node.
* testsuite/23_containers/forward_list/cons/11.cc: Remove unused
headers.
* testsuite/23_containers/forward_list/cons/12.cc: Likewise.
* testsuite/23_containers/forward_list/cons/13.cc: New.
* testsuite/23_containers/forward_list/cons/14.cc: New.
Gerald Pfeifer [Fri, 2 Nov 2012 00:25:46 +0000 (00:25 +0000)]
codecvt.xml: Fix reference to Austin Common Standards Revision Group.
* doc/xml/manual/codecvt.xml: Fix reference to Austin Common
Standards Revision Group.
* doc/xml/manual/messages.xml: Ditto.
* doc/xml/manual/using_exceptions.xml: Ditto.
* doc/xml/manual/messages.xml: Fix reference to GNU gettext.
* doc/xml/manual/policy_data_structures.xml: Fix reference to
STL at SGI.
Update reference to COM at Microsoft.
Update reference to Worst-case efficient priority queues at ACM.
Lawrence Crowl [Thu, 1 Nov 2012 21:02:15 +0000 (21:02 +0000)]
This patch renames sbitmap iterators to unify them with the bitmap iterators.
Remove the unused EXECUTE_IF_SET_IN_SBITMAP_REV, which has an unconventional
interface.
Rename the sbitmap_iter_* functions to match bitmap's bmp_iter_* functions.
Add an additional parameter to the initialization and next functions to
match the interface in bmp_iter_*. This extra parameter is mostly hidden
by the use of the EXECUTE_IF macros.
Rename the EXECUTE_IF_SET_IN_SBITMAP macro to EXECUTE_IF_SET_IN_BITMAP. Its
implementation is now identical to that in bitmap.h. To prevent redefinition
errors, both definitions are now guarded by #ifndef. An alternate strategy
is to simply include bitmap.h from sbitmap.h. As this would increase build
time, I have elected to use the #ifndef version. I do not have a strong
preference here.
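Below is a toy version of the pattern described here. It wraps init/cond/next iterator functions in an EXECUTE_IF-style for-loop macro and guards that macro with #ifndef. ToyBitmap, the toy_iter_* functions and TOY_EXECUTE_IF_SET are hypothetical names; this is not GCC's sbitmap or bitmap code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy bitmap: bit i lives in word i / 64.
struct ToyBitmap {
  std::vector<std::uint64_t> words;
};

struct ToyIter {
  const ToyBitmap* map;
  unsigned pos;  // next bit position to examine
};

inline void toy_iter_init(ToyIter* it, const ToyBitmap* map, unsigned min) {
  it->map = map;
  it->pos = min;
}

// Find the next set bit at or after it->pos; store it in *bit.
inline bool toy_iter_set(ToyIter* it, unsigned* bit) {
  unsigned nbits = static_cast<unsigned>(it->map->words.size() * 64);
  for (; it->pos < nbits; ++it->pos)
    if ((it->map->words[it->pos / 64] >> (it->pos % 64)) & 1u) {
      *bit = it->pos;
      return true;
    }
  return false;
}

inline void toy_iter_next(ToyIter* it, unsigned* bit) { it->pos = *bit + 1; }

// Guarded so that whichever header is included first provides the macro.
#ifndef TOY_EXECUTE_IF_SET
#define TOY_EXECUTE_IF_SET(MAP, MIN, BITNUM, ITER) \
  for (toy_iter_init(&(ITER), (MAP), (MIN));       \
       toy_iter_set(&(ITER), &(BITNUM));           \
       toy_iter_next(&(ITER), &(BITNUM)))
#endif
```

A loop such as TOY_EXECUTE_IF_SET (&m, 0, bit, it) visits the set bits of m in increasing order. The iterator state stays hidden behind the macro, as described above.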
The sbitmap_iterator type is still distinctly named because it is often
declared in contexts where the bitmap type is not obvious. There are fewer
than 40 uses of this type, so the burden of modifying them when changing bitmap
types is not large.
Tested on x86-64, config-list.mk testing.
2012-10-31 Lawrence Crowl <crowl@google.com>
* sbitmap.h (sbitmap_iter_init): Rename bmp_iter_set_init and add
unused parameter to match bitmap iterator. Update callers.
(sbitmap_iter_cond): Rename bmp_iter_set. Update callers.
(sbitmap_iter_next): Rename bmp_iter_next and add unused parameter to
match bitmap iterator. Update callers.
(EXECUTE_IF_SET_IN_SBITMAP_REV): Remove unused.
(EXECUTE_IF_SET_IN_SBITMAP): Rename EXECUTE_IF_SET_IN_BITMAP and
adjust to be identical to the definition in bitmap.h. Conditionalize
the definition based on not having been defined. Update callers.
* bitmap.h (EXECUTE_IF_SET_IN_BITMAP): Conditionalize the definition
based on not having been defined. (To match the above.)
François Dumont [Thu, 1 Nov 2012 20:55:51 +0000 (20:55 +0000)]
hashtable_policy.h (__details::_Before_begin<>): New, combine a base node instance and an allocator.
2012-11-01 François Dumont <fdumont@gcc.gnu.org>
* include/bits/hashtable_policy.h (__details::_Before_begin<>):
New, combine a base node instance and an allocator.
* include/bits/hashtable.h (_Hashtable<>::_M_node_allocator): Remove.
(_Hashtable<>::_M_before_begin): Rename to _M_bbegin and type
modified to __detail::_Before_begin<>.
(_Hashtable<>::_M_node_allocator()): New, get the node allocator
part of _M_bbegin.
(_Hashtable<>::_M_before_begin()): New, get the before begin node
part of _M_bbegin.
(_Hashtable<>): Adapt to use latter.
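Below is a sketch of one way to combine a node and an allocator in a single member. The type derives from the allocator, so a stateless allocator costs no space (the empty base optimization). BeforeBegin and its accessors are illustrative and may not match libstdc++'s actual __detail::_Before_begin.

```cpp
#include <cassert>
#include <memory>

// Minimal singly-linked node base, like a hashtable's "before begin" node.
struct NodeBase {
  NodeBase* next = nullptr;
};

// Node plus allocator in one object; an empty allocator base adds no size.
template <typename NodeAlloc>
struct BeforeBegin : NodeAlloc {
  NodeBase node;

  explicit BeforeBegin(const NodeAlloc& a) : NodeAlloc(a) {}

  NodeAlloc& node_allocator() { return *this; }  // the allocator part
  NodeBase& before_begin() { return node; }      // the node part
};

using IntBeforeBegin = BeforeBegin<std::allocator<int>>;
```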