Add direct support for Linux kernel __fentry__ patching
The Linux kernel dynamically patches in __fentry__ calls in and
out at runtime. This allows using function tracing for debugging
in production kernels without (significant) performance penalty.
For this it needs a table pointing to each __fentry__ call.
The way it is currently implemented is that a special
perl script scans the object file, generates the table in a special
section. When the kernel boots up it nops the calls, and
then later patches in the calls again as needed.
The recordmcount.pl script in the kernel works, but it seems
cleaner and faster to support the code generation of the patch table
directly in gcc.
This also allows to nop the calls directly at code generation
time, which allows to skip a patching step at kernel boot.
I also expect that a patchable production tracing facility is also useful
for other applications.
For example it could be used in ftracer
(https://github.com/andikleen/ftracer)
Having a nop area at the beginning of each function can be also
also useful for other things. For example it can be used to patch
functions at runtime to point to different functions, to do
binary updates without restarting the program (like ksplice or
similar)
This patch implements two new options for the i386 target:
-mrecord-mcount
Generate a __mcount_loc section entry for each __fentry__ or mcount
call. The section is compatible with the kernel convention
and the data is put into a section loaded at runtime.
-mnop-mcount
Generate the mcount/__fentry__ call as 5 byte nop that can be
patched in later. The nop is generated as a single instruction,
as the Linux kernel run time patching relies on this.
Limitations:
- I didn't implement -mnop-mcount for -fPIC. This
would need a good single instruction 6 byte NOP and it seems a
bit pointless, as the patching would prevent text sharing.
- I didn't implement noping for targets that pass a variable
to mcount.
- The facility could be useful on architectures too. Currently
the mcount code is target specific, so I made it a i386 option.
gcc/:
2014-09-25 Andi Kleen <ak@linux.intel.com>
* config/i386/i386.c (x86_print_call_or_nop): New function.
(x86_function_profiler): Support -mnop-mcount and
-mrecord-mcount.
* config/i386/i386.opt (-mnop-mcount, -mrecord-mcount): Add
* doc/invoke.texi: Document -mnop-mcount, -mrecord-mcount.
gcc/testsuite:
2014-09-25 Andi Kleen <ak@linux.intel.com>
* gcc.target/i386/nop-mcount.c: New file.
* gcc.target/i386/record-mcount.c: New file.
Jan Hubicka [Thu, 25 Sep 2014 18:57:44 +0000 (20:57 +0200)]
ipa-devirt.c (polymorphic_call_target_d): Add SPECULATIVE; reorder for better storage.
* ipa-devirt.c (polymorphic_call_target_d): Add SPECULATIVE; reorder
for better storage.
(polymorphic_call_target_hasher::hash): Hash SPECULATIVE.
(possible_polymorphic_call_targets): Instead of computing both
speculative and non-speculative answers, do just one at a time.
Replace NONSPECULATIVE_TARGETSP parameter with SPECULATIVE flag.
(dump_targets): Break out from ...
(dump_possible_polymorphic_call_targets): ... here; dump both speculative
and non-speculative lists.
(ipa_devirt): Update for new possible_polymorphic_call_targets API.
* ipa-utils.h (possible_polymorphic_call_targets): Update.
Jiong Wang [Thu, 25 Sep 2014 16:39:49 +0000 (16:39 +0000)]
Improve live-in calculation for splitted block
gcc/
* shrink-wrap.c (move_insn_for_shrink_wrap): Initialize the live-in of new
created BB as the intersection of live-in from "old_dest" and live-out from
"bb".
gcc/testsuite/
* gcc.target/i386/shrink_wrap_1.c: New test.
Jakub Jelinek [Thu, 25 Sep 2014 08:12:49 +0000 (10:12 +0200)]
re PR tree-optimization/63341 (Vectorization miscompilation with -mcpu=power7)
PR tree-optimization/63341
* tree-vectorizer.h (vect_create_data_ref_ptr,
vect_create_addr_base_for_vector_ref): Add another tree argument
defaulting to NULL_TREE.
* tree-vect-data-refs.c (vect_create_data_ref_ptr): Add byte_offset
argument, pass it down to vect_create_addr_base_for_vector_ref.
(vect_create_addr_base_for_vector_ref): Add byte_offset argument,
add that to base_offset too if non-NULL.
* tree-vect-stmts.c (vectorizable_load): Add byte_offset variable,
for dr_explicit_realign_optimized set it to vector byte size
- 1 instead of setting offset, pass byte_offset down to
vect_create_data_ref_ptr.
* gcc.dg/vect/pr63341-1.c: New test.
* gcc.dg/vect/pr63341-2.c: New test.
Andreas Krebbel [Thu, 25 Sep 2014 07:37:36 +0000 (07:37 +0000)]
[multiple changes]
2014-09-25 Andreas Arnez <arnez@linux.vnet.ibm.com>
PR 63300/debug
* tree.c (check_base_type): New.
(check_qualified_type): Exploit new helper function above.
* tree.h (check_base_type): New prototype.
* dwarf2out.c (get_nearest_type_subqualifiers): New.
(modified_type_die): Fix handling for qualifiers. Qualifiers to
"peel off" are now determined using get_nearest_type_subqualifiers.
2014-09-25 Mark Wielaard <mjw@redhat.com>
PR 63300/debug
* gcc.dg/debug/dwarf2/stacked-qualified-types-1.c: New testcase.
* gcc.dg/debug/dwarf2/stacked-qualified-types-2.c: Likewise.
* gcc.dg/guality/pr63300-const-volatile.c: New testcase.
Jan Hubicka [Thu, 25 Sep 2014 01:48:34 +0000 (03:48 +0200)]
cgraph.h (class ipa_polymorphic_call_context): Move here from ipa-utils.h; add stream_int and stream_out methods.
* cgraph.h (class ipa_polymorphic_call_context): Move here from
ipa-utils.h; add stream_int and stream_out methods.
(cgraph_indirect_call_info): Remove SPECILATIVE_OFFSET,
OUTER_TYPE, SPECULATIVE_OUTER_TYPE, MAYBE_IN_CONSTRUCTION
MAYBE_DERIVED_TYPE and SPECULATIEVE_MAYBE_DERIVED_TYPE;
add CONTEXT.
(ipa_polymorphic_call_context::ipa_polymorphic_call_context,
ipa_polymorphic_call_context::ipa_polymorphic_call_context,
ipa_polymorphic_call_context::clear_speculation,
ipa_polymorphic_call_context::clear_outer_type): Move here from
ipa-utils.h
* ipa-utils.h (class ipa_polymorphic_call_context): Move to cgraph.h
(ipa_polymorphic_call_context::ipa_polymorphic_call_context,
ipa_polymorphic_call_context::ipa_polymorphic_call_context,
ipa_polymorphic_call_context::clear_speculation,
ipa_polymorphic_call_context::clear_outer_type): Likewise.
* ipa-devirt.c: Include data-streamer.h, lto-streamer.h and
streamer-hooks.h
(ipa_polymorphic_call_context::stream_out): New method.
(ipa_polymorphic_call_context::stream_in): New method.
(noncall_stmt_may_be_vtbl_ptr_store): Add forgotten static.
* ipa-prop.c (ipa_analyze_indirect_call_uses): Do not care about
OUTER_TYPE.
(ipa_analyze_call_uses): Simplify.
(update_indirect_edges_after_inlining): Do not care about outer_type.
(ipa_write_indirect_edge_info): Update.
(ipa_write_indirect_edge_info): Likewise.
* cgraph.c (cgraph_node::create_indirect_edge): Simplify.
(dump_edge_flags): Break out from ...
(cgraph_node::dump): ... here; dump indirect edges.
Jan Hubicka [Wed, 24 Sep 2014 20:30:21 +0000 (22:30 +0200)]
ipa-utils.h (polymorphic_call_context): Add metdhos dump, debug and clear_outer_type.
* ipa-utils.h (polymorphic_call_context): Add
metdhos dump, debug and clear_outer_type.
(ipa_polymorphic_call_context::ipa_polymorphic_call_context): Simplify.
(ipa_polymorphic_call_context::clear_outer_type): New method.
* ipa-prop.c (ipa_analyze_call_uses): Do not overwrite offset.
* ipa-devirt.c (types_odr_comparable): New function.
(types_must_be_same_for_odr): New function.
(odr_subtypes_equivalent_p): Simplify.
(possible_placement_new): Break out from ...
(ipa_polymorphic_call_context::restrict_to_inner_type): ... here;
be more cuatious about returning false in cases the context may be
valid in derived type or via placement new.
(contains_type_p): Clear maybe_derived_type
(ipa_polymorphic_call_context::dump): New method.
(ipa_polymorphic_call_context::debug): New method.
(ipa_polymorphic_call_context::set_by_decl): Cleanup comment.
(ipa_polymorphic_call_context::set_by_invariant): Simplify.
(ipa_polymorphic_call_context::ipa_polymorphic_call_context): Simplify.
(possible_polymorphic_call_targets): Trust context.restrict_to_inner_class
to suceed on all valid cases; remove confused sanity check.
(dump_possible_polymorphic_call_targets): Simplify.
[AArch64] Use __aarch64_vget_lane* macros for getting the lane in some lane multiply intrinsics.
* config/aarch64/arm_neon.h (vmuld_lane_f64): Use macro for getting
the lane.
(vmuld_laneq_f64): Likewise.
(vmuls_lane_f32): Likewise.
(vmuls_laneq_f32): Likewise.
* gcc.target/aarch64/simd/vmul_lane_const_lane_1.c: New test.
re PR tree-optimization/63266 (Test regression: gcc.target/sh/pr53568-1.c)
2014-09-24 Thomas Preud'homme <thomas.preudhomme@arm.com>
gcc/
PR tree-optimization/63266
* tree-ssa-math-opts.c (struct symbolic_number): Add comment about
marker for unknown byte value.
(MARKER_MASK): New macro.
(MARKER_BYTE_UNKNOWN): New macro.
(HEAD_MARKER): New macro.
(do_shift_rotate): Mark bytes with unknown values due to sign
extension when doing an arithmetic right shift. Replace hardcoded
mask for marker by new MARKER_MASK macro.
(find_bswap_or_nop_1): Likewise and adjust ORing of two symbolic
numbers accordingly.
gcc/testsuite/
PR tree-optimization/63266
* gcc.dg/optimize-bswapsi-1.c (swap32_d): New bswap pass test.
Some projects need to prevent reordering of specific top level
declarations with LTO, in particular declarations defining init calls.
The only way to do that with LTO was to use -fno-toplevel-reorder,
which stops reordering for all declarations and makes LTO partitioning
less efficient.
This patch adds a new no_reorder attribute that stops reordering only
for the marked declaration. The program can then only mark e.g. the
initcalls and leave all the other declarations alone.
The patch does:
- Adds the new no_reorder attribute for the C family.
- Initializes a new no_reorder flag in the symtab_nodes in the
function visibility flag.
- Maintains the no_reorder flag when creating new nodes.
- Changes the partition code to always keep a separate
sorted queue of ordered nodes and flush them in order with the other
nodes. This is used by all nodes with -fno-toplevel-reorder,
and only the marked ones without it.
Parts of the old -fno-toplevel-reorder code paths are reused.
- Adds various checks throughout the tree to make no_reorder
marked functions behave the same as with -fno-toplevel-reorder
- Changes the LTO streamer to serialize the no_reorder attribute.
gcc/c-family/:
2014-09-23 Andi Kleen <ak@linux.intel.com>
* c-common.c (handle_no_reorder_attribute): New function.
(c_common_attribute_table): Add no_reorder attribute.
* lto-partition.c (node_cmp): Update comment.
(varpool_node_cmp): Use symtab_node for comparison.
(add_sorted_nodes): New function.
(lto_balanced_map): Change to keep ordered queue
of ordered node. Handle no_reorder attribute.
gcc/testsuite/:
* gcc.dg/combine_ashiftrt_1.c: New test.
* gcc.dg/combine_ashiftrt_2.c: Likewise.
* gcc.target/aarch64/singleton_intrinsics_1.c: Remove scan-assembler
workarounds for cmge.
* gcc.target/aarch64/simd/int_comparisons_1.c: Likewise; also check for
absence of mvn.
Michael Meissner [Tue, 23 Sep 2014 17:11:07 +0000 (17:11 +0000)]
rs6000.md (f32_vsx): New mode attributes to refine the constraints used on 32/64-bit floating point...
2014-09-23 Michael Meissner <meissner@linux.vnet.ibm.com>
* config/rs6000/rs6000.md (f32_vsx): New mode attributes to
refine the constraints used on 32/64-bit floating point moves.
(f32_av): Likewise.
(f64_vsx): Likewise.
(f64_dm): Likewise.
(f64_av): Likewise.
(BOOL_REGS_OUTPUT): Use wt constraint for TImode instead of wa.
(BOOL_REGS_OP1): Likewise.
(BOOL_REGS_OP2): Likewise.
(BOOL_REGS_UNARY): Likewise.
(mov<mode>_hardfloat, SFmode/SDmode): Tighten down constraints for
32/64-bit floating point moves. Do not use wa, instead use ww/ws
for moves involving VSX registers. Do not use constraints that
target VSX registers for decimal types.
(mov<mode>_hardfloat32, DFmode/DDmode): Likewise.
(mov<mode>_hardfloat64, DFmode/DDmode): Likewise.
Mark Wielaard [Tue, 23 Sep 2014 11:07:08 +0000 (11:07 +0000)]
Make all gcc.dg/guality/const-volatile.c subtests PASS under LTO.
Some subtests were reported as UNSUPPORTED when running under LTO.
That was just because the relevant variables were optimized out.
Mark those variables as used. Now const-volatile reports 192 PASS.
gcc/testsuite/ChangeLog
* gcc.dg/guality/const-volatile.c (i): Mark as used.
(ci): Likewise.
(pci): Likewise.
(pvi): Likewise.
(pcvi): Likewise.
(cip): Likewise.
(foo): Likewise.
(cfoo): Likewise.
Mark Wielaard [Tue, 23 Sep 2014 11:06:57 +0000 (11:06 +0000)]
gcc-gdb-test.exp: Handle old GDB "short int" and "long int" types.
Old GDB might show short and long as short int and long int. This made
gcc.dg/guality/const-volatile.c ans restrict.c fail on older GDBs.
According to the patch that changed this in newer versions of GDB
this was a bug: https://sourceware.org/ml/gdb-patches/2012-09/msg00455.html
The patch transforms the types "short int" and "long int" coming from
GDB to plain "short" and "long". And a variant has been added to the
const-volatile.c testcase to make sure short and long long are handled
correctly now with older GDB.
gcc/testsuite/ChangeLog
* lib/gcc-gdb-test.exp (gdb-test): Transform gdb types "short int"
and "long int" to plain "short" and "long".
* gcc.dg/guality/const-volatile.c (struct bar): New struct
containing short and long long fields.
(bar): New variable to test the type.
This patch removes the target macro LIBGCC2_LONG_DOUBLE_TYPE_SIZE.
After recent changes, this macro was used in two ways in libgcc: to
determine the mode of long double in dfp-bit.h, and to determine
whether a particular mode has excess precision for use in complex
multiplication.
The former is concerned specifically with long double: it relates to
use of strtold for converting between decimal and binary floating
point. This is replaced by comparing __LDBL_MANT_DIG__ with the
appropriate __LIBGCC_*_MANT_DIG__ macro. The latter is replaced
__LIBGCC_*_EXCESS_PRECISION__ predefined macros.
Remarks:
* Comparing (__LDBL_MANT_DIG__ == __LIBGCC_XF_MANT_DIG__) is more
fragile than it looks; it's possible for XFmode to have 53-bit
mantissa (TARGET_96_ROUND_53_LONG_DOUBLE, on FreeBSD and
DragonFlyBSD 32-bit), in which case such a comparison would not
distinguish XFmode and DFmode as possible modes for long double.
Fortunately, no target supporting that form of XFmode also supports
long double = double (but if some target did, we'd need e.g. an
additional macro giving the exponent range of each mode).
Furthermore, this code doesn't actually get used for x86 (or any
other target with XFmode support), because x86 uses BID not DPD and
BID has its own conversion code (which handles conversions for both
XFmode and TFmode without needing to go via strtold). And FreeBSD
and DragonFlyBSD aren't among the targets with DFP support. So
while in principle this code is fragile and it's a deficiency that
it can't support both XFmode and TFmode at once (something that
can't be solved with the string conversion approach without libc
having TS 18661 functions such as strtof128), all these issues
should not be a problem in practice.
* If other cases of excess precision are supported in future, the code
for defining __LIBGCC_*_EXCESS_PRECISION__ may need updating.
Although the most likely such cases might not actually involve
excess precision for any mode used in libgcc - FLT_EVAL_METHOD being
32 to do _Float16 arithmetic on _Float32 should have the effect of
_Complex _Float16 arithmetic using __mulsc3 and __divsc3, rather
than currently nonexistent __mulhc3 and __divhc3 as in bug 63250 for
ARM.
* As has been noted in the context of simultaneous support for
__float128 and __ibm128 on Power, the semantics of macros such as
LONG_DOUBLE_TYPE_SIZE are problematic because they rely on a
poorly-defined precision value for floating-point modes (which seems
to be intended as the number of significant bits in the
representation, e.g. 80 for XFmode which may be either 12 or 16
bytes) uniquely identifying a mode (although defining an arbitrarily
different value for one of the modes you wish to distinguish may
work as a hack). It would be cleaner to have a target hook that
gives a machine mode directly for float, double and long double,
rather than going via these precision values. By eliminating all
use of these macros (FLOAT_TYPE_SIZE, DOUBLE_TYPE_SIZE,
LONG_DOUBLE_TYPE_SIZE) from code built for the target, this patch
facilitates such a conversion to a hook (which I suppose would take
some suitable enum as an argument to identify which of the three
types to return a mode for).
(The issue of multiple type support for DFP conversions would apply
in that Power case.
<https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01084.html> doesn't
seem to touch on it, but it would seem reasonable to punt on it
initially as hard to fix. There would also be the issue of getting
functions such as __powikf2, __mulkc3, __divkc3 defined, but that's
rather easier to address.)
Bootstrapped with no regressions on x86_64-unknown-linux-gnu.
libgcc:
* dfp-bit.h (LIBGCC2_LONG_DOUBLE_TYPE_SIZE): Remove.
(__LIBGCC_XF_MANT_DIG__): Define if not already defined.
(LONG_DOUBLE_HAS_XF_MODE): Define in terms of
__LIBGCC_XF_MANT_DIG__.
(__LIBGCC_TF_MANT_DIG__): Define if not already defined.
(LONG_DOUBLE_HAS_TF_MODE): Define in terms of
__LIBGCC_TF_MANT_DIG__.
* libgcc2.c (NOTRUNC): Define in terms of
__LIBGCC_*_EXCESS_PRECISION__, not LIBGCC2_LONG_DOUBLE_TYPE_SIZE.
* libgcc2.h (LIBGCC2_LONG_DOUBLE_TYPE_SIZE): Remove.
Ian Lance Taylor [Mon, 22 Sep 2014 21:14:43 +0000 (21:14 +0000)]
runtime: Mark runtime_goexit function as noinline.
If the compiler inlines this function into kickoff, it may reuse
the TLS block address to load g. However, this is not necessarily
correct, as the call to g->entry in kickoff may cause the TLS
address to change. If the wrong value is loaded for g->status in
runtime_goexit, it may cause a runtime panic.
By marking the function as noinline we prevent the compiler from
reusing the TLS address.
Jan Hubicka [Mon, 22 Sep 2014 19:43:02 +0000 (21:43 +0200)]
charset.c (conversion): Rename to ...
* charset.c (conversion): Rename to ...
(cpp_conversion): ... this one; update.
* files.c (file_hash_entry): Rename to ...
(cpp_file_hash_entry): ... this one ; update.