Previously, we used 16:11:8 as the loop alignment in the generic tuning
for Intel processors. This could leave a small loop straddling a cache
line, which caused random commit-to-commit performance swings in
benchmarks dominated by small loops.
Always aligning loops to 16 bytes mitigates the issue.
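
To illustrate the placement difference, here is a minimal sketch that models
the two settings. It assumes the tuning string follows the n:m:n2 semantics
documented for -falign-loops (align to n if at most m-1 padding bytes are
needed, otherwise fall back to n2) and a 64-byte cache line. The helpers
align_up, place_label and crosses_cache_line are hypothetical names for
illustration, not GCC internals.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 64-byte cache lines.  */
#define CACHE_LINE 64u

/* Round ADDR up to the next multiple of N (N must be a power of two).  */
static unsigned
align_up (unsigned addr, unsigned n)
{
  assert (n != 0 && (n & (n - 1)) == 0);
  return (addr + n - 1) & ~(n - 1);
}

/* Model of an "n:m:n2" alignment request: align to N when that needs at
   most M-1 bytes of padding, otherwise align to N2 (N2 == 0 means no
   secondary alignment).  A plain "16" is modelled as N=16, M=16, N2=0.  */
static unsigned
place_label (unsigned addr, unsigned n, unsigned m, unsigned n2)
{
  unsigned primary = align_up (addr, n);
  if (primary - addr <= m - 1)
    return primary;
  return n2 ? align_up (addr, n2) : addr;
}

/* True if a loop body of LEN bytes starting at START spans two lines.  */
static bool
crosses_cache_line (unsigned start, unsigned len)
{
  return start / CACHE_LINE != (start + len - 1) / CACHE_LINE;
}
```

In this model, a 12-byte loop whose unpadded start is address 51 lands at 56
under 16:11:8 and crosses the line boundary at 64, while under 16 it lands at
64 and stays within one line.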
gcc/ChangeLog:
	* config/i386/x86-tune-costs.h (generic_cost): Change loop
	alignment from 16:11:8 to 16.
generic_memset,
COSTS_N_INSNS (4), /* cond_taken_branch_cost. */
COSTS_N_INSNS (2), /* cond_not_taken_branch_cost. */
- "16:11:8", /* Loop alignment. */
+ "16", /* Loop alignment. */
"16:11:8", /* Jump alignment. */
"0:0:8", /* Label alignment. */
"16", /* Func alignment. */