git.ipfire.org Git - thirdparty/gcc.git/commitdiff
i386: Refine c86-4g fdiv scheduling model
author Kewen Lin <linkewen@hygon.cn>
Thu, 28 May 2026 11:22:57 +0000 (11:22 +0000)
committer Kewen Lin <linkw@gcc.gnu.org>
Thu, 28 May 2026 11:22:57 +0000 (11:22 +0000)
Commit r17-258 introduced separate c86-4g fdiv units to avoid the
automaton explosion caused by modeling the whole divider latency on
the normal FPU pipes.  But the real hardware may keep the associated
FPU pipe occupied for some cycles at both the beginning and the end
of an fdiv or sqrt operation.  Following Alexander's suggestion in
[1], this patch keeps the long-latency part on the dedicated fdiv
unit but models only a bounded part of the FPU pipe occupancy: the
first four cycles reserve both the selected FPU pipe and the fdiv
unit, and the remaining cycles reserve only the fdiv unit.
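
As an illustration only (not part of the patch), the reservation shape
described above can be sketched in Python.  The helper names
`split_reservation` and `units_per_cycle` are hypothetical; the unit
names and the 4 + 18 cycle split are taken from the
c86-4g-m7-fp1fdiv1x4 and c86-4g-m7-fp1div1_fp3div3_x4x18 reservations
that this patch adds.

```python
def split_reservation(fpu: str, divider: str, total: int, overlap: int) -> str:
    """Build a reservation string that holds both units for `overlap`
    cycles and then the divider alone for the remaining cycles."""
    if not 0 < overlap < total:
        raise ValueError("overlap must be between 1 and total - 1")
    return f"({fpu}+{divider})*{overlap},{divider}*{total - overlap}"


def units_per_cycle(fpu: str, divider: str, total: int, overlap: int) -> list:
    """Expand the same reservation into the set of units held on each cycle."""
    return [
        {fpu, divider} if cycle < overlap else {divider}
        for cycle in range(total)
    ]


# One alternative of the x4x18 reservation: 4 shared cycles, 18 divider-only.
print(split_reservation("c86-4g-m7-fpu1", "c86-4g-m7-fdiv1", total=22, overlap=4))
```

With these inputs the FPU pipe appears only in the first four per-cycle
sets, so other FPU instructions can use fpu1 again from the fifth cycle
while the divider stays busy.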

Taking r17-258 as baseline, I tried K = 1,2,3,4 for

  fpu,divider*N -> (fpu+divider)*K, divider*(N-K)

and measured the run time of build/genautomata and the top 100 symbol
sizes of insn-automata.o (baseline normalized to 100), as below:

1) without any other changes:
              time     size
  baseline    100      100
  r17-203     340.0    629.3
  K1          100.3    100
  K2          105.5    112.5
  K3          112.8    129
  K4          119.4    141

2) with fpu0/fpu2 and fpu1/fpu3 split into two paired automata:
              time     size
  baseline    100      100
  r17-203     340.0    629.3
  KS1         79.6     43.3
  KS2         79.8     43.3
  KS3         79.6     43.3
  KS4         79.4     43.3

It turns out that if we want to model the FPU occupancy for the
first few cycles, separating the involved fpu1/fpu3 from the
original fpu automaton works better.  So this patch splits
fpu0/fpu2 and fpu1/fpu3 into two paired automata, and this extra
coupling does not grow the main FPU automata significantly.

This patch also corrects some other modeling errors and omissions:

  - Fix a one-cycle latency typo in c86_4g_fp_op_idiv_load.
  - Merge the old c86_4g_m7 idiv DI/SI/HI reservations after
    aligning their latency and divider unit occupancy (with
    updated values), while keeping QI separate.
  - Adjust reservation units in templates like
    c86_4g_m7_avx_vpinsr_reg_load and c86_4g_m7_avx512_sseadd_xy
    etc.
  - Add missing reservation units and unit occupancy in templates
    like c86_4g_m7_avx512_permi2_ymm and
    c86_4g_m7_sse_sseiadd_hplus_load etc.
  - Adjust reservation units and unit occupancy in templates like
    c86_4g_m7_avx512_perm_zmm_imm, c86_4g_m7_avx512_expand and
    c86_4g_m7_avx512_ssemul etc.

It also introduces some reusable reservation aliases to simplify
the modeling.

I measured the i686 bootstrap build time in a Docker container:
  - r17-202: 2437s (before c86-4g support)
  - r17-203: 7291s (c86-4g support)
  - r17-258: 2646s (tweaking for build time)
  - this: 2358s
It looks like this patch improves build time (it is even faster than
r17-202, though that small gap may be due to jitter).
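
To put the timings above in relative terms, here is a small Python
sketch (an illustration, not part of the patch).  The helper name
`reduction_percent` is hypothetical; the numbers are the bootstrap
times listed above.

```python
# Bootstrap times in seconds, copied from the list above.
build_times = {"r17-202": 2437, "r17-203": 7291, "r17-258": 2646, "this": 2358}


def reduction_percent(before: int, after: int) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100.0


for rev in ("r17-202", "r17-203", "r17-258"):
    pct = reduction_percent(build_times[rev], build_times["this"])
    print(f"this vs {rev}: {pct:.1f}% less build time")
```

On these figures the patch is roughly 11% faster than r17-258 and
about 3% faster than r17-202, the latter being the gap that may be
jitter.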

The symbol sizes are improved as below:

nm -CS -t d --defined-only gcc/insn-automata.o \
    | sed 's/^[0-9]* 0*//' \
    | sort -n | tail -20

with r17-258:

  20068 r bdver1_fp_transitions
  22354 r c86_4g_m7_ieu_min_issue_delay
  26208 r slm_min_issue_delay
  26580 t internal_min_issue_delay(int, DFA_chip*)
  26869 t internal_state_transition(int, DFA_chip*)
  27244 r bdver1_fp_min_issue_delay
  28518 r glm_check
  28518 r glm_transitions
  33690 r geode_min_issue_delay
  33728 r c86_4g_fp_transitions
  45436 r znver4_fpu_min_issue_delay
  46980 r bdver3_fp_min_issue_delay
  49428 r glm_min_issue_delay
  53730 r btver2_fp_min_issue_delay
  53760 r znver1_fp_transitions
  89414 r c86_4g_m7_ieu_transitions
  93960 r bdver3_fp_transitions
  181744 r znver4_fpu_transitions
  326322 r c86_4g_m7_fpu_min_issue_delay
  1305288 r c86_4g_m7_fpu_transitions

with this:

  17872 r print_reservation(_IO_FILE*, rtx_insn*)::...
  20068 r bdver1_fp_check
  20068 r bdver1_fp_transitions
  22016 r c86_4g_m7_fpu02_transitions
  22354 r c86_4g_m7_ieu_min_issue_delay
  26208 r slm_min_issue_delay
  27244 r bdver1_fp_min_issue_delay
  28199 t internal_min_issue_delay(int, DFA_chip*)
  28362 t internal_state_transition(int, DFA_chip*)
  28518 r glm_check
  28518 r glm_transitions
  33690 r geode_min_issue_delay
  45436 r znver4_fpu_min_issue_delay
  46980 r bdver3_fp_min_issue_delay
  49428 r glm_min_issue_delay
  53730 r btver2_fp_min_issue_delay
  53760 r znver1_fp_transitions
  89414 r c86_4g_m7_ieu_transitions
  93960 r bdver3_fp_transitions
  181744 r znver4_fpu_transitions
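
The two listings can be compared mechanically.  The following Python
sketch is an illustration only: the helper name `parse_sizes` is
hypothetical and only a few entries are copied from the listings
above.  Note that a symbol missing from the second top-20 list may
have been removed or may simply have become smaller than the 20th
entry.

```python
def parse_sizes(listing: str) -> dict:
    """Map symbol name -> size from lines of the form '<size> <type> <name>'."""
    sizes = {}
    for line in listing.strip().splitlines():
        # maxsplit=2 keeps demangled names that contain spaces intact.
        size, _kind, name = line.split(maxsplit=2)
        sizes[name] = int(size)
    return sizes


# A few entries copied from the r17-258 listing.
before = parse_sizes("""
  33728 r c86_4g_fp_transitions
  181744 r znver4_fpu_transitions
  326322 r c86_4g_m7_fpu_min_issue_delay
  1305288 r c86_4g_m7_fpu_transitions
""")

# A few entries copied from the listing with this patch.
after = parse_sizes("""
  22016 r c86_4g_m7_fpu02_transitions
  181744 r znver4_fpu_transitions
""")

dropped = sorted(set(before) - set(after))
print("no longer in the top 20:", dropped)
print("largest before:", max(before.values()))
print("largest after:", max(after.values()))
```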

Based on a random sample of SPEC2017 benchmarks, 525.x264_r and
521.wrf_r, I verified that the new modeling introduces no
significant compilation overhead.  Testing with a single job on a
c86-4g-m7 machine showed no impact on x264 and a tiny increase
for wrf (~0.3%).

[1] https://gcc.gnu.org/pipermail/gcc-patches/2026-May/716681.html

gcc/ChangeLog:

* config/i386/c86-4g-m7.md (c86_4g_m7_fpu): Remove automaton.
(c86_4g_m7_fpu02): New automaton.
(c86_4g_m7_fpu13): Ditto.
(c86-4g-m7-fpu0): Move to c86_4g_m7_fpu02 automaton.
(c86-4g-m7-fpu1): Move to c86_4g_m7_fpu13 automaton.
(c86-4g-m7-fpu2): Move to c86_4g_m7_fpu02 automaton.
(c86-4g-m7-fpu3): Move to c86_4g_m7_fpu13 automaton.
(c86-4g-m7-fdiv): Remove cpu unit.
(c86-4g-m7-fdiv1): New cpu unit.
(c86-4g-m7-fdiv3): Ditto.
(c86-4g-m7-fpu_0_3): New reservation.
(c86-4g-m7-fpu_1_3x2): Ditto.
(c86-4g-m7-fpu_1_3x3): Ditto.
(c86-4g-m7-fpu_1_3x6): Ditto.
(c86-4g-m7-fpux2): Ditto.
(c86-4g-m7-fpux4): Ditto.
(c86-4g-m7-fpux6): Ditto.
(c86-4g-m7-fpux8): Ditto.
(c86-4g-m7-fpux16): Ditto.
(c86-4g-m7-fp1fdiv1x4): Ditto.
(c86-4g-m7-fp3fdiv3x4): Ditto.
(c86-4g-m7-fdiv13): Ditto.
(c86-4g-m7-fp13div13): Ditto.
(c86-4g-m7-fp13div13x4): Ditto.
(c86-4g-m7-fp1div1_fp3div3_x4x8): Ditto.
(c86-4g-m7-fp1div1_fp3div3_x4x9): Ditto.
(c86-4g-m7-fp1div1_fp3div3_x4x11): Ditto.
(c86-4g-m7-fp1div1_fp3div3_x4x15): Ditto.
(c86-4g-m7-fp1div1_fp3div3_x4x18): Ditto.
(c86_4g_m7_idiv): New reservation.
(c86_4g_m7_idiv_QI): Adjust reservation latency and unit occupancy.
(c86_4g_m7_idiv_load): New reservation.
(c86_4g_m7_idiv_QI_load): Adjust reservation latency and unit
occupancy.
(c86_4g_m7_idiv_DI): Remove reservation.
(c86_4g_m7_idiv_SI): Ditto.
(c86_4g_m7_idiv_HI): Ditto.
(c86_4g_m7_idiv_DI_load): Ditto.
(c86_4g_m7_idiv_SI_load): Ditto.
(c86_4g_m7_idiv_HI_load): Ditto.
(c86_4g_m7_sse_insertimm): Adjust reservation units and unit
occupancy.
(c86_4g_m7_sse_insert): Ditto.
(c86_4g_m7_fp_sqrt): Adjust reservation.
(c86_4g_m7_fp_div): Ditto.
(c86_4g_m7_fp_div_load): Ditto.
(c86_4g_m7_fp_idiv_load): Ditto.
(c86_4g_m7_sse_pinsr_reg): Adjust reservation units and unit
occupancy.
(c86_4g_m7_sse_pinsr_reg_load): Ditto.
(c86_4g_m7_avx_vpinsr_reg): Ditto.
(c86_4g_m7_avx_vpinsr_reg_load): Ditto.
(c86_4g_m7_avx512_perm_xmm): Delete the prefix condition.
(c86_4g_m7_avx512_perm_xmm_opload): Ditto.
(c86_4g_m7_avx512_permi2_ymm): Adjust reservation units and unit
occupancy.
(c86_4g_m7_avx512_permi2_zmm): Ditto.
(c86_4g_m7_avx512_permi2_ymm_load): Ditto.
(c86_4g_m7_avx512_permi2_zmm_load): Ditto.
(c86_4g_m7_avx512_perm_zmm_imm): Ditto.
(c86_4g_m7_avx512_perm_zmm_imm_load): Ditto.
(c86_4g_m7_avx512_perm_zmm_noimm): Ditto.
(c86_4g_m7_sse_perm_zmm_noimm_load): Ditto.
(c86_4g_m7_avx_perm_ymm): Remove.
(c86_4g_m7_avx_perm_ymem): Ditto.
(c86_4g_m7_avx512_shuf_zmm): Adjust reservation units and unit
occupancy.
(c86_4g_m7_avx512_shuf_zmem): Ditto.
(c86_4g_m7_avx512_cmpestr): Ditto.
(c86_4g_m7_avx512_cmpestr_load): Ditto.
(c86_4g_m7_avx512_vdbpsadbw_zmm): Ditto.
(c86_4g_m7_avx512_vdbpsadbw_zmem): Ditto.
(c86_4g_m7_avx_ssecomi_comi): Ditto.
(c86_4g_m7_avx_ssecomi_comi_load): Ditto.
(c86_4g_m7_avx512_expand): Ditto.
(c86_4g_m7_avx512_expand_load): Ditto.
(c86_4g_m7_avx512_expand_z): Ditto.
(c86_4g_m7_avx512_expand_z_load): Ditto.
(c86_4g_m7_sse_movnt_xy): Rename to c86_4g_m7_sse_movnt.
(c86_4g_m7_avx512_sseadd_xy): Adjust reservation units.
(c86_4g_m7_avx512_sseadd_xy_load): Ditto.
(c86_4g_m7_sse_sseiadd_hplus): Adjust reservation units and unit
occupancy.
(c86_4g_m7_sse_sseiadd_hplus_load): Ditto.
(c86_4g_m7_avx512_ssemul): Adjust reservation units.
(c86_4g_m7_avx512_ssemul_load): Ditto.
(c86_4g_m7_avx512_ssediv): Remove.
(c86_4g_m7_avx512_ssediv_mem): Remove.
(c86_4g_m7_avx512_ssediv_x): New.
(c86_4g_m7_avx512_ssediv_xmem): New.
(c86_4g_m7_avx512_ssediv_y): New.
(c86_4g_m7_avx512_ssediv_ymem): New.
(c86_4g_m7_avx512_ssediv_z): Adjust reservation units.
(c86_4g_m7_avx512_ssediv_zmem): Ditto.
(c86_4g_m7_avx512_ssecmp_z): Add reservation units and unit
occupancy.
(c86_4g_m7_avx512_ssecmp_z_load): Ditto.
(c86_4g_m7_avx512_ssecmp_vp_z): New reservation.
(c86_4g_m7_avx512_ssecmp_vp_z_load): Ditto.
(c86_4g_m7_avx512_ssecmp_test_z): Remove reservation.
(c86_4g_m7_avx512_ssecmp_test_z_load): Ditto.
(c86_4g_m7_avx512_muladd): Broaden matching condition.
(c86_4g_m7_avx512_muladd_load): Ditto.
(c86_4g_m7_fma_muladd): Remove reservation.
(c86_4g_m7_fma_muladd_load): Ditto.
(c86_4g_m7_avx512_sse_conflict_x): Add reservation units and unit
occupancy.
(c86_4g_m7_avx512_sse_conflict_x_load): Ditto.
(c86_4g_m7_avx512_sse_conflict_y): Ditto.
(c86_4g_m7_avx512_sse_conflict_y_load): Ditto.
(c86_4g_m7_avx512_sse_conflict_z): Ditto.
(c86_4g_m7_avx512_sse_conflict_z_load): Ditto.
(c86_4g_m7_avx512_sse_class_z): Add reservation units and unit
occupancy.
(c86_4g_m7_avx512_sse_class_z_load): Ditto.
(c86_4g_m7_avx512_sse_sqrt): Remove.
(c86_4g_m7_avx512_sse_sqrt_load): Remove.
(c86_4g_m7_avx512_sse_sqrt_sf_x): New.
(c86_4g_m7_avx512_sse_sqrt_sf_xload): New.
(c86_4g_m7_avx512_sse_sqrt_sf_y): New.
(c86_4g_m7_avx512_sse_sqrt_sf_yload): New.
(c86_4g_m7_avx512_sse_sqrt_sf_z): New.
(c86_4g_m7_avx512_sse_sqrt_sf_zload): New.
(c86_4g_m7_avx512_sse_sqrt_df_x): New.
(c86_4g_m7_avx512_sse_sqrt_df_xload): New.
(c86_4g_m7_avx512_sse_sqrt_df_y): New.
(c86_4g_m7_avx512_sse_sqrt_df_yload): New.
(c86_4g_m7_avx512_sse_sqrt_df_z): New.
(c86_4g_m7_avx512_sse_sqrt_df_zload): New.
(c86_4g_m7_avx512_msklog_vector): Add reservation units and unit
occupancy.
(c86_4g_m7_avx512_mskmov_z_k): Ditto.
(c86_4g_m7_avx512_mskmov_k_reg): Ditto.
* config/i386/c86-4g.md (c86_4g_fp): Remove automaton.
(c86_4g_fp024): New automaton.
(c86_4g_fp1): Ditto.
(c86-4g-fp0): Move to c86_4g_fp024 automaton.
(c86-4g-fp1): Move to c86_4g_fp1 automaton.
(c86-4g-fp2): Move to c86_4g_fp024 automaton.
(c86-4g-fp3): Ditto.
(c86-4g-fp1fdivx4): New reservation.
(c86_4g_fp_sqrt): Adjust reservation.
(c86_4g_sse_sqrt_sf): Ditto.
(c86_4g_sse_sqrt_sf_mem): Ditto.
(c86_4g_sse_sqrt_df): Ditto.
(c86_4g_sse_sqrt_df_mem): Ditto.
(c86_4g_fp_op_div): Ditto.
(c86_4g_fp_op_div_load): Ditto.
(c86_4g_fp_op_idiv_load): Adjust reservation latency.
(c86_4g_ssediv_ss_ps): Adjust reservation.
(c86_4g_ssediv_ss_ps_load): Ditto.
(c86_4g_ssediv_sd_pd): Ditto.
(c86_4g_ssediv_sd_pd_load): Ditto.
(c86_4g_ssediv_avx256_ps): Ditto.
(c86_4g_ssediv_avx256_ps_load): Ditto.
(c86_4g_ssediv_avx256_pd): Ditto.
(c86_4g_ssediv_avx256_pd_load): Ditto.

Co-authored-by: Xin Liu <liulxx@hygon.cn>
Signed-off-by: Xin Liu <liulxx@hygon.cn>
Signed-off-by: Kewen Lin <linkewen@hygon.cn>
gcc/config/i386/c86-4g-m7.md
gcc/config/i386/c86-4g.md

index 54a850db3be84d84dfe661f66f7f2932ef67384f..96bd322a28833d1b585de5174027066ca1771771 100644 (file)
 ;; HYGON c86-4g-m7 Scheduling
 ;; Modeling automatons for decoders, integer execution pipes,
 ;; AGU pipes, branch, floating point execution, fp store units,
-;; integer and floating point dividers.
-(define_automaton "c86_4g_m7, c86_4g_m7_ieu, c86_4g_m7_agu, c86_4g_m7_fpu, c86_4g_m7_idiv, c86_4g_m7_fdiv")
+;; integer and floating point dividers.  Split fpu1 and fpu3
+;; into their own automata to keep these units independent
+;; without increasing the main c86_4g_m7_fpu state space.
+(define_automaton "c86_4g_m7, c86_4g_m7_ieu, c86_4g_m7_agu, c86_4g_m7_fpu02, c86_4g_m7_fpu13, c86_4g_m7_idiv, c86_4g_m7_fdiv")
 
 ;; Decoders unit has 4 decoders and all of them can decode fast path
 ;; and vector type instructions.
 (define_cpu_unit "c86-4g-m7-decode2" "c86_4g_m7")
 (define_cpu_unit "c86-4g-m7-decode3" "c86_4g_m7")
 
-;; Two separated dividers for int and fp.
-(define_cpu_unit "c86-4g-m7-idiv" "c86_4g_m7_idiv")
-(define_cpu_unit "c86-4g-m7-fdiv" "c86_4g_m7_fdiv")
-
 ;; Currently blocking all decoders for vector path instructions as
 ;; they are dispatched separetely as microcode sequence.
 (define_reservation "c86-4g-m7-vector" "c86-4g-m7-decode0+c86-4g-m7-decode1+c86-4g-m7-decode2+c86-4g-m7-decode3")
@@ -50,6 +48,9 @@
 (define_cpu_unit "c86-4g-m7-ieu2" "c86_4g_m7_ieu")
 (define_cpu_unit "c86-4g-m7-ieu3" "c86_4g_m7_ieu")
 
+;; One separated integer divider.
+(define_cpu_unit "c86-4g-m7-idiv" "c86_4g_m7_idiv")
+
 ;; c86-4g-m7 has an additional branch unit.
 (define_cpu_unit "c86-4g-m7-bru0" "c86_4g_m7_ieu")
 (define_reservation "c86-4g-m7-ieu" "c86-4g-m7-ieu0|c86-4g-m7-ieu1|c86-4g-m7-ieu2|c86-4g-m7-ieu3")
 ;; vectorpath (microcoded) instructions are single issue instructions.
 ;; So, they occupy all the integer units.
 (define_reservation "c86-4g-m7-ivector" "c86-4g-m7-ieu0+c86-4g-m7-ieu1
-                                     +c86-4g-m7-ieu2+c86-4g-m7-ieu3+c86-4g-m7-bru0
-                                     +c86-4g-m7-agu0+c86-4g-m7-agu1+c86-4g-m7-agu2")
+                                        +c86-4g-m7-ieu2+c86-4g-m7-ieu3+c86-4g-m7-bru0
+                                        +c86-4g-m7-agu0+c86-4g-m7-agu1+c86-4g-m7-agu2")
 
 ;; Floating point unit 4 FP pipes.
-(define_cpu_unit "c86-4g-m7-fpu0" "c86_4g_m7_fpu")
-(define_cpu_unit "c86-4g-m7-fpu1" "c86_4g_m7_fpu")
-(define_cpu_unit "c86-4g-m7-fpu2" "c86_4g_m7_fpu")
-(define_cpu_unit "c86-4g-m7-fpu3" "c86_4g_m7_fpu")
+(define_cpu_unit "c86-4g-m7-fpu0" "c86_4g_m7_fpu02")
+(define_cpu_unit "c86-4g-m7-fpu1" "c86_4g_m7_fpu13")
+(define_cpu_unit "c86-4g-m7-fpu2" "c86_4g_m7_fpu02")
+(define_cpu_unit "c86-4g-m7-fpu3" "c86_4g_m7_fpu13")
+
 (define_reservation "c86-4g-m7-fpu" "c86-4g-m7-fpu0|c86-4g-m7-fpu1|c86-4g-m7-fpu2|c86-4g-m7-fpu3")
-(define_reservation "c86-4g-m7-fpu_0_2" "c86-4g-m7-fpu0|c86-4g-m7-fpu2")
-(define_reservation "c86-4g-m7-fpu_1_3" "c86-4g-m7-fpu1|c86-4g-m7-fpu3")
 (define_reservation "c86-4g-m7-fpu_0_1" "c86-4g-m7-fpu0|c86-4g-m7-fpu1")
+(define_reservation "c86-4g-m7-fpu_0_2" "c86-4g-m7-fpu0|c86-4g-m7-fpu2")
 (define_reservation "c86-4g-m7-fpu_0_2x2" "c86-4g-m7-fpu0*2|c86-4g-m7-fpu2*2")
 (define_reservation "c86-4g-m7-fpu_0_2x4" "c86-4g-m7-fpu0*4|c86-4g-m7-fpu2*4")
+(define_reservation "c86-4g-m7-fpu_0_3" "c86-4g-m7-fpu0|c86-4g-m7-fpu3")
+(define_reservation "c86-4g-m7-fpu_1_3" "c86-4g-m7-fpu1|c86-4g-m7-fpu3")
+(define_reservation "c86-4g-m7-fpu_1_3x2" "c86-4g-m7-fpu1*2|c86-4g-m7-fpu3*2")
+(define_reservation "c86-4g-m7-fpu_1_3x3" "c86-4g-m7-fpu1*3|c86-4g-m7-fpu3*3")
+(define_reservation "c86-4g-m7-fpu_1_3x6" "c86-4g-m7-fpu1*6|c86-4g-m7-fpu3*6")
+(define_reservation "c86-4g-m7-fpux2" "c86-4g-m7-fpu0*2|c86-4g-m7-fpu1*2|c86-4g-m7-fpu2*2|c86-4g-m7-fpu3*2")
+(define_reservation "c86-4g-m7-fpux4" "c86-4g-m7-fpu0*4|c86-4g-m7-fpu1*4|c86-4g-m7-fpu2*4|c86-4g-m7-fpu3*4")
+(define_reservation "c86-4g-m7-fpux8" "c86-4g-m7-fpu0*8|c86-4g-m7-fpu1*8|c86-4g-m7-fpu2*8|c86-4g-m7-fpu3*8")
+(define_reservation "c86-4g-m7-fpux6" "c86-4g-m7-fpu0*6|c86-4g-m7-fpu1*6|c86-4g-m7-fpu2*6|c86-4g-m7-fpu3*6")
+(define_reservation "c86-4g-m7-fpux16" "c86-4g-m7-fpu0*16|c86-4g-m7-fpu1*16|c86-4g-m7-fpu2*16|c86-4g-m7-fpu3*16")
 (define_reservation "c86-4g-m7-fvector" "c86-4g-m7-fpu0+c86-4g-m7-fpu1
-                                     +c86-4g-m7-fpu2+c86-4g-m7-fpu3
-                                     +c86-4g-m7-agu0+c86-4g-m7-agu1+c86-4g-m7-agu2")
+                                        +c86-4g-m7-fpu2+c86-4g-m7-fpu3
+                                        +c86-4g-m7-agu0+c86-4g-m7-agu1+c86-4g-m7-agu2")
+
+;; Two FP dividers.
+(define_cpu_unit "c86-4g-m7-fdiv1" "c86_4g_m7_fdiv")
+(define_cpu_unit "c86-4g-m7-fdiv3" "c86_4g_m7_fdiv")
+
+(define_reservation "c86-4g-m7-fp1fdiv1x4" "(c86-4g-m7-fpu1+c86-4g-m7-fdiv1)*4")
+(define_reservation "c86-4g-m7-fp3fdiv3x4" "(c86-4g-m7-fpu3+c86-4g-m7-fdiv3)*4")
+(define_reservation "c86-4g-m7-fdiv13" "(c86-4g-m7-fdiv1+c86-4g-m7-fdiv3)")
+(define_reservation "c86-4g-m7-fp13div13" "(c86-4g-m7-fpu1+c86-4g-m7-fpu3+c86-4g-m7-fdiv1+c86-4g-m7-fdiv3)")
+(define_reservation "c86-4g-m7-fp13div13x4" "c86-4g-m7-fp13div13*4")
+(define_reservation "c86-4g-m7-fp1div1_fp3div3_x4x8" "(c86-4g-m7-fp1fdiv1x4,c86-4g-m7-fdiv1*8)|(c86-4g-m7-fp3fdiv3x4,c86-4g-m7-fdiv3*8)")
+(define_reservation "c86-4g-m7-fp1div1_fp3div3_x4x9" "(c86-4g-m7-fp1fdiv1x4,c86-4g-m7-fdiv1*9)|(c86-4g-m7-fp3fdiv3x4,c86-4g-m7-fdiv3*9)")
+(define_reservation "c86-4g-m7-fp1div1_fp3div3_x4x11" "(c86-4g-m7-fp1fdiv1x4,c86-4g-m7-fdiv1*11)|(c86-4g-m7-fp3fdiv3x4,c86-4g-m7-fdiv3*11)")
+(define_reservation "c86-4g-m7-fp1div1_fp3div3_x4x15" "(c86-4g-m7-fp1fdiv1x4,c86-4g-m7-fdiv1*15)|(c86-4g-m7-fp3fdiv3x4,c86-4g-m7-fdiv3*15)")
+(define_reservation "c86-4g-m7-fp1div1_fp3div3_x4x18" "(c86-4g-m7-fp1fdiv1x4,c86-4g-m7-fdiv1*18)|(c86-4g-m7-fp3fdiv3x4,c86-4g-m7-fdiv3*18)")
 
 ;; IMOV/IMOVX
 (define_insn_reservation "c86_4g_m7_imov_xchg" 1
                         "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-ieu1")
 
 ;; IDIV
-(define_insn_reservation "c86_4g_m7_idiv_DI" 41
-                        (and (eq_attr "cpu" "c86_4g_m7")
-                             (and (eq_attr "type" "idiv")
-                                  (and (eq_attr "mode" "DI")
-                                       (eq_attr "memory" "none"))))
-                        "c86-4g-m7-double,c86-4g-m7-ieu3,c86-4g-m7-idiv*41")
-
-(define_insn_reservation "c86_4g_m7_idiv_SI" 25
-                        (and (eq_attr "cpu" "c86_4g_m7")
-                             (and (eq_attr "type" "idiv")
-                                  (and (eq_attr "mode" "SI")
-                                       (eq_attr "memory" "none"))))
-                        "c86-4g-m7-double,c86-4g-m7-ieu3,c86-4g-m7-idiv*25")
-
-(define_insn_reservation "c86_4g_m7_idiv_HI" 17
+(define_insn_reservation "c86_4g_m7_idiv" 7
                         (and (eq_attr "cpu" "c86_4g_m7")
                              (and (eq_attr "type" "idiv")
-                                  (and (eq_attr "mode" "HI")
+                                  (and (eq_attr "mode" "!QI")
                                        (eq_attr "memory" "none"))))
-                        "c86-4g-m7-double,c86-4g-m7-ieu3,c86-4g-m7-idiv*17")
+                        "c86-4g-m7-double,c86-4g-m7-ieu3,c86-4g-m7-idiv*7")
 
-(define_insn_reservation "c86_4g_m7_idiv_QI" 15
+(define_insn_reservation "c86_4g_m7_idiv_QI" 6
                         (and (eq_attr "cpu" "c86_4g_m7")
                              (and (eq_attr "type" "idiv")
                                   (and (eq_attr "mode" "QI")
                                        (eq_attr "memory" "none"))))
-                        "c86-4g-m7-direct,c86-4g-m7-ieu3,c86-4g-m7-idiv*15")
-
-(define_insn_reservation "c86_4g_m7_idiv_DI_load" 45
-                        (and (eq_attr "cpu" "c86_4g_m7")
-                             (and (eq_attr "type" "idiv")
-                                  (and (eq_attr "mode" "DI")
-                                       (eq_attr "memory" "load"))))
-                        "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-ieu3,c86-4g-m7-idiv*41")
-
-(define_insn_reservation "c86_4g_m7_idiv_SI_load" 29
-                        (and (eq_attr "cpu" "c86_4g_m7")
-                             (and (eq_attr "type" "idiv")
-                                  (and (eq_attr "mode" "SI")
-                                       (eq_attr "memory" "load"))))
-                        "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-ieu3,c86-4g-m7-idiv*25")
+                        "c86-4g-m7-double,c86-4g-m7-ieu3,c86-4g-m7-idiv*6")
 
-(define_insn_reservation "c86_4g_m7_idiv_HI_load" 21
+(define_insn_reservation "c86_4g_m7_idiv_load" 11
                         (and (eq_attr "cpu" "c86_4g_m7")
                              (and (eq_attr "type" "idiv")
-                                  (and (eq_attr "mode" "HI")
+                                  (and (eq_attr "mode" "!QI")
                                        (eq_attr "memory" "load"))))
-                        "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-ieu3,c86-4g-m7-idiv*17")
+                        "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-ieu3,c86-4g-m7-idiv*7")
 
-(define_insn_reservation "c86_4g_m7_idiv_QI_load" 19
+(define_insn_reservation "c86_4g_m7_idiv_QI_load" 10
                         (and (eq_attr "cpu" "c86_4g_m7")
                              (and (eq_attr "type" "idiv")
                                   (and (eq_attr "mode" "QI")
                                        (eq_attr "memory" "load"))))
-                        "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-ieu3,c86-4g-m7-idiv*15")
+                        "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-ieu3,c86-4g-m7-idiv*6")
 
 ;; Integer/genaral Instructions
 (define_insn_reservation "c86_4g_m7_insn" 1
                              (and (eq_attr "type" "sseins")
                                   (and (eq_attr "memory" "none")
                                        (eq_attr "length_immediate" "2"))))
-                        "c86-4g-m7-double,c86-4g-m7-fpu0|c86-4g-m7-fpu3,c86-4g-m7-fpu1")
+                        "c86-4g-m7-double,c86-4g-m7-fpu_0_3,c86-4g-m7-fpu1")
 
 (define_insn_reservation "c86_4g_m7_sse_insert" 3
                         (and (eq_attr "cpu" "c86_4g_m7")
                              (and (eq_attr "type" "sseins")
                                   (and (eq_attr "memory" "none")
                                        (eq_attr "length_immediate" "!2"))))
-                        "c86-4g-m7-direct,c86-4g-m7-fpu1")
+                        "c86-4g-m7-direct,c86-4g-m7-fpu1*2")
 
 ;; FCMOV
 (define_insn_reservation "c86_4g_m7_fp_cmov" 4
                         (and (eq_attr "cpu" "c86_4g_m7")
                              (and (eq_attr "type" "fpspc")
                                   (eq_attr "c86_attr" "sqrt")))
-                        "c86-4g-m7-direct,c86-4g-m7-fpu1,c86-4g-m7-fdiv*22")
+                        "c86-4g-m7-direct,c86-4g-m7-fp1div1_fp3div3_x4x18")
 
 ;; FPSPC
 (define_insn_reservation "c86_4g_m7_fp_spc_direct" 5
                         (and (eq_attr "cpu" "c86_4g_m7")
                              (and (eq_attr "type" "fdiv")
                                   (eq_attr "memory" "none")))
-                        "c86-4g-m7-direct,c86-4g-m7-fpu1,c86-4g-m7-fdiv*15")
+                        "c86-4g-m7-direct,c86-4g-m7-fp1div1_fp3div3_x4x11")
 
 (define_insn_reservation "c86_4g_m7_fp_div_load" 22
                         (and (eq_attr "cpu" "c86_4g_m7")
                              (and (eq_attr "type" "fdiv")
                                   (and (eq_attr "fp_int_src" "false")
                                        (eq_attr "memory" "!none"))))
-                        "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu1,c86-4g-m7-fdiv*15")
+                        "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fp1div1_fp3div3_x4x11")
 
 (define_insn_reservation "c86_4g_m7_fp_idiv_load" 26
                         (and (eq_attr "cpu" "c86_4g_m7")
                              (and (eq_attr "type" "fdiv")
                                   (and (eq_attr "fp_int_src" "true")
                                        (eq_attr "memory" "!none"))))
-                        "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-fpu1,c86-4g-m7-fdiv*15")
+                        "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-fpu1*4,c86-4g-m7-fp1div1_fp3div3_x4x11")
 
 (define_insn_reservation "c86_4g_m7_fp_fsgn" 1
                         (and (eq_attr "cpu" "c86_4g_m7")
                                   (and (eq_attr "c86_attr" "insr")
                                    (and (eq_attr "prefix" "orig")
                                         (eq_attr "memory" "none")))))
-                        "c86-4g-m7-double,c86-4g-m7-ieu2,c86-4g-m7-fpu_0_1")
+                        "c86-4g-m7-double,c86-4g-m7-ieu2,c86-4g-m7-fpu")
 
 (define_insn_reservation "c86_4g_m7_sse_pinsr_reg_load" 3
                         (and (eq_attr "cpu" "c86_4g_m7")
                                   (and (eq_attr "c86_attr" "insr")
                                    (and (eq_attr "prefix" "orig")
                                         (eq_attr "memory" "load")))))
-                        "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu_0_1")
+                        "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu")
 
 (define_insn_reservation "c86_4g_m7_avx_vpinsr_reg" 2
                         (and (eq_attr "cpu" "c86_4g_m7")
                                   (and (eq_attr "c86_attr" "insr")
                                     (and (eq_attr "prefix" "!orig")
                                          (eq_attr "memory" "none")))))
-                        "c86-4g-m7-double,c86-4g-m7-fpu2*2")
+                        "c86-4g-m7-double,c86-4g-m7-fpu_1_3x2")
 
 (define_insn_reservation "c86_4g_m7_avx_vpinsr_reg_load" 8
                         (and (eq_attr "cpu" "c86_4g_m7")
                                   (and (eq_attr "c86_attr" "insr")
                                     (and (eq_attr "prefix" "!orig")
                                          (eq_attr "memory" "load")))))
-                        "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu1|c86-4g-m7-fpu2|c86-4g-m7-fpu3")
+                        "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu_1_3")
 
 ;; PERM
 (define_insn_reservation "c86_4g_m7_avx512_perm_xmm" 3
                                                  (eq_attr "mode" "V4SF,V2DF,TI"))
                                             (and (eq_attr "c86_attr" "perm")
                                                  (eq_attr "mode" "V8SF,V4DF,TI,OI")))
-                                   (and (eq_attr "prefix" "evex")
-                                        (eq_attr "memory" "none")))))
+                                       (eq_attr "memory" "none"))))
                         "c86-4g-m7-direct,c86-4g-m7-fpu_0_2x2")
 
 (define_insn_reservation "c86_4g_m7_avx512_perm_xmm_opload" 10
                                                  (eq_attr "mode" "V4SF,V2DF,TI"))
                                             (and (eq_attr "c86_attr" "perm")
                                                  (eq_attr "mode" "V8SF,V4DF,TI,OI")))
-                                   (and (eq_attr "prefix" "evex")
-                                        (eq_attr "memory" "load")))))
+                                       (eq_attr "memory" "load"))))
                         "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu_0_2x2")
 
 (define_insn_reservation "c86_4g_m7_avx512_permi2_ymm" 4
                                   (and (eq_attr "c86_attr" "perm2")
                                    (and (eq_attr "mode" "V8SF,V4DF,OI")
                                          (eq_attr "memory" "none")))))
-                        "c86-4g-m7-vector")
+                        "c86-4g-m7-vector,c86-4g-m7-fpux4")
 
 (define_insn_reservation "c86_4g_m7_avx512_permi2_zmm" 16
                         (and (eq_attr "cpu" "c86_4g_m7")
                                   (and (eq_attr "c86_attr" "perm2")
                                    (and (eq_attr "mode" "V16SF,V8DF,XI")
                                         (eq_attr "memory" "none")))))
-                        "c86-4g-m7-vector")
+                        "c86-4g-m7-vector,c86-4g-m7-fpux16")
 
 (define_insn_reservation "c86_4g_m7_avx512_permi2_ymm_load" 11
                         (and (eq_attr "cpu" "c86_4g_m7")
                                   (and (eq_attr "c86_attr" "perm2")
                                    (and (eq_attr "mode" "V8SF,V4DF,OI")
                                         (eq_attr "memory" "load")))))
-                        "c86-4g-m7-vector,c86-4g-m7-load")
+                        "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpux4")
 
 (define_insn_reservation "c86_4g_m7_avx512_permi2_zmm_load" 23
                         (and (eq_attr "cpu" "c86_4g_m7")
                                   (and (eq_attr "c86_attr" "perm2")
                                    (and (eq_attr "mode" "V16SF,V8DF,XI")
                                         (eq_attr "memory" "load")))))
-                        "c86-4g-m7-vector,c86-4g-m7-load")
+                        "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpux16")
 
 (define_insn_reservation "c86_4g_m7_avx512_perm_zmm_imm" 4
                         (and (eq_attr "cpu" "c86_4g_m7")
                                    (and (eq_attr "mode" "V16SF,V8DF,XI")
                                     (and (match_operand 2 "immediate_operand")
                                          (eq_attr "memory" "none"))))))
-                        "c86-4g-m7-direct,c86-4g-m7-fpu_0_2x4")
+                        "c86-4g-m7-direct,c86-4g-m7-fpux4")
 
 (define_insn_reservation "c86_4g_m7_avx512_perm_zmm_imm_load" 11
                         (and (eq_attr "cpu" "c86_4g_m7")
                                    (and (eq_attr "mode" "V16SF,V8DF,XI")
                                     (and (match_operand 2 "immediate_operand")
                                          (eq_attr "memory" "load"))))))
-                        "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu_0_2x4")
+                        "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpux4")
 
 (define_insn_reservation "c86_4g_m7_avx512_perm_zmm_noimm" 8
                         (and (eq_attr "cpu" "c86_4g_m7")
                                    (and (eq_attr "mode" "V16SF,V8DF,XI")
                                     (and (match_operand 2 "nonimmediate_operand")
                                          (eq_attr "memory" "none"))))))
-                        "c86-4g-m7-vector")
+                        "c86-4g-m7-vector,c86-4g-m7-fpux8")
 
 (define_insn_reservation "c86_4g_m7_sse_perm_zmm_noimm_load" 15
                         (and (eq_attr "cpu" "c86_4g_m7")
                                    (and (eq_attr "mode" "V16SF,V8DF,XI")
                                     (and (match_operand 2 "nonimmediate_operand")
                                         (eq_attr "memory" "load"))))))
-                        "c86-4g-m7-vector,c86-4g-m7-load")
-
-(define_insn_reservation "c86_4g_m7_avx_perm_ymm" 3
-                        (and (eq_attr "cpu" "c86_4g_m7")
-                             (and (eq_attr "type" "sselog")
-                                  (and (eq_attr "c86_attr" "perm")
-                                    (and (eq_attr "prefix" "!evex")
-                                         (eq_attr "memory" "none")))))
-                        "c86-4g-m7-vector")
-
-(define_insn_reservation "c86_4g_m7_avx_perm_ymem" 10
-                        (and (eq_attr "cpu" "c86_4g_m7")
-                             (and (eq_attr "type" "sselog")
-                                  (and (eq_attr "c86_attr" "perm")
-                                    (and (eq_attr "prefix" "!evex")
-                                         (eq_attr "memory" "load")))))
-                        "c86-4g-m7-vector,c86-4g-m7-load")
+                        "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpux8")
 
 ;; VINSERT
 (define_insn_reservation "c86_4g_m7_avx512_insertx_ymm" 3
                                   (and (eq_attr "c86_attr" "shufx")
                                     (and (eq_attr "mode" "V8DF,V16SF,XI")
                                          (eq_attr "memory" "none")))))
-                        "c86-4g-m7-vector")
+                        "c86-4g-m7-vector,c86-4g-m7-fpu_0_2x4")
 
 (define_insn_reservation "c86_4g_m7_avx512_shuf_xymem" 10
                         (and (eq_attr "cpu" "c86_4g_m7")
                                   (and (eq_attr "c86_attr" "shufx")
                                     (and (eq_attr "mode" "V8DF,V16SF,XI")
                                          (eq_attr "memory" "load")))))
-                        "c86-4g-m7-vector,c86-4g-m7-load")
+                        "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpu_0_2x4")
 
 ;; SSELOGIC
 (define_insn_reservation "c86_4g_m7_sselogic_xymm" 1
                              (and (eq_attr "type" "sselog")
                                   (and (eq_attr "c86_attr" "cmpestr")
                                        (eq_attr "memory" "none"))))
-                        "c86-4g-m7-vector")
+                        "c86-4g-m7-vector,c86-4g-m7-fpux6")
 
 (define_insn_reservation "c86_4g_m7_avx512_cmpestr_load" 13
                         (and (eq_attr "cpu" "c86_4g_m7")
                              (and (eq_attr "type" "sselog")
                                   (and (eq_attr "c86_attr" "cmpestr")
                                        (eq_attr "memory" "load"))))
-                        "c86-4g-m7-vector,c86-4g-m7-load")
+                        "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpux6")
 
 ;; SSELOG
 (define_insn_reservation "c86_4g_m7_avx512_log" 1
                                   (and (eq_attr "c86_attr" "sadbw")
                                    (and (eq_attr "mode" "XI")
                                         (eq_attr "memory" "none")))))
-                        "c86-4g-m7-vector")
+                        "c86-4g-m7-vector,c86-4g-m7-fpu_0_2,c86-4g-m7-fpu_1_3x2")
 
 (define_insn_reservation "c86_4g_m7_avx512_vdbpsadbw_zmem" 11
                         (and (eq_attr "cpu" "c86_4g_m7")
                                   (and (eq_attr "c86_attr" "sadbw")
                                    (and (eq_attr "mode" "XI")
                                         (eq_attr "memory" "load")))))
-                        "c86-4g-m7-vector,c86-4g-m7-load")
+                        "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpu_0_2,c86-4g-m7-fpu_1_3x2")
 
 ;; ABS
 (define_insn_reservation "c86_4g_m7_avx512_abs" 1
                              (and (eq_attr "type" "ssecomi")
                                   (and (eq_attr "prefix_extra" "0")
                                        (eq_attr "memory" "none"))))
-                        "c86-4g-m7-double,c86-4g-m7-fpu2|c86-4g-m7-fpu3")
+                        "c86-4g-m7-double,c86-4g-m7-fpu")
 
 (define_insn_reservation "c86_4g_m7_avx_ssecomi_comi_load" 8
                         (and (eq_attr "cpu" "c86_4g_m7")
                              (and (eq_attr "type" "ssecomi")
                                   (and (eq_attr "prefix_extra" "0")
                                        (eq_attr "memory" "load"))))
-                        "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-fpu2|c86-4g-m7-fpu3")
+                        "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-fpu")
 
 (define_insn_reservation "c86_4g_m7_avx_ssecomi_test" 1
                         (and (eq_attr "cpu" "c86_4g_m7")
                                   (and (eq_attr "c86_attr" "expand,compress")
                                    (and (not (eq_attr "mode" "XI,V16SF,V8DF"))
                                         (eq_attr "memory" "none")))))
-                        "c86-4g-m7-direct,c86-4g-m7-fpu3*2,c86-4g-m7-fpu1*2|c86-4g-m7-fpu3*2")
+                        "c86-4g-m7-direct,c86-4g-m7-fpu3,c86-4g-m7-fpu_0_3")
 
 (define_insn_reservation "c86_4g_m7_avx512_expand_load" 10
                         (and (eq_attr "cpu" "c86_4g_m7")
                                   (and (eq_attr "c86_attr" "expand,compress")
                                    (and (not (eq_attr "mode" "XI,V16SF,V8DF"))
                                         (eq_attr "memory" "load")))))
-                        "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu3*2,c86-4g-m7-fpu1*2|c86-4g-m7-fpu3*2")
+                        "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu3,c86-4g-m7-fpu_0_3")
 
 (define_insn_reservation "c86_4g_m7_avx512_expand_z" 10
                         (and (eq_attr "cpu" "c86_4g_m7")
                                   (and (eq_attr "c86_attr" "expand,compress")
                                    (and (eq_attr "mode" "XI,V16SF,V8DF")
                                         (eq_attr "memory" "none")))))
-                        "c86-4g-m7-vector")
+                        "c86-4g-m7-vector,c86-4g-m7-fpu3,c86-4g-m7-fpu_0_3")
 
 (define_insn_reservation "c86_4g_m7_avx512_expand_z_load" 17
                         (and (eq_attr "cpu" "c86_4g_m7")
                                   (and (eq_attr "c86_attr" "expand,compress")
                                    (and (eq_attr "mode" "XI,V16SF,V8DF")
                                         (eq_attr "memory" "load")))))
-                        "c86-4g-m7-vector,c86-4g-m7-load")
+                        "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpu3,c86-4g-m7-fpu_0_3")
 
 ;; MOVNT
 (define_insn_reservation "c86_4g_m7_avx512_movnt_load" 8
                                         (eq_attr "memory" "!none")))))
                         "c86-4g-m7-direct,c86-4g-m7-store,c86-4g-m7-fpu1")
 
-(define_insn_reservation "c86_4g_m7_sse_movnt_xy" 4
+(define_insn_reservation "c86_4g_m7_sse_movnt" 4
                         (and (eq_attr "cpu" "c86_4g_m7")
                              (and (eq_attr "type" "ssemov")
                                   (and (eq_attr "c86_attr" "movnt")
                              (and (eq_attr "type" "sseadd")
                                   (and (eq_attr "c86_attr" "other")
                                          (eq_attr "memory" "none"))))
-                        "c86-4g-m7-direct,c86-4g-m7-fpu3")
+                        "c86-4g-m7-direct,c86-4g-m7-fpu_1_3")
 
 (define_insn_reservation "c86_4g_m7_avx512_sseadd_xy_load" 10
                         (and (eq_attr "cpu" "c86_4g_m7")
                              (and (eq_attr "type" "sseadd")
                                   (and (eq_attr "c86_attr" "other")
                                         (eq_attr "memory" "load"))))
-                        "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu3")
+                        "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu_1_3")
 
 ;; HADD/HSUB
 (define_insn_reservation "c86_4g_m7_avx_sseadd_hplus" 7
                                   (and (eq_attr "c86_attr" "hplus")
                                    (and (eq_attr "prefix" "orig")
                                     (eq_attr "memory" "none")))))
-                        "c86-4g-m7-vector,c86-4g-m7-fpu0*2")
+                        "c86-4g-m7-vector,c86-4g-m7-fpux2")
 
 (define_insn_reservation "c86_4g_m7_sse_sseiadd_hplus_load" 10
                         (and (eq_attr "cpu" "c86_4g_m7")
                                   (and (eq_attr "c86_attr" "hplus")
                                    (and (eq_attr "prefix" "orig")
                                         (eq_attr "memory" "load")))))
-                        "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpu0*2")
+                        "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpux2")
 
 ;; SSEMUL
 (define_insn_reservation "c86_4g_m7_avx512_ssemul" 3
                         (and (eq_attr "cpu" "c86_4g_m7")
                              (and (eq_attr "type" "ssemul")
                                   (eq_attr "memory" "none")))
-                        "c86-4g-m7-direct,c86-4g-m7-fpu0")
+                        "c86-4g-m7-direct,c86-4g-m7-fpu_0_2")
 
 (define_insn_reservation "c86_4g_m7_avx512_ssemul_load" 10
                         (and (eq_attr "cpu" "c86_4g_m7")
                              (and (eq_attr "type" "ssemul")
                                   (eq_attr "memory" "load")))
-                        "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu0")
+                        "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu_0_2")
 
 ;; SSEDIV
-(define_insn_reservation "c86_4g_m7_avx512_ssediv" 13
+(define_insn_reservation "c86_4g_m7_avx512_ssediv_x" 13
+                        (and (eq_attr "cpu" "c86_4g_m7")
+                             (and (eq_attr "type" "ssediv")
+                                  (and (eq_attr "mode" "SF,DF,V4SF,V2DF")
+                                       (eq_attr "memory" "none"))))
+                        "c86-4g-m7-direct,c86-4g-m7-fp1div1_fp3div3_x4x8")
+
+(define_insn_reservation "c86_4g_m7_avx512_ssediv_xmem" 20
+                        (and (eq_attr "cpu" "c86_4g_m7")
+                             (and (eq_attr "type" "ssediv")
+                                  (and (eq_attr "mode" "SF,DF,V4SF,V2DF")
+                                       (eq_attr "memory" "load"))))
+                        "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fp1div1_fp3div3_x4x8")
+
+(define_insn_reservation "c86_4g_m7_avx512_ssediv_y" 13
                         (and (eq_attr "cpu" "c86_4g_m7")
                              (and (eq_attr "type" "ssediv")
-                                  (and (not (eq_attr "mode" "V16SF,V8DF"))
+                                  (and (eq_attr "mode" "V8SF,V4DF")
                                        (eq_attr "memory" "none"))))
-                        "c86-4g-m7-direct,c86-4g-m7-fpu3,c86-4g-m7-fdiv*13")
+                        "c86-4g-m7-direct,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*8")
 
-(define_insn_reservation "c86_4g_m7_avx512_ssediv_mem" 20
+(define_insn_reservation "c86_4g_m7_avx512_ssediv_ymem" 20
                         (and (eq_attr "cpu" "c86_4g_m7")
                              (and (eq_attr "type" "ssediv")
-                                  (and (not (eq_attr "mode" "V16SF,V8DF"))
+                                  (and (eq_attr "mode" "V8SF,V4DF")
                                        (eq_attr "memory" "load"))))
-                        "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu3,c86-4g-m7-fdiv*13")
+                        "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*8")
 
 (define_insn_reservation "c86_4g_m7_avx512_ssediv_z" 24
                         (and (eq_attr "cpu" "c86_4g_m7")
                              (and (eq_attr "type" "ssediv")
                                   (and (eq_attr "mode" "V16SF,V8DF")
                                        (eq_attr "memory" "none"))))
-                        "c86-4g-m7-double,c86-4g-m7-fpu3,c86-4g-m7-fdiv*24")
+                        "c86-4g-m7-double,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*20")
 
 (define_insn_reservation "c86_4g_m7_avx512_ssediv_zmem" 31
                         (and (eq_attr "cpu" "c86_4g_m7")
                              (and (eq_attr "type" "ssediv")
                                   (and (eq_attr "mode" "V16SF,V8DF")
                                         (eq_attr "memory" "load"))))
-                        "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-fpu3,c86-4g-m7-fdiv*24")
+                        "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*20")
 
 ;; SSECMP
 (define_insn_reservation "c86_4g_m7_avx512_ssecmp" 5
                                    (and (eq_attr "mode" "V16SF,V8DF,XI")
                                     (and (eq_attr "c86_attr" "other")
                                          (eq_attr "memory" "none")))))
-                        "c86-4g-m7-vector")
+                        "c86-4g-m7-vector,c86-4g-m7-fpu_0_2,c86-4g-m7-fpu_1_3")
 
 (define_insn_reservation "c86_4g_m7_avx512_ssecmp_z_load" 12
                         (and (eq_attr "cpu" "c86_4g_m7")
                                    (and (eq_attr "mode" "V16SF,V8DF,XI")
                                     (and (eq_attr "c86_attr" "other")
                                          (eq_attr "memory" "load")))))
-                        "c86-4g-m7-vector,c86-4g-m7-load")
+                        "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpu_0_2,c86-4g-m7-fpu_1_3x2")
 
 (define_insn_reservation "c86_4g_m7_avx512_ssecmp_vp" 5
                         (and (eq_attr "cpu" "c86_4g_m7")
                                          (eq_attr "memory" "load"))))))
                         "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-fpu,c86-4g-m7-fpu_1_3")
 
+(define_insn_reservation "c86_4g_m7_avx512_ssecmp_vp_z" 5
+                        (and (eq_attr "cpu" "c86_4g_m7")
+                             (and (eq_attr "type" "ssecmp")
+                                  (and (eq_attr "prefix" "evex")
+                                   (and (eq_attr "mode" "XI")
+                                    (and (eq_attr "c86_attr" "other,ptest")
+                                         (eq_attr "memory" "none"))))))
+                        "c86-4g-m7-double,c86-4g-m7-fpu,c86-4g-m7-fpu_1_3")
+
+(define_insn_reservation "c86_4g_m7_avx512_ssecmp_vp_z_load" 12
+                        (and (eq_attr "cpu" "c86_4g_m7")
+                             (and (eq_attr "type" "ssecmp")
+                                  (and (eq_attr "prefix" "evex")
+                                   (and (eq_attr "mode" "XI")
+                                    (and (eq_attr "c86_attr" "other,ptest")
+                                         (eq_attr "memory" "load"))))))
+                        "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-fpu,c86-4g-m7-fpu_1_3x2")
+
 (define_insn_reservation "c86_4g_m7_avx_ssecmp_vp" 1
                         (and (eq_attr "cpu" "c86_4g_m7")
                              (and (eq_attr "type" "ssecmp")
                                          (eq_attr "memory" "load")))))
                         "c86-4g-m7-double,c86-4g-m7-load,c86-4g-m7-fpu1,c86-4g-m7-fpu_1_3")
 
-(define_insn_reservation "c86_4g_m7_avx512_ssecmp_test_z" 4
-                        (and (eq_attr "cpu" "c86_4g_m7")
-                             (and (eq_attr "type" "ssecmp")
-                                   (and (eq_attr "mode" "XI")
-                                    (and (eq_attr "c86_attr" "ptest")
-                                         (eq_attr "memory" "none")))))
-                        "c86-4g-m7-vector")
-
-(define_insn_reservation "c86_4g_m7_avx512_ssecmp_test_z_load" 11
-                        (and (eq_attr "cpu" "c86_4g_m7")
-                             (and (eq_attr "type" "ssecmp")
-                                   (and (eq_attr "mode" "XI")
-                                    (and (eq_attr "c86_attr" "ptest")
-                                         (eq_attr "memory" "load")))))
-                        "c86-4g-m7-vector,c86-4g-m7-load")
-
 ;; SSECVT
 (define_insn_reservation "c86_4g_m7_avx512_ssecvt_xy" 4
                         (and (eq_attr "cpu" "c86_4g_m7")
                         (and (eq_attr "cpu" "c86_4g_m7")
                              (and (eq_attr "type" "ssemuladd")
                                   (and (eq_attr "c86_attr" "other")
-                                   (and (not (eq_attr "isa" "fma,fma4"))
-                                        (eq_attr "mode" "V32HF,V16SF,V8DF,XI")
-                                         (eq_attr "memory" "none")))))
+                                       (eq_attr "memory" "none"))))
                         "c86-4g-m7-direct,c86-4g-m7-fpu_0_2")
 
 (define_insn_reservation "c86_4g_m7_avx512_muladd_load" 11
                         (and (eq_attr "cpu" "c86_4g_m7")
                              (and (eq_attr "type" "ssemuladd")
                                   (and (eq_attr "c86_attr" "other")
-                                   (and (not (eq_attr "isa" "fma,fma4"))
-                                        (eq_attr "memory" "load")))))
+                                       (eq_attr "memory" "load"))))
                         "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu_0_2")
 
 (define_insn_reservation "c86_4g_m7_avx512_muladd_madd" 4
                                         (eq_attr "memory" "load")))))
                         "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu_0_2")
 
-(define_insn_reservation "c86_4g_m7_fma_muladd" 4
-                        (and (eq_attr "cpu" "c86_4g_m7")
-                             (and (eq_attr "type" "ssemuladd")
-                                  (and (eq_attr "isa" "fma,fma4")
-                                       (eq_attr "memory" "none"))))
-                        "c86-4g-m7-direct,c86-4g-m7-fpu_0_1")
-
-(define_insn_reservation "c86_4g_m7_fma_muladd_load" 11
-                        (and (eq_attr "cpu" "c86_4g_m7")
-                             (and (eq_attr "type" "ssemuladd")
-                                  (and (eq_attr "isa" "fma,fma4")
-                                       (eq_attr "memory" "load"))))
-                        "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu_0_1")
-
 ;; SSE
 (define_insn_reservation "c86_4g_m7_avx512_sse_range" 1
                         (and (eq_attr "cpu" "c86_4g_m7")
                                   (and (eq_attr "c86_decode" "vector")
                                    (and (eq_attr "mode" "TI")
                                         (eq_attr "memory" "none")))))
-                        "c86-4g-m7-vector")
+                        "c86-4g-m7-vector,c86-4g-m7-fpu_1_3x2")
 
 (define_insn_reservation "c86_4g_m7_avx512_sse_conflict_x_load" 9
                         (and (eq_attr "cpu" "c86_4g_m7")
                                   (and (eq_attr "c86_decode" "vector")
                                    (and (eq_attr "mode" "TI")
                                         (eq_attr "memory" "load")))))
-                        "c86-4g-m7-vector,c86-4g-m7-load")
+                        "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpu_1_3x2")
 
 (define_insn_reservation "c86_4g_m7_avx512_sse_conflict_y" 5
                         (and (eq_attr "cpu" "c86_4g_m7")
                                   (and (eq_attr "c86_decode" "vector")
                                    (and (eq_attr "mode" "OI")
                                         (eq_attr "memory" "none")))))
-                        "c86-4g-m7-vector")
+                        "c86-4g-m7-vector,c86-4g-m7-fpu_1_3x3")
 
 (define_insn_reservation "c86_4g_m7_avx512_sse_conflict_y_load" 12
                         (and (eq_attr "cpu" "c86_4g_m7")
                                   (and (eq_attr "c86_decode" "vector")
                                    (and (eq_attr "mode" "OI")
                                         (eq_attr "memory" "load")))))
-                        "c86-4g-m7-vector,c86-4g-m7-load")
+                        "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpu_1_3x3")
 
 (define_insn_reservation "c86_4g_m7_avx512_sse_conflict_z" 8
                         (and (eq_attr "cpu" "c86_4g_m7")
                                   (and (eq_attr "c86_decode" "vector")
                                    (and (eq_attr "mode" "XI")
                                         (eq_attr "memory" "none")))))
-                        "c86-4g-m7-vector")
+                        "c86-4g-m7-vector,c86-4g-m7-fpu_1_3x6")
 
 (define_insn_reservation "c86_4g_m7_avx512_sse_conflict_z_load" 15
                         (and (eq_attr "cpu" "c86_4g_m7")
                                   (and (eq_attr "c86_decode" "vector")
                                    (and (eq_attr "mode" "XI")
                                         (eq_attr "memory" "load")))))
-                        "c86-4g-m7-vector,c86-4g-m7-load")
+                        "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpu_1_3x6")
 
 (define_insn_reservation "c86_4g_m7_avx512_sse_class" 4
                         (and (eq_attr "cpu" "c86_4g_m7")
                                     (and (eq_attr "length_immediate" "1")
                                      (and (eq_attr "mode" "V32HF,V16SF,V8DF")
                                           (eq_attr "memory" "none"))))))
-                        "c86-4g-m7-vector")
+                        "c86-4g-m7-vector,c86-4g-m7-fpu_1_3,c86-4g-m7-fpu_1_3")
 
 (define_insn_reservation "c86_4g_m7_avx512_sse_class_z_load" 11
                         (and (eq_attr "cpu" "c86_4g_m7")
                                     (and (eq_attr "length_immediate" "1")
                                      (and (eq_attr "mode" "V32HF,V16SF,V8DF")
                                           (eq_attr "memory" "load"))))))
-                        "c86-4g-m7-vector,c86-4g-m7-load")
+                        "c86-4g-m7-vector,c86-4g-m7-load,c86-4g-m7-fpu_1_3,c86-4g-m7-fpu_1_3")
 
 (define_insn_reservation "c86_4g_m7_avx_sse" 5
                         (and (eq_attr "cpu" "c86_4g_m7")
                                         (eq_attr "memory" "load")))))
                         "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu_0_1")
 
-(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt" 16
+;; SSE SQRT
+(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_sf_x" 14
                         (and (eq_attr "cpu" "c86_4g_m7")
                              (and (eq_attr "type" "sse")
-                                  (and (eq_attr "c86_attr" "sqrt")
-                                       (eq_attr "memory" "none"))))
-                        "c86-4g-m7-direct,c86-4g-m7-fpu1|c86-4g-m7-fpu3,c86-4g-m7-fdiv*16")
+                                  (and (eq_attr "mode" "SF,V4SF")
+                                   (and (eq_attr "c86_attr" "sqrt")
+                                        (eq_attr "memory" "none")))))
+                        "c86-4g-m7-direct,c86-4g-m7-fp1div1_fp3div3_x4x9")
 
-(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_load" 23
+(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_sf_xload" 21
                         (and (eq_attr "cpu" "c86_4g_m7")
                              (and (eq_attr "type" "sse")
-                                  (and (eq_attr "c86_attr" "sqrt")
-                                       (eq_attr "memory" "load"))))
-                        "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fpu1|c86-4g-m7-fpu3,c86-4g-m7-fdiv*16")
+                                  (and (eq_attr "mode" "SF,V4SF")
+                                   (and (eq_attr "c86_attr" "sqrt")
+                                        (eq_attr "memory" "load")))))
+                        "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fp1div1_fp3div3_x4x9")
+
+(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_sf_y" 14
+                        (and (eq_attr "cpu" "c86_4g_m7")
+                             (and (eq_attr "type" "sse")
+                                  (and (eq_attr "mode" "V8SF")
+                                   (and (eq_attr "c86_attr" "sqrt")
+                                        (eq_attr "memory" "none")))))
+                        "c86-4g-m7-direct,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*9")
+
+(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_sf_yload" 21
+                        (and (eq_attr "cpu" "c86_4g_m7")
+                             (and (eq_attr "type" "sse")
+                                  (and (eq_attr "mode" "V8SF")
+                                   (and (eq_attr "c86_attr" "sqrt")
+                                        (eq_attr "memory" "load")))))
+                        "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*9")
+
+(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_sf_z" 26
+                        (and (eq_attr "cpu" "c86_4g_m7")
+                             (and (eq_attr "type" "sse")
+                                  (and (eq_attr "mode" "V16SF")
+                                   (and (eq_attr "c86_attr" "sqrt")
+                                        (eq_attr "memory" "none")))))
+                        "c86-4g-m7-direct,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*22")
+
+(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_sf_zload" 33
+                        (and (eq_attr "cpu" "c86_4g_m7")
+                             (and (eq_attr "type" "sse")
+                                  (and (eq_attr "mode" "V16SF")
+                                   (and (eq_attr "c86_attr" "sqrt")
+                                        (eq_attr "memory" "load")))))
+                        "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*22")
+
+(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_df_x" 20
+                        (and (eq_attr "cpu" "c86_4g_m7")
+                             (and (eq_attr "type" "sse")
+                                  (and (eq_attr "mode" "DF,V2DF")
+                                   (and (eq_attr "c86_attr" "sqrt")
+                                        (eq_attr "memory" "none")))))
+                        "c86-4g-m7-direct,c86-4g-m7-fp1div1_fp3div3_x4x15")
+
+(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_df_xload" 27
+                        (and (eq_attr "cpu" "c86_4g_m7")
+                             (and (eq_attr "type" "sse")
+                                  (and (eq_attr "mode" "DF,V2DF")
+                                   (and (eq_attr "c86_attr" "sqrt")
+                                        (eq_attr "memory" "load")))))
+                        "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fp1div1_fp3div3_x4x15")
+
+(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_df_y" 20
+                        (and (eq_attr "cpu" "c86_4g_m7")
+                             (and (eq_attr "type" "sse")
+                                  (and (eq_attr "mode" "V4DF")
+                                   (and (eq_attr "c86_attr" "sqrt")
+                                        (eq_attr "memory" "none")))))
+                        "c86-4g-m7-direct,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*15")
+
+(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_df_yload" 27
+                        (and (eq_attr "cpu" "c86_4g_m7")
+                             (and (eq_attr "type" "sse")
+                                  (and (eq_attr "mode" "V4DF")
+                                   (and (eq_attr "c86_attr" "sqrt")
+                                        (eq_attr "memory" "load")))))
+                        "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*15")
+
+(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_df_z" 38
+                        (and (eq_attr "cpu" "c86_4g_m7")
+                             (and (eq_attr "type" "sse")
+                                  (and (eq_attr "mode" "V8DF")
+                                   (and (eq_attr "c86_attr" "sqrt")
+                                        (eq_attr "memory" "none")))))
+                        "c86-4g-m7-direct,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*34")
+
+(define_insn_reservation "c86_4g_m7_avx512_sse_sqrt_df_zload" 45
+                        (and (eq_attr "cpu" "c86_4g_m7")
+                             (and (eq_attr "type" "sse")
+                                  (and (eq_attr "mode" "V8DF")
+                                   (and (eq_attr "c86_attr" "sqrt")
+                                        (eq_attr "memory" "load")))))
+                        "c86-4g-m7-direct,c86-4g-m7-load,c86-4g-m7-fp13div13x4,c86-4g-m7-fdiv13*34")
 
 ;; MSKLOG/MSKMOV
 (define_insn_reservation "c86_4g_m7_avx512_msklog" 1
                         (and (eq_attr "cpu" "c86_4g_m7")
                              (and (eq_attr "type" "msklog")
                                   (eq_attr "c86_decode" "vector")))
-                        "c86-4g-m7-vector")
+                        "c86-4g-m7-vector,c86-4g-m7-fpu_1_3")
 
 (define_insn_reservation "c86_4g_m7_avx512_mskmov_reg_k" 1
                         (and (eq_attr "cpu" "c86_4g_m7")
                         (and (eq_attr "cpu" "c86_4g_m7")
                              (and (eq_attr "type" "mskmov")
                                   (match_operand:V8DI 0 "register_operand" "v")))
-                        "c86-4g-m7-vector,c86-4g-m7-fpu3*2,c86-4g-m7-fpu1*2|c86-4g-m7-fpu3*2")
+                        "c86-4g-m7-vector,c86-4g-m7-fpu3,c86-4g-m7-fpu_1_3")
 
 (define_insn_reservation "c86_4g_m7_avx512_mskmov_k_k" 1
                         (and (eq_attr "cpu" "c86_4g_m7")
                              (and (eq_attr "type" "mskmov")
                                  (and (match_operand 0 "register_operand" "k")
                                       (match_operand 1 "register_operand" "r"))))
-                        "c86-4g-m7-double,c86-4g-m7-fpu1*2,c86-4g-m7-fpu1*2|c86-4g-m7-fpu3*2")
+                        "c86-4g-m7-double,c86-4g-m7-fpu1,c86-4g-m7-fpu_1_3")
 
 (define_insn_reservation "c86_4g_m7_avx512_mskmov_k_m" 8
                         (and (eq_attr "cpu" "c86_4g_m7")
index 49a46a8aa19ed8426a3a174b77023709d0985420..8b81fcaabb28571106ab6972f7d3b1a0e7675d6c 100644
 ;; HYGON Scheduling
 ;; Modeling automatons for decoders, integer execution pipes,
 ;; AGU pipes, floating point execution units, integer and
-;; floating point dividers.
-(define_automaton "c86_4g, c86_4g_ieu, c86_4g_fp, c86_4g_agu, c86_4g_idiv, c86_4g_fdiv")
+;; floating point dividers.  Split fp1 into its own automaton
+;; to keep this unit independent without increasing the state
+;; space of the main FP automaton (c86_4g_fp024).
+(define_automaton "c86_4g, c86_4g_ieu, c86_4g_fp024, c86_4g_fp1, c86_4g_agu, c86_4g_idiv, c86_4g_fdiv")
 
 ;; Decoders unit has 4 decoders and all of them can decode fast path
 ;; and vector type instructions.
 (define_cpu_unit "c86-4g-decode2" "c86_4g")
 (define_cpu_unit "c86-4g-decode3" "c86_4g")
 
-;; Two separated dividers for int and fp.
-(define_cpu_unit "c86-4g-idiv" "c86_4g_idiv")
-(define_cpu_unit "c86-4g-fdiv" "c86_4g_fdiv")
-
 ;; Currently blocking all decoders for vector path instructions as
 ;; they are dispatched separately as a microcode sequence.
 ;; Fix me: Need to revisit this.
@@ -55,7 +53,6 @@
 ;; Fix me: Need to revisit this later to simulate fast path double behavior.
 (define_reservation "c86-4g-double" "c86-4g-direct")
 
-
 ;; Integer unit 4 ALU pipes.
 (define_cpu_unit "c86-4g-ieu0" "c86_4g_ieu")
 (define_cpu_unit "c86-4g-ieu1" "c86_4g_ieu")
@@ -63,6 +60,9 @@
 (define_cpu_unit "c86-4g-ieu3" "c86_4g_ieu")
 (define_reservation "c86-4g-ieu" "c86-4g-ieu0|c86-4g-ieu1|c86-4g-ieu2|c86-4g-ieu3")
 
+;; One separate integer divider.
+(define_cpu_unit "c86-4g-idiv" "c86_4g_idiv")
+
 ;; 2 AGU pipes in c86_4g
 ;; According to CPU diagram last AGU unit is used only for stores.
 (define_cpu_unit "c86-4g-agu0" "c86_4g_agu")
                                      +c86-4g-agu0+c86-4g-agu1")
 
 ;; Floating point unit 4 FP pipes.
-(define_cpu_unit "c86-4g-fp0" "c86_4g_fp")
-(define_cpu_unit "c86-4g-fp1" "c86_4g_fp")
-(define_cpu_unit "c86-4g-fp2" "c86_4g_fp")
-(define_cpu_unit "c86-4g-fp3" "c86_4g_fp")
+(define_cpu_unit "c86-4g-fp0" "c86_4g_fp024")
+(define_cpu_unit "c86-4g-fp1" "c86_4g_fp1")
+(define_cpu_unit "c86-4g-fp2" "c86_4g_fp024")
+(define_cpu_unit "c86-4g-fp3" "c86_4g_fp024")
 
 (define_reservation "c86-4g-fpu" "c86-4g-fp0|c86-4g-fp1|c86-4g-fp2|c86-4g-fp3")
 
                                      +c86-4g-fp2+c86-4g-fp3
                                      +c86-4g-agu0+c86-4g-agu1")
 
+;; One separate FP divider.
+(define_cpu_unit "c86-4g-fdiv" "c86_4g_fdiv")
+
+(define_reservation "c86-4g-fp1fdivx4" "(c86-4g-fp1+c86-4g-fdiv)*4")
+
 ;; Call instruction
 (define_insn_reservation "c86_4g_call" 1
                         (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6")
                         (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6")
                              (and (eq_attr "type" "fpspc")
                                   (eq_attr "c86_attr" "sqrt")))
-                        "c86-4g-direct,c86-4g-fp1,c86-4g-fdiv*22")
+                        "c86-4g-direct,c86-4g-fp1fdivx4,c86-4g-fdiv*18")
 
 (define_insn_reservation "c86_4g_sse_sqrt_sf" 14
                         (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6")
                                   (and (eq_attr "memory" "none,unknown")
                                        (and (eq_attr "c86_attr" "sqrt")
                                             (eq_attr "type" "sse")))))
-                        "c86-4g-direct,c86-4g-fp1,c86-4g-fdiv*14")
+                        "c86-4g-direct,c86-4g-fp1fdivx4,c86-4g-fdiv*10")
 
 (define_insn_reservation "c86_4g_sse_sqrt_sf_mem" 21
                         (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6")
                                   (and (eq_attr "memory" "load")
                                        (and (eq_attr "c86_attr" "sqrt")
                                             (eq_attr "type" "sse")))))
-                        "c86-4g-direct,c86-4g-load,c86-4g-fp1,c86-4g-fdiv*14")
+                        "c86-4g-direct,c86-4g-load,c86-4g-fp1fdivx4,c86-4g-fdiv*10")
 
 (define_insn_reservation "c86_4g_sse_sqrt_df" 20
                         (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6")
                                   (and (eq_attr "memory" "none,unknown")
                                        (and (eq_attr "c86_attr" "sqrt")
                                             (eq_attr "type" "sse")))))
-                        "c86-4g-direct,c86-4g-fp1,c86-4g-fdiv*20")
+                        "c86-4g-direct,c86-4g-fp1fdivx4,c86-4g-fdiv*16")
 
 (define_insn_reservation "c86_4g_sse_sqrt_df_mem" 27
                         (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6")
                                   (and (eq_attr "memory" "load")
                                        (and (eq_attr "c86_attr" "sqrt")
                                             (eq_attr "type" "sse")))))
-                        "c86-4g-direct,c86-4g-load,c86-4g-fp1,c86-4g-fdiv*20")
+                        "c86-4g-direct,c86-4g-load,c86-4g-fp1fdivx4,c86-4g-fdiv*16")
 
 ;; RCP
 (define_insn_reservation "c86_4g_sse_rcp" 5
                         (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6")
                              (and (eq_attr "type" "fdiv")
                                   (eq_attr "memory" "none")))
-                        "c86-4g-direct,c86-4g-fp1,c86-4g-fdiv*15")
+                        "c86-4g-direct,c86-4g-fp1fdivx4,c86-4g-fdiv*11")
 
 (define_insn_reservation "c86_4g_fp_op_div_load" 22
                         (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6")
                              (and (eq_attr "type" "fdiv")
                                   (eq_attr "memory" "load")))
-                        "c86-4g-direct,c86-4g-load,c86-4g-fp1,c86-4g-fdiv*15")
+                        "c86-4g-direct,c86-4g-load,c86-4g-fp1fdivx4,c86-4g-fdiv*11")
 
-(define_insn_reservation "c86_4g_fp_op_idiv_load" 27
+(define_insn_reservation "c86_4g_fp_op_idiv_load" 26
                         (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6")
                              (and (eq_attr "type" "fdiv")
                                   (and (eq_attr "fp_int_src" "true")
                                        (eq_attr "memory" "load"))))
-                        "c86-4g-double,c86-4g-load,c86-4g-fp1,c86-4g-fdiv*19")
+                        "c86-4g-double,c86-4g-load,c86-4g-fp1*4,c86-4g-fp1fdivx4,c86-4g-fdiv*11")
 
 ;; MMX, SSE, SSEn.n, AVX, AVX2 instructions
 (define_insn_reservation "c86_4g_fp_insn" 1
                                        (eq_attr "mode" "V4SF,SF"))
                              (and (eq_attr "type" "ssediv")
                                   (eq_attr "memory" "none")))
-                        "c86-4g-direct,c86-4g-fp1,c86-4g-fdiv*10")
+                        "c86-4g-direct,c86-4g-fp1fdivx4,c86-4g-fdiv*6")
 
 (define_insn_reservation "c86_4g_ssediv_ss_ps_load" 17
                         (and (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6")
                                        (eq_attr "mode" "V4SF,SF"))
                              (and (eq_attr "type" "ssediv")
                                   (eq_attr "memory" "load")))
-                        "c86-4g-direct,c86-4g-load,c86-4g-fp1,c86-4g-fdiv*10")
+                        "c86-4g-direct,c86-4g-load,c86-4g-fp1fdivx4,c86-4g-fdiv*6")
 
 (define_insn_reservation "c86_4g_ssediv_sd_pd" 13
                         (and (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6")
                                        (eq_attr "mode" "V2DF,DF"))
                              (and (eq_attr "type" "ssediv")
                                   (eq_attr "memory" "none")))
-                        "c86-4g-direct,c86-4g-fp1,c86-4g-fdiv*13")
+                        "c86-4g-direct,c86-4g-fp1fdivx4,c86-4g-fdiv*9")
 
 (define_insn_reservation "c86_4g_ssediv_sd_pd_load" 20
                         (and (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6")
                                               (eq_attr "mode" "V2DF,DF"))
                              (and (eq_attr "type" "ssediv")
                                   (eq_attr "memory" "load")))
-                        "c86-4g-direct,c86-4g-load,c86-4g-fp1,c86-4g-fdiv*13")
+                        "c86-4g-direct,c86-4g-load,c86-4g-fp1fdivx4,c86-4g-fdiv*9")
 
 
 (define_insn_reservation "c86_4g_ssediv_avx256_ps" 10
                              (and (eq_attr "mode" "V8SF")
                                   (and (eq_attr "memory" "none")
                                        (eq_attr "type" "ssediv"))))
-                        "c86-4g-double,c86-4g-fp1,c86-4g-fdiv*10")
+                        "c86-4g-double,c86-4g-fp1fdivx4,c86-4g-fdiv*6")
 
 (define_insn_reservation "c86_4g_ssediv_avx256_ps_load" 17
                         (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6")
                              (and (eq_attr "mode" "V8SF")
                                   (and (eq_attr "type" "ssediv")
                                        (eq_attr "memory" "load"))))
-                        "c86-4g-double,c86-4g-load,c86-4g-fp1,c86-4g-fdiv*10")
+                        "c86-4g-double,c86-4g-load,c86-4g-fp1fdivx4,c86-4g-fdiv*6")
 
 (define_insn_reservation "c86_4g_ssediv_avx256_pd" 13
                         (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6")
                              (and (eq_attr "mode" "V4DF")
                                   (and (eq_attr "type" "ssediv")
                                        (eq_attr "memory" "none"))))
-                        "c86-4g-double,c86-4g-fp1,c86-4g-fdiv*13")
+                        "c86-4g-double,c86-4g-fp1fdivx4,c86-4g-fdiv*9")
 
 (define_insn_reservation "c86_4g_ssediv_avx256_pd_load" 20
                         (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6")
                              (and (eq_attr "mode" "V4DF")
                                   (and (eq_attr "type" "ssediv")
                                        (eq_attr "memory" "load"))))
-                        "c86-4g-double,c86-4g-load,c86-4g-fp1,c86-4g-fdiv*13")
+                        "c86-4g-double,c86-4g-load,c86-4g-fp1fdivx4,c86-4g-fdiv*9")
 ;; SSE MUL
 (define_insn_reservation "c86_4g_ssemul_ss_ps" 3
                         (and (and (eq_attr "cpu" "c86_4g_m4,c86_4g_m6")