Re: [PATCH] Redesign pthread in LIB_SPEC for systems without libpthread
Hello, > All good, thanks! Checked into MT: http://gcc.gnu.org/viewcvs?rev=201871&root=gcc&view=rev -- Thanks, K
Re: [PATCH i386 3/8] [AVX512] Add AVX-512 patterns.
On 19 Aug 15:01, Richard Henderson wrote: > > ;; All vector modes including V?TImode, used in move patterns. > > (define_mode_iterator V16 > > - [(V32QI "TARGET_AVX") V16QI > > - (V16HI "TARGET_AVX") V8HI > > - (V8SI "TARGET_AVX") V4SI > > - (V4DI "TARGET_AVX") V2DI > > + [(V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI > > + (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI > > + (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI > > + (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI > > (V2TI "TARGET_AVX") V1TI > > - (V8SF "TARGET_AVX") V4SF > > - (V4DF "TARGET_AVX") V2DF]) > > + (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF > > + (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF]) > > Let's rename this VMOVE, and apply only that change as a separate patch. Hello, I think this is kinda obvious. I've renamed V16 iterator into VMOVE. ChangeLog entry: 2013-08-20 Kirill Yukhin * config/i386/sse.md (V16): Rename to... (VMOVE): this. (mov): Update iterator name. (*mov_internal): Ditto. (push1): Ditto. (movmisalign): Ditto. Bootstrap passing. I'll check it in into MT as obvious if nobody objects in 20 hrs. Thanks, K --- gcc/config/i386/sse.md | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 4397498..5c07dd7 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -96,7 +96,7 @@ ]) ;; All vector modes including V?TImode, used in move patterns. -(define_mode_iterator V16 +(define_mode_iterator VMOVE [(V32QI "TARGET_AVX") V16QI (V16HI "TARGET_AVX") V8HI (V8SI "TARGET_AVX") V4SI @@ -435,8 +435,8 @@ ;; This is essential for maintaining stable calling conventions. (define_expand "mov" - [(set (match_operand:V16 0 "nonimmediate_operand") - (match_operand:V16 1 "nonimmediate_operand"))] + [(set (match_operand:VMOVE 0 "nonimmediate_operand") + (match_operand:VMOVE 1 "nonimmediate_operand"))] "TARGET_SSE" { ix86_expand_vector_move (mode, operands); @@ -444,8 +444,8 @@ }) (define_insn "*mov_internal" - [(set (match_operand:V16 0 "nonimmediate_operand" "=x,x ,m") - (match_operand:V16 1 "nonimmediate_or_sse_const_operand" "C ,xm,x"))] + [(set (match_operand:VMOVE 0 "nonimmediate_operand" "=x,x ,m") + (match_operand:VMOVE 1 "nonimmediate_or_sse_const_operand" "C ,xm,x"))] "TARGET_SSE && (register_operand (operands[0], mode) || register_operand (operands[1], mode))" @@ -586,7 +586,7 @@ }) (define_expand "push1" - [(match_operand:V16 0 "register_operand")] + [(match_operand:VMOVE 0 "register_operand")] "TARGET_SSE" { ix86_expand_push (mode, operands[0]); @@ -594,8 +594,8 @@ }) (define_expand "movmisalign" - [(set (match_operand:V16 0 "nonimmediate_operand") - (match_operand:V16 1 "nonimmediate_operand"))] + [(set (match_operand:VMOVE 0 "nonimmediate_operand") + (match_operand:VMOVE 1 "nonimmediate_operand"))] "TARGET_SSE" { ix86_expand_vector_move_misalign (mode, operands); -- 1.7.11.7
Re: [PATCH i386 3/8] [AVX512] Add AVX-512 patterns.
On 20 Aug 08:30, Richard Henderson wrote: Hello, > This is ok. Checked into main trunk: http://gcc.gnu.org/ml/gcc-cvs/2013-08/msg00504.html -- Thanks, K
Re: [PATCH i386 1/8] [AVX512] Adjust register classes.
Hello, > The patch is ok to commit. Thanks a lot! Checked in to main trunk: http://gcc.gnu.org/ml/gcc-cvs/2013-08/msg00524.html -- K
Re: [PATCH i386 2/8] [AVX512] Add mask registers.
Hello Richard, On 19 Aug 14:17, Richard Henderson wrote: > On 08/14/2013 12:23 AM, Kirill Yukhin wrote: > > + ;; For AVX512F mask support > > + UNSPEC_KIOR > > + UNSPEC_KXOR > > + UNSPEC_KAND > > + UNSPEC_KANDN > > I thought we determined that you didn't need these, > that "*Yk" as a constraint was sufficient. As far as I understood, we're talking about incorporating of mask logic instructions into existing patterns + making mask constraints disparage. E.g. for OR we have: (define_insn "*_1" [(set (match_operand:SWI248 0 "nonimmediate_operand" "=r,rm") (any_or:SWI248 (match_operand:SWI248 1 "nonimmediate_operand" "%0,0") (match_operand:SWI248 2 "" ",r"))) (clobber (reg:CC FLAGS_REG))] "ix86_binary_operator_ok (, mode, operands)" "{}\t{%2, %0|%0, %2}" Despite of generic OR, mask version of OR do not clobber FLAGS_REG. Of course, we may conservatively think that it is, but I believe this is not good idea. Making single constraint in new pattern disparage have no sense as far as I understad, since this is relative notion. So, what should I do? -- Thanks, K
Re: [PATCH i386 4/8] [AVX512] Add substed patterns.
On 14 Aug 11:44, Kirill Yukhin wrote: PING? Thanks, K
Re: [PATCH i386 1/8] [AVX512] Adjust register classes.
Hello, On 21 Aug 13:02, Richard Henderson wrote: > On 08/21/2013 11:28 AM, Kirill Yukhin wrote: > > (eq_attr "alternative" "12,13") > > - (cond [(ior (not (match_test "TARGET_SSE2")) > > + (cond [(ior (match_test "EXT_REX_SSE_REGNO_P (REGNO > > (operands[0]))") > > + (and (match_test "REG_P (operands[1])") > > + (match_test "EXT_REX_SSE_REGNO_P (REGNO > > (operands[1]))"))) > > + (const_string "XI") > > +(ior (not (match_test "TARGET_SSE2")) > > (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")) > >(const_string "V4SF") > > (match_test "TARGET_AVX") > > Better. And while it produces the correct results, using match_operand would > be better than embedding a reference to operands within a match_test. In order to get rid of direct references to operands in attrs of scalar mov*_internal I've introduced new predicate and use it with match_operand instead. ChangeLog: 2013-08-22 Kirill Yukhin * gcc/config/i386/i386.md (*movti_internal): Use predicate to determine if EVEX is needed. (*movsi_internal): Ditto. (*movdf_internal): Ditto. (*movsf_internal): Ditto. * gcc/config/i386/mmx.md (*mov_internal): Ditto. Testing: 1. Bootstrap pass. 2. make check shows no regressions. 3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f option. 4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f option. Is it ok to install to trunk? -- Thanks, K --- gcc/config/i386/i386.md | 20 gcc/config/i386/mmx.md| 5 ++--- gcc/config/i386/predicates.md | 6 ++ 3 files changed, 16 insertions(+), 15 deletions(-) diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index b55fd6f..3d7533a 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -2059,9 +2059,8 @@ (cond [(eq_attr "alternative" "2") (const_string "SI") (eq_attr "alternative" "12,13") - (cond [(ior (match_test "EXT_REX_SSE_REGNO_P (REGNO (operands[0]))") - (and (match_test "REG_P (operands[1])") - (match_test "EXT_REX_SSE_REGNO_P (REGNO (operands[1]))"))) + (cond [(ior (match_operand 0 "ext_sse_reg_operand") + (match_operand 1 "ext_sse_reg_operand")) (const_string "XI") (ior (not (match_test "TARGET_SSE2")) (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")) @@ -2192,9 +2191,8 @@ (cond [(eq_attr "alternative" "2,3") (const_string "DI") (eq_attr "alternative" "6,7") - (cond [(ior (match_test "EXT_REX_SSE_REGNO_P (REGNO (operands[0]))") - (and (match_test "REG_P (operands[1])") - (match_test "EXT_REX_SSE_REGNO_P (REGNO (operands[1]))"))) + (cond [(ior (match_operand 0 "ext_sse_reg_operand") + (match_operand 1 "ext_sse_reg_operand")) (const_string "XI") (ior (not (match_test "TARGET_SSE2")) (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")) @@ -2923,9 +2921,8 @@ /* movaps is one byte shorter for non-AVX targets. */ (eq_attr "alternative" "10,14") -(cond [(ior (match_test "EXT_REX_SSE_REGNO_P (REGNO (operands[0]))") -(and (match_test "REG_P (operands[1])") - (match_test "EXT_REX_SSE_REGNO_P (REGNO (operands[1]))"))) +(cond [(ior (match_operand 0 "ext_sse_reg_operand") +(match_operand 1 "ext_sse_reg_operand")) (const_string "V8DF") (ior (not (match_test "TARGET_SSE2")) (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL")) @@ -3072,9 +3069,8 @@ better to maintain the whole registers in single format to avoid problems on using packed logical operations. */ (eq_attr "alternative" "6") -(cond [(ior (match_test "EXT_REX_SSE_REGNO_P (REGNO (operands[0]))"
Re: [PATCH i386 1/8] [AVX512] Adjust register classes.
Hello, On 22 Aug 12:06, Richard Henderson wrote: > Ok. I've updated ChangeLog (thanks, HJ!) and checked in to main trunk: http://gcc.gnu.org/ml/gcc-cvs/2013-08/msg00545.html -- Thanks, K
Re: [PATCH i386 2/8] [AVX512] Add mask registers.
On 22 Aug 08:49, Richard Henderson wrote: Hello, > You can always split away the clobber after reload, as we do for > when add gets implemented with lea. I've refactored the patch, making mask logic insn patterns non-unspec. Unfortunately I was unable to use '*' in mask alternative, since it is not working. So, I've put '!' and all gooes fine. '?' is not working as well. Vlad, could you pls clarify, if it is ok to use '!'? ChangeLog: 2013-07-24 Alexander Ivchenko Maxim Kuznetsov Sergey Lega Anna Tikhonova Ilya Tocar Andrey Turetskiy Ilya Verbin Kirill Yukhin Michael Zolotukhin * config/i386/constraints.md (k): New. (Yk): Ditto. * config/i386/i386.c (const regclass_map): Add new mask registers. (dbx_register_map): Ditto. (dbx64_register_map): Ditto. (svr4_dbx_register_map): Ditto. (ix86_conditional_register_usage): Squash mask registers if AVX512F is disabled. (ix86_preferred_reload_class): Disable constants for mask registers. (ix86_secondary_reload): Do spill of mask register using 32-bit insn. (ix86_hard_regno_mode_ok): Support new mask registers. (x86_order_regs_for_local_alloc): Ditto. * config/i386/i386.h (FIRST_PSEUDO_REGISTER): Update. (FIXED_REGISTERS): Add new mask registers. (CALL_USED_REGISTERS): Ditto. (REG_ALLOC_ORDER): Ditto. (VALID_MASK_REG_MODE): New. (FIRST_MASK_REG): Ditto. (LAST_MASK_REG): Ditto. (reg_class): Add MASK_EVEX_REGS, MASK_REGS. (MAYBE_MASK_CLASS_P): New. (REG_CLASS_NAMES): Add MASK_EVEX_REGS, MASK_REGS. (REG_CLASS_CONTENTS): Ditto. (MASK_REGNO_P): New. (ANY_MASK_REG_P): Ditto. (HI_REGISTER_NAMES): Add new mask registers. * config/i386/i386.md (MASK0_REG, MASK1_REG, MASK2_REG, MASK3_REG, MASK4_REG, MASK5_REG, MASK6_REG, MASK7_REG): Constants for new mask registers. (attribute "type"): Add mskmov, msklog. (attribute "length_immediate"): Support them. (attribute "memory"): Ditto. (attribute "prefix_0f"): Ditto. (*movhi_internal): Support new mask registers. (*movqi_internal): Ditto. (define_split): Split out clobber pattern is a logic insn on mask registers. (*k): New. (andhi_1): Make code visible, extend to support mask regs. (*andqi_1): Extend to support mask regs. (kandn): New. (define_split): Split and-not to and and not if operands are not mask regs. (*_1): Separate HI mode to new pattern... (hi_1): This. (*qi_1): Extend to support mask regs. (kxnor): New. (define_split): New split for not-xor. (kortestzhi): new. (kortestchi): Ditto. (kunpckhi): Ditto. (*one_cmpl2_1): Remove HImode and handle it... (*one_cmplhi2_1): ...Here, now with mask registers support. (*one_cmplqi2_1): Support new mask registers. (HI/QImode arithmetics splitter): Don't split if mask registers are used. (HI/QImode not splitter): Ditto. Testing: 1. Bootstrap pass. 2. make check shows no regressions. 3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f option. 4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f option. Is it ok? Thanks, K --- gcc/config/i386/constraints.md | 8 +- gcc/config/i386/i386.c | 34 +++-- gcc/config/i386/i386.h | 40 -- gcc/config/i386/i386.md| 279 ++--- gcc/config/i386/predicates.md | 4 + 5 files changed, 306 insertions(+), 59 deletions(-) diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md index 28e626f..92e0c05 100644 --- a/gcc/config/i386/constraints.md +++ b/gcc/config/i386/constraints.md @@ -19,7 +19,7 @@ ;;; Unused letters: ;;; B H T -;;; h jk +;;; h j ;; Integer register constraints. ;; It is not necessary to define 'r' here. @@ -78,6 +78,12 @@ "TARGET_80387 || TARGET_FLOAT_RETURNS_IN_80387 ? FP_SECOND_REG : NO_REGS" "Second from top of 80387 floating-point stack (@code{%st(1)}).") +(define_register_constraint "k" "TARGET_AVX512F ? MASK_EVEX_REGS : NO_REGS" +"@internal Any mask register that can be used as predicate, i.e. k1-k7.") + +(define_register_constraint "Yk" "TARGET_AVX512F ? MASK_REGS : NO_REGS" +"@internal Any mask register.") + ;; Vector registers (also used for plain floating point nowadays). (define_register_constraint "y" "TARGET_MMX ? MMX_REGS : NO_REGS" "
Re: [PATCH i386 2/8] [AVX512] Add mask registers.
Hello Reichard, On 26 Aug 09:37, Richard Henderson wrote: > On 08/26/2013 09:13 AM, Kirill Yukhin wrote: > > +(define_split > > + [(set (match_operand:SWI12 0 "mask_reg_operand") > > + (any_logic:SWI12 (match_operand:SWI12 1 "mask_reg_operand") > > +(match_operand:SWI12 2 "mask_reg_operand"))) > > + (clobber (reg:CC FLAGS_REG))] > > + "TARGET_AVX512F && reload_completed" > > + [(set (match_operand:SWI12 0 "mask_reg_operand") > > + (any_logic:SWI12 (match_operand:SWI12 1 "mask_reg_operand") > > +(match_operand:SWI12 2 "mask_reg_operand")))]) > > You must you match_dup on the new pattern half of define_split. > This pattern must never have triggered during your tests, since > it should have resulted in garbage rtl, and an ICE. Thanks, fixed. > > + (match_operand:SWI12 2 "register_operand" "r,Yk")))] > > + "TARGET_AVX512F" > > + "@ > > + # > > + kandnw\t{%2, %1, %0|%0, %1, %2}" > > + [(set_attr "type" "*,msklog") > > + (set_attr "prefix" "*,vex") > > + (set_attr "mode" "")]) > > What happened to the bmi andn alternative we discussed? BMI only supported for 4- and 8- byte integers, while kandw - for HI/QI > > + and{l}\t{%k2, %k0|%k0, %k2} > > + #" > > + [(set_attr "type" "alu,alu,alu,msklog") > > + (set_attr "mode" "QI,QI,SI,HI")]) > > Why force the split? You can write the kand here... Done. > > + {w}\t{%2, %0|%0, %2} > > + #" > > + [(set_attr "type" "alu,alu,msklog") > > + (set_attr "mode" "HI")]) > > Likewise. Done. > The point being that with optimization enabled, we will have run the split and > gotten all of the performance benefit of eliding the clobber. But with > optimization disabled, we don't need the split for correctness. > > > +(define_insn "kunpckhi" > > + [(set (match_operand:HI 0 "register_operand" "=Yk") > > + (ior:HI > > + (ashift:HI > > + (match_operand:HI 1 "register_operand" "Yk") > > + (const_int 8)) > > + (zero_extend:HI (subreg:QI (match_operand:HI 2 "register_operand" > > "Yk") 0] > > + "TARGET_AVX512F" > > + "kunpckbw\t{%2, %1, %0|%0, %1, %2}" > > + [(set_attr "mode" "HI") > > + (set_attr "type" "msklog") > > + (set_attr "prefix" "vex")]) > > Don't write the subreg explicitly. Instead, use a match_operand:QI, which > will > match the whole (subreg (reg)) expression, and also something that the > combiner > could simplify out of that. Thanks, fixed. > > +(define_insn "*one_cmplhi2_1" > > + [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,Yk") > > + (not:HI (match_operand:HI 1 "nonimmediate_operand" "0,Yk")))] > > + "ix86_unary_operator_ok (NOT, HImode, operands)" > ... > > (define_insn "*one_cmplqi2_1" > > - [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r") > > - (not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0")))] > > + [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,*Yk") > > + (not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,*Yk")))] > > Forgotten ! for Yk alternatives. Thanks. Fixed. > > + "TARGET_AVX512F && !ANY_MASK_REG_P (operands [0])" > ... > > +;; Do not split instructions with mask registers. > > (define_split > ... > > + && (! ANY_MASK_REG_P (operands[0]) > > + || ! ANY_MASK_REG_P (operands[1]) > > + || ! ANY_MASK_REG_P (operands[2]))" > > This ugliness is why I suggested adding a general_reg_operand in our last > conversation. If introduced general_reg_operand predicate. Testing: 1. Bootstrap pass. 2. make check shows no regressions. 3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f option. 4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f option. Is it ok? -- Thanks, K --- gcc/config/i386/constraints.md | 8 +- gcc/config/i386/i386.c | 34 +++-- gcc/config/i386/i386.h | 40 -- gcc/config/i386/i386.md| 280 ++--- gcc/config/i386/predicates.md | 9 ++
Re: [PATCH i386 3/8] [AVX512] [1/n] Add AVX-512 patterns: VF iterator extended.
Hello, > This patch is still far too large. > > I think you should split it up based on every single mode iterator that > you need to add or change. Problem is that some iterators are depend on each other, so patches are not going to be tiny. Here is 1st one. It extends VF iterator - biggest impact I believe Is it Ok? Testing: 1. Bootstrap pass. 2. make check shows no regressions. 3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f option. 4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f option. -- Thanks, K PS. If it is - I am going to strip out ChangeLog lines from big patch --- gcc/config/i386/i386.c | 62 +-- gcc/config/i386/i386.md | 1 + gcc/config/i386/sse.md | 283 +++- 3 files changed, 241 insertions(+), 105 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 8325919..5f50533 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -16538,8 +16538,8 @@ ix86_avx256_split_vector_move_misalign (rtx op0, rtx op1) gcc_unreachable (); case V32QImode: extract = gen_avx_vextractf128v32qi; - load_unaligned = gen_avx_loaddqu256; - store_unaligned = gen_avx_storedqu256; + load_unaligned = gen_avx_loaddquv32qi; + store_unaligned = gen_avx_storedquv32qi; mode = V16QImode; break; case V8SFmode: @@ -16642,10 +16642,56 @@ void ix86_expand_vector_move_misalign (enum machine_mode mode, rtx operands[]) { rtx op0, op1, m; + rtx (*load_unaligned) (rtx, rtx); + rtx (*store_unaligned) (rtx, rtx); op0 = operands[0]; op1 = operands[1]; + if (GET_MODE_SIZE (mode) == 64) +{ + switch (GET_MODE_CLASS (mode)) + { + case MODE_VECTOR_INT: + case MODE_INT: + op0 = gen_lowpart (V16SImode, op0); + op1 = gen_lowpart (V16SImode, op1); + /* FALLTHRU */ + + case MODE_VECTOR_FLOAT: + switch (GET_MODE (op0)) + { + default: + gcc_unreachable (); + case V16SImode: + load_unaligned = gen_avx512f_loaddquv16si; + store_unaligned = gen_avx512f_storedquv16si; + break; + case V16SFmode: + load_unaligned = gen_avx512f_loadups512; + store_unaligned = gen_avx512f_storeups512; + break; + case V8DFmode: + load_unaligned = gen_avx512f_loadupd512; + store_unaligned = gen_avx512f_storeupd512; + break; + } + + if (MEM_P (op1)) + emit_insn (load_unaligned (op0, op1)); + else if (MEM_P (op0)) + emit_insn (store_unaligned (op0, op1)); + else + gcc_unreachable (); + break; + + default: + gcc_unreachable (); + } + + return; +} + if (TARGET_AVX && GET_MODE_SIZE (mode) == 32) { @@ -16678,7 +16724,7 @@ ix86_expand_vector_move_misalign (enum machine_mode mode, rtx operands[]) op0 = gen_lowpart (V16QImode, op0); op1 = gen_lowpart (V16QImode, op1); /* We will eventually emit movups based on insn attributes. */ - emit_insn (gen_sse2_loaddqu (op0, op1)); + emit_insn (gen_sse2_loaddquv16qi (op0, op1)); } else if (TARGET_SSE2 && mode == V2DFmode) { @@ -16753,7 +16799,7 @@ ix86_expand_vector_move_misalign (enum machine_mode mode, rtx operands[]) op0 = gen_lowpart (V16QImode, op0); op1 = gen_lowpart (V16QImode, op1); /* We will eventually emit movups based on insn attributes. */ - emit_insn (gen_sse2_storedqu (op0, op1)); + emit_insn (gen_sse2_storedquv16qi (op0, op1)); } else if (TARGET_SSE2 && mode == V2DFmode) { @@ -27473,13 +27519,13 @@ static const struct builtin_description bdesc_special_args[] = { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_lfence, "__builtin_ia32_lfence", IX86_BUILTIN_LFENCE, UNKNOWN, (int) VOID_FTYPE_VOID }, { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_mfence, 0, IX86_BUILTIN_MFENCE, UNKNOWN, (int) VOID_FTYPE_VOID }, { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_storeupd, "__builtin_ia32_storeupd", IX86_BUILTIN_STOREUPD, UNKNOWN, (int) VOID_FTYPE_PDOUBLE_V2DF }, - { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_storedqu, "__builtin_ia32_storedqu", IX86_BUILTIN_STOREDQU, UNKNOWN, (int) VOID_FTYPE_PCHAR_V16QI }, + { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_storedquv16qi, "__builtin_ia32_storedqu", IX86_BUILTIN_STOREDQU, UNKNOWN, (int) VOID_FTYPE_PCHAR_V16QI }, { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_movntv2df, "__builtin_ia32_movntpd", IX86_BUILTIN_MOVNTPD, UNKNOWN, (int) VOID_FTYPE_PDOUBLE_V2DF }, { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_movntv2di, "__builtin_ia32_movntdq", IX86_BUILTIN_MOVNTDQ, UNKNOWN, (int) VOID_FTYPE_PV2DI_V2DI }, { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_movntisi, "__builtin_ia32_movnti", IX86_BUILTIN_MOVNTI, UNKNOWN
Re: [PATCH i386 2/8] [AVX512] Add mask registers.
On 27 Aug 22:11, Kirill Yukhin wrote: Hello, I've while pasting the patch I've accidentally put extra brace. Pls Ignore it > +(define_insn "kxnor" > + [(set (match_operand:SWI12 0 "register_operand" "=r,!k") > + (not:SWI12 > + (xor:SWI12 > + (match_operand:SWI12 1 "register_operand" "0,k")) ;; <--- Extra ')' > + (match_operand:SWI12 2 "register_operand" "r,k")))] > + "TARGET_AVX512F" -- Thanks, K
Re: [PATCH i386 2/8] [AVX512] Add mask registers.
Hello Richard, On 27 Aug 13:07, Richard Henderson wrote: > On 08/27/2013 11:11 AM, Kirill Yukhin wrote: > >> > What happened to the bmi andn alternative we discussed? > > BMI only supported for 4- and 8- byte integers, while > > kandw - for HI/QI > > > > We're talking about values in registers. Ignoring the high bits of the andn > result still produces the correct results. I've updated patch, adding BMI alternative and clobber of flags: +(define_insn "kandn" + [(set (match_operand:SWI12 0 "register_operand" "=r,&r,!k") + (and:SWI12 + (not:SWI12 + (match_operand:SWI12 1 "register_operand" "r,0,k")) + (match_operand:SWI12 2 "register_operand" "r,r,k"))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_AVX512F" + "@ + andn\t{%k2, %k1, %k0|%k0, %k1, %k2} + # + kandnw\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "type" "bitmanip,*,msklog") + (set_attr "prefix" "*,*,vex") + (set_attr "btver2_decode" "direct,*,*") + (set_attr "mode" "")]) However I am not fully understand why do we need this. `kandn' is different from BMI `andn' in clobbering of flags reg. So, having such a pattern we'll make compiler think that `kandn' clobber, which seems to me like opportunity to misoptimization as far as `kandn' doesn't clobber. Anyway, it seems to work. Testing: 1. Bootstrap pass 2. make check shows no regressions 3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f option 4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f option Is it ok? --- gcc/config/i386/constraints.md | 8 +- gcc/config/i386/i386.c | 34 - gcc/config/i386/i386.h | 40 -- gcc/config/i386/i386.md| 283 ++--- gcc/config/i386/predicates.md | 9 ++ 5 files changed, 314 insertions(+), 60 deletions(-) diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md index 28e626f..92e0c05 100644 --- a/gcc/config/i386/constraints.md +++ b/gcc/config/i386/constraints.md @@ -19,7 +19,7 @@ ;;; Unused letters: ;;; B H T -;;; h jk +;;; h j ;; Integer register constraints. ;; It is not necessary to define 'r' here. @@ -78,6 +78,12 @@ "TARGET_80387 || TARGET_FLOAT_RETURNS_IN_80387 ? FP_SECOND_REG : NO_REGS" "Second from top of 80387 floating-point stack (@code{%st(1)}).") +(define_register_constraint "k" "TARGET_AVX512F ? MASK_EVEX_REGS : NO_REGS" +"@internal Any mask register that can be used as predicate, i.e. k1-k7.") + +(define_register_constraint "Yk" "TARGET_AVX512F ? MASK_REGS : NO_REGS" +"@internal Any mask register.") + ;; Vector registers (also used for plain floating point nowadays). (define_register_constraint "y" "TARGET_MMX ? MMX_REGS : NO_REGS" "Any MMX register.") diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index d05dbf0..8325919 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2032,6 +2032,9 @@ enum reg_class const regclass_map[FIRST_PSEUDO_REGISTER] = EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, + /* Mask registers. */ + MASK_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, + MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, }; /* The "default" register map used in 32bit mode. */ @@ -2047,6 +2050,7 @@ int const dbx_register_map[FIRST_PSEUDO_REGISTER] = -1, -1, -1, -1, -1, -1, -1, -1, /* extended SSE registers */ -1, -1, -1, -1, -1, -1, -1, -1, /* AVX-512 registers 16-23*/ -1, -1, -1, -1, -1, -1, -1, -1, /* AVX-512 registers 24-31*/ + 93, 94, 95, 96, 97, 98, 99, 100, /* Mask registers */ }; /* The "default" register map used in 64bit mode. */ @@ -2062,6 +2066,7 @@ int const dbx64_register_map[FIRST_PSEUDO_REGISTER] = 25, 26, 27, 28, 29, 30, 31, 32, /* extended SSE registers */ 67, 68, 69, 70, 71, 72, 73, 74, /* AVX-512 registers 16-23 */ 75, 76, 77, 78, 79, 80, 81, 82, /* AVX-512 registers 24-31 */ + 118, 119, 120, 121, 122, 123, 124, 125, /* Mask registers */ }; /* Define the register numbers to be used in Dwarf debugging information. @@ -2129,6 +2134,7 @@ int const svr4_dbx_register_map[FIRST_PSEUDO_REGISTER] = -1, -1, -1, -1, -1, -1, -1, -1, /* extended SSE registers */ -1, -1, -1, -1, -1, -1, -1, -1, /* AVX-512 registers 16-23*/ -1, -1, -1, -1, -1, -1, -1, -1, /* AVX-512 registers 24-31*/ + 93, 94, 95, 96, 97, 98, 99, 100,
Re: [PATCH i386 2/8] [AVX512] Add mask registers.
Hello Richard, On 28 Aug 10:55, Richard Henderson wrote: > On 08/28/2013 10:45 AM, Kirill Yukhin wrote: > > Hello Richard, > > > > On 27 Aug 13:07, Richard Henderson wrote: > >> On 08/27/2013 11:11 AM, Kirill Yukhin wrote: > >>>>> What happened to the bmi andn alternative we discussed? > >>> BMI only supported for 4- and 8- byte integers, while > >>> kandw - for HI/QI > >>> > >> > >> We're talking about values in registers. Ignoring the high bits of the > >> andn > >> result still produces the correct results. > > > > However I am not fully understand why do we need this. > > `kandn' is different from BMI `andn' in clobbering of flags reg. > > So, having such a pattern we'll make compiler think that `kandn' > > clobber, which seems to me like opportunity to misoptimization as > > far as `kandn' doesn't clobber. > > This is no different than ANY OTHER use of the mask logical ops. > > When combine puts the AND and the NOT together, we don't know what registers > we > want the data in. If we do not supply the general register alternative, with > the clobber, then we will be FORCED to implement the operation in the mask > registers, even if this operation had nothing to do with actual vector masks. > And it ought to come as no surprise that X & ~Y is a fairly common operation. I agree with all of that. But why to put in BMI alternative as well? Without it me may have this pattern w/o clobber and add it when doing split for GPR constraint. I am just thing that presense of flags clobber in `kandn' pattern is not good from optimization point of view. Anyway I don't think this is big deal... > I suppose a real question here in how this is written: Does TARGET_AVX512F > imply TARGET_BMI? If so, then we can eliminate the second alternative. If > not, then you are missing an set_attr isa to restrict the first alternative. I think that it should be possible to use AVX-512F w/o BMI, so I've added new isa attribute "bmi". I am testing previous patch with that change: @@ -703,7 +703,7 @@ ;; Used to control the "enabled" attribute on a per-instruction basis. (define_attr "isa" "base,x64,x64_sse4,x64_sse4_noavx,x64_avx,nox64, sse2,sse2_noavx,sse3,sse4,sse4_noavx,avx,noavx, - avx2,noavx2,bmi2,fma4,fma,avx512f,noavx512f,fma_avx512f" + avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,fma_avx512f" (const_string "base")) (define_attr "enabled" "" @@ -726,6 +726,7 @@ (eq_attr "isa" "noavx") (symbol_ref "!TARGET_AVX") (eq_attr "isa" "avx2") (symbol_ref "TARGET_AVX2") (eq_attr "isa" "noavx2") (symbol_ref "!TARGET_AVX2") +(eq_attr "isa" "bmi") (symbol_ref "TARGET_BMI") (eq_attr "isa" "bmi2") (symbol_ref "TARGET_BMI2") (eq_attr "isa" "fma4") (symbol_ref "TARGET_FMA4") (eq_attr "isa" "fma") (symbol_ref "TARGET_FMA") @@ -7744,7 +7745,8 @@ andn\t{%k2, %k1, %k0|%k0, %k1, %k2} # kandnw\t{%2, %1, %0|%0, %1, %2}" - [(set_attr "type" "bitmanip,*,msklog") + [(set_attr "isa" "bmi,*,avx512f") + (set_attr "type" "bitmanip,*,msklog") (set_attr "prefix" "*,*,vex") (set_attr "btver2_decode" "direct,*,*") (set_attr "mode" "")]) Full patch below. Ok if testing pass? Thanks, K --- gcc/config/i386/constraints.md | 8 +- gcc/config/i386/i386.c | 34 - gcc/config/i386/i386.h | 40 -- gcc/config/i386/i386.md| 287 ++--- gcc/config/i386/predicates.md | 9 ++ 5 files changed, 317 insertions(+), 61 deletions(-) diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md index 28e626f..92e0c05 100644 --- a/gcc/config/i386/constraints.md +++ b/gcc/config/i386/constraints.md @@ -19,7 +19,7 @@ ;;; Unused letters: ;;; B H T -;;; h jk +;;; h j ;; Integer register constraints. ;; It is not necessary to define 'r' here. @@ -78,6 +78,12 @@ "TARGET_80387 || TARGET_FLOAT_RETURNS_IN_80387 ? FP_SECOND_REG : NO_REGS" "Second from top of 80387 floating-point stack (@code{%st(1)}).") +(define_register_constraint "k" "TARGET_AVX512F ? MASK_EVEX_REGS : NO_REGS" +"@internal Any mask register that can be used as predicate, i.e. k1-k7.") + +(define_register_
Re: [PATCH i386 2/8] [AVX512] Add mask registers.
Hello, On 28 Aug 13:17, Richard Henderson wrote: > Uh, no, you can't just add it when doing the split. You could be adding it in > a place that the flags register is live. You must ALWAYS have the clobber on > the whole pattern when gprs are possible. I see now, thanks a lot for explanation! > Fix the indentation. Done. > Spurious comment change. Done. > > +(define_insn "kandn" > > + [(set (match_operand:SWI12 0 "register_operand" "=r,&r,!k") > Yk not k? Yes, sure. Fixed. > > +(define_insn "kxnor" > > + [(set (match_operand:SWI12 0 "register_operand" "=r,!k") > > + (not:SWI12 > Likewise. Fixed. > > +(define_split > > + [(set (match_operand:SWI12 0 "register_operand") > > + (not:SWI12 > > + (xor:SWI12 > general_reg_operand. Done. > > +(define_insn "kortestzhi" > > + [(set (reg:CCZ FLAGS_REG) > > + (compare:CCZ > > + (ior:HI > > + (match_operand:HI 0 "register_operand" "%Yk") > > + (match_operand:HI 1 "register_operand" "Yk")) > > Omit the %; the two operands are identical. Fixed. > > +(define_insn "kortestchi" > > + [(set (reg:CCC FLAGS_REG) > > Likewise. Fixed. > > +;; Do not split instructions with mask regs. > > (define_split > >[(set (match_operand 0 "register_operand") > > (not (match_operand 1 "register_operand")))] > general_reg_operand. Fixed. Testing in progress, is it ok for trunk if pass? -- Thanks, K --- gcc/config/i386/constraints.md | 8 +- gcc/config/i386/i386.c | 38 -- gcc/config/i386/i386.h | 38 -- gcc/config/i386/i386.md| 287 ++--- gcc/config/i386/predicates.md | 9 ++ 5 files changed, 317 insertions(+), 63 deletions(-) diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md index 28e626f..92e0c05 100644 --- a/gcc/config/i386/constraints.md +++ b/gcc/config/i386/constraints.md @@ -19,7 +19,7 @@ ;;; Unused letters: ;;; B H T -;;; h jk +;;; h j ;; Integer register constraints. ;; It is not necessary to define 'r' here. @@ -78,6 +78,12 @@ "TARGET_80387 || TARGET_FLOAT_RETURNS_IN_80387 ? FP_SECOND_REG : NO_REGS" "Second from top of 80387 floating-point stack (@code{%st(1)}).") +(define_register_constraint "k" "TARGET_AVX512F ? MASK_EVEX_REGS : NO_REGS" +"@internal Any mask register that can be used as predicate, i.e. k1-k7.") + +(define_register_constraint "Yk" "TARGET_AVX512F ? MASK_REGS : NO_REGS" +"@internal Any mask register.") + ;; Vector registers (also used for plain floating point nowadays). (define_register_constraint "y" "TARGET_MMX ? MMX_REGS : NO_REGS" "Any MMX register.") diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index d05dbf0..4e9bac0 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2032,6 +2032,9 @@ enum reg_class const regclass_map[FIRST_PSEUDO_REGISTER] = EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, + /* Mask registers. */ + MASK_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, + MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, }; /* The "default" register map used in 32bit mode. */ @@ -2047,6 +2050,7 @@ int const dbx_register_map[FIRST_PSEUDO_REGISTER] = -1, -1, -1, -1, -1, -1, -1, -1, /* extended SSE registers */ -1, -1, -1, -1, -1, -1, -1, -1, /* AVX-512 registers 16-23*/ -1, -1, -1, -1, -1, -1, -1, -1, /* AVX-512 registers 24-31*/ + 93, 94, 95, 96, 97, 98, 99, 100, /* Mask registers */ }; /* The "default" register map used in 64bit mode. */ @@ -2062,6 +2066,7 @@ int const dbx64_register_map[FIRST_PSEUDO_REGISTER] = 25, 26, 27, 28, 29, 30, 31, 32, /* extended SSE registers */ 67, 68, 69, 70, 71, 72, 73, 74, /* AVX-512 registers 16-23 */ 75, 76, 77, 78, 79, 80, 81, 82, /* AVX-512 registers 24-31 */ + 118, 119, 120, 121, 122, 123, 124, 125, /* Mask registers */ }; /* Define the register numbers to be used in Dwarf debugging information. @@ -2129,6 +2134,7 @@ int const svr4_dbx_register_map[FIRST_PSEUDO_REGISTER] = -1, -1, -1, -1, -1, -1, -1, -1, /* extended SSE registers */ -1, -1, -1, -1, -1, -1, -1, -1, /* AVX-512 registers 16-23*/ -1, -1, -1, -1, -1, -1, -1, -1, /* AVX-512 registers 24-31*/ + 93, 94, 95, 96, 97, 98, 99, 100, /* Mask registers */ }; /* Define parameter passing and return registers. */ @@ -4219,8 +4225,13 @@ ix86_conditional_register_usage (void) /* If AVX512F is disabled, squash the registers. */ if (! TARGET_AVX512F) -for (i = FIRST_EXT_REX_SSE_REG; i < LAST_EXT_REX_SSE_REG; i++) - fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = ""; +{ + for (i = FIRST_EXT_REX_SSE_REG; i < LAST_EXT_REX_SSE_REG; i++) + fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = ""; + + for (i = FIRST_MASK_REG; i < LA
Re: [PATCH i386 2/8] [AVX512] Add mask registers.
Hello, > Testing in progress, is it ok for trunk if pass? I forgot to add clobber to split of andn, so testing fail. Fixed. Updated patch in the bottom. Testing: 1. Bootstrap pass. 2. make check shows no regressions. 3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f option. 4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f option. Is it ok for trunk? Thanks, K --- gcc/config/i386/constraints.md | 8 +- gcc/config/i386/i386.c | 38 -- gcc/config/i386/i386.h | 38 -- gcc/config/i386/i386.md| 288 ++--- gcc/config/i386/predicates.md | 9 ++ 5 files changed, 318 insertions(+), 63 deletions(-) diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md index 28e626f..92e0c05 100644 --- a/gcc/config/i386/constraints.md +++ b/gcc/config/i386/constraints.md @@ -19,7 +19,7 @@ ;;; Unused letters: ;;; B H T -;;; h jk +;;; h j ;; Integer register constraints. ;; It is not necessary to define 'r' here. @@ -78,6 +78,12 @@ "TARGET_80387 || TARGET_FLOAT_RETURNS_IN_80387 ? FP_SECOND_REG : NO_REGS" "Second from top of 80387 floating-point stack (@code{%st(1)}).") +(define_register_constraint "k" "TARGET_AVX512F ? MASK_EVEX_REGS : NO_REGS" +"@internal Any mask register that can be used as predicate, i.e. k1-k7.") + +(define_register_constraint "Yk" "TARGET_AVX512F ? MASK_REGS : NO_REGS" +"@internal Any mask register.") + ;; Vector registers (also used for plain floating point nowadays). (define_register_constraint "y" "TARGET_MMX ? MMX_REGS : NO_REGS" "Any MMX register.") diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index d05dbf0..4e9bac0 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2032,6 +2032,9 @@ enum reg_class const regclass_map[FIRST_PSEUDO_REGISTER] = EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, + /* Mask registers. */ + MASK_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, + MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, }; /* The "default" register map used in 32bit mode. */ @@ -2047,6 +2050,7 @@ int const dbx_register_map[FIRST_PSEUDO_REGISTER] = -1, -1, -1, -1, -1, -1, -1, -1, /* extended SSE registers */ -1, -1, -1, -1, -1, -1, -1, -1, /* AVX-512 registers 16-23*/ -1, -1, -1, -1, -1, -1, -1, -1, /* AVX-512 registers 24-31*/ + 93, 94, 95, 96, 97, 98, 99, 100, /* Mask registers */ }; /* The "default" register map used in 64bit mode. */ @@ -2062,6 +2066,7 @@ int const dbx64_register_map[FIRST_PSEUDO_REGISTER] = 25, 26, 27, 28, 29, 30, 31, 32, /* extended SSE registers */ 67, 68, 69, 70, 71, 72, 73, 74, /* AVX-512 registers 16-23 */ 75, 76, 77, 78, 79, 80, 81, 82, /* AVX-512 registers 24-31 */ + 118, 119, 120, 121, 122, 123, 124, 125, /* Mask registers */ }; /* Define the register numbers to be used in Dwarf debugging information. @@ -2129,6 +2134,7 @@ int const svr4_dbx_register_map[FIRST_PSEUDO_REGISTER] = -1, -1, -1, -1, -1, -1, -1, -1, /* extended SSE registers */ -1, -1, -1, -1, -1, -1, -1, -1, /* AVX-512 registers 16-23*/ -1, -1, -1, -1, -1, -1, -1, -1, /* AVX-512 registers 24-31*/ + 93, 94, 95, 96, 97, 98, 99, 100, /* Mask registers */ }; /* Define parameter passing and return registers. */ @@ -4219,8 +4225,13 @@ ix86_conditional_register_usage (void) /* If AVX512F is disabled, squash the registers. */ if (! TARGET_AVX512F) -for (i = FIRST_EXT_REX_SSE_REG; i < LAST_EXT_REX_SSE_REG; i++) - fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = ""; +{ + for (i = FIRST_EXT_REX_SSE_REG; i < LAST_EXT_REX_SSE_REG; i++) + fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = ""; + + for (i = FIRST_MASK_REG; i < LAST_MASK_REG; i++) + fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = ""; +} } @@ -33889,10 +33900,12 @@ ix86_preferred_reload_class (rtx x, reg_class_t regclass) return regclass; /* Force constants into memory if we are loading a (nonzero) constant into - an MMX or SSE register. This is because there are no MMX/SSE instructions - to load from a constant. */ + an MMX, SSE or MASK register. This is because there are no MMX/SSE/MASK + instructions to load from a constant. */ if (CONSTANT_P (x) - && (MAYBE_MMX_CLASS_P (regclass) || MAYBE_SSE_CLASS_P (regclass))) + && (MAYBE_MMX_CLASS_P (regclass) + || MAYBE_SSE_CLASS_P (regclass) + || MAYBE_MASK_CLASS_P (regclass))) return NO_REGS; /* Prefer SSE regs only, if we can use them for math. */ @@ -33996,10 +34009,11 @@ ix86_secondary_reload (bool in_p, rtx x, reg_class_t rclass, /* QImode spills from non-QI re
Re: [PATCH i386 1/8] [AVX512] Adjust register classes.
Hello, > This patch [actually the change at 201915] also broke X86 Darwin > bootstrap/ABI: pr59269 > - ISTM that SSE_REGNO_P() now returns true for a different set of registers > than before the patch, > I've attached a starting-point to fix to the PR - but would welcome any > additional inputs folks might have on how best to audit this change. Correct bug id: pr58269 I'll take a look, thanks! -- Thanks, K
Re: [PATCH i386 2/8] [AVX512] Add mask registers.
Hello, PING. -- Thanks, K
Re: [PATCH] Enable non-complex math builtins from C99 for Bionic
Hello, On 04 Sep 20:11, Maxim Kuvyrkov wrote: > On 4/09/2013, at 7:43 PM, Alexander Ivchenko wrote: > The patch is OK with definitions of OPTION_GLIBC, OPTION_UCLIBC and > OPTION_BIONIC copied verbatim from gcc/config/l Checked into main trunk: http://gcc.gnu.org/ml/gcc-cvs/2013-09/msg00137.html -- Thanks, K
[i386, AVX-512F, pr58269] Partial fix for PR58269: properly initialize last EXT REX SSE register.
Hello, Here is a patch to fix pr58269. Actually this is not a full fix, but an obvious part. ChangeLog entry: 2013-09-06 Kirill Yukhin PR target/58269 * gcc/config/i386/i386.c (ix86_conditional_register_usage): Proper initialize extended SSE registers. Bootstrap pass. Ok for trunk? -- Thanks, K --- gcc/config/i386/i386.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index a8d70bc..d6a40a8 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -4218,7 +4218,7 @@ ix86_conditional_register_usage (void) /* If AVX512F is disabled, squash the registers. */ if (! TARGET_AVX512F) -for (i = FIRST_EXT_REX_SSE_REG; i < LAST_EXT_REX_SSE_REG; i++) +for (i = FIRST_EXT_REX_SSE_REG; i <= LAST_EXT_REX_SSE_REG; i++) fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = ""; } -- 1.7.11.7
Re: [i386, AVX-512F, pr58269] Partial fix for PR58269: properly initialize last EXT REX SSE register.
Hello, On 06 Sep 11:29, Jakub Jelinek wrote: > On Fri, Sep 06, 2013 at 11:28:53AM +0200, Uros Bizjak wrote: > > This is OK. > > But please leave out gcc/ prefix from the ChangeLog entry. Thanks, checked into main trunk: http://gcc.gnu.org/ml/gcc-cvs/2013-09/msg00181.html with fixed ChangeLog. -- Thanks, K
Re: [PATCH i386 3/8] [AVX512] [1/n] Add AVX-512 patterns: VF iterator extended.
Hello, PING. -- Thanks, K
Re: [PATCH i386 2/8] [AVX512] Add mask registers.
Hello, On 29 Aug 15:59, Kirill Yukhin wrote: > /* Define parameter passing and return registers. */ > @@ -4219,8 +4225,13 @@ ix86_conditional_register_usage (void) > >/* If AVX512F is disabled, squash the registers. */ >if (! TARGET_AVX512F) > -for (i = FIRST_EXT_REX_SSE_REG; i < LAST_EXT_REX_SSE_REG; i++) > - fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = ""; > +{ > + for (i = FIRST_EXT_REX_SSE_REG; i < LAST_EXT_REX_SSE_REG; i++) > + fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = ""; > + > + for (i = FIRST_MASK_REG; i < LAST_MASK_REG; i++) > + fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = ""; > +} > } This place should be updated as here: http://gcc.gnu.org/ml/gcc-cvs/2013-09/msg00181.html I am not reposting as the change is obvious. -- Thanks, K
Re: [PATCH i386 2/8] [AVX512] Add mask registers.
Hello, On 04 Sep 22:45, Kirill Yukhin wrote: > Hello, > > PING. PING. -- Thanks, K
Re: [PATCH i386 3/8] [AVX512] [1/n] Add AVX-512 patterns: VF iterator extended.
Hello, On 06 Sep 17:41, Kirill Yukhin wrote: > Hello, > > PING. PING. -- Thanks, K
Re: [PATCH, x86] Use vector moves in memmove expanding
Hello, On 09 Sep 13:50, Uros Bizjak wrote: > OK. Checked into main trunk: http://gcc.gnu.org/ml/gcc-cvs/2013-09/msg00286.html -- Thanks, K
Re: [PATCH i386 2/8] [AVX512] Add mask registers.
Hello Richard, Thanks for inputs. On 09 Sep 10:39, Richard Henderson wrote: > gen_andhi_1 is not used, nor is it likely to be in the future, therefore this > should still have "*". We're using it in patch 6/8 when introducing plugins: + { OPTION_MASK_ISA_AVX512F, CODE_FOR_andhi_1, "__builtin_ia32_kandhi", IX86_BUILTIN_KAND16, UNKNOWN, (int) HI_FTYPE_HI_HI }, And covered by tests in patch 8/8: new file mode 100644 index 000..3d777c8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512f-kandnw-1.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512f -O2" } */ +/* { dg-final { scan-assembler-times "kandnw\[ \\t\]+\[^\n\]*%k\[1-7\]" 1 } } */ + +#include + +void +avx512f_test () +{ + __mmask16 k1, k2, k3; + volatile __m512 x; + + __asm__( "kmovw %1, %0" : "=k" (k1) : "r" (1) ); + __asm__( "kmovw %1, %0" : "=k" (k2) : "r" (2) ); + + k3 = _mm512_kandn (k1, k2); + x = _mm512_mask_add_ps (x, k3, x, x); +} > > > +(define_insn "hi_1" > > + [(set (match_operand:HI 0 "nonimmediate_operand" "=r,rm,!Yk") > > + (any_or:HI > > +(match_operand:HI 1 "nonimmediate_operand" "%0,0,Yk") > > +(match_operand:HI 2 "general_operand" ",r,Yk"))) > > + (clobber (reg:CC FLAGS_REG))] > > Likewise. Same as above. Do you still think we need "*"? -- Thanks, K
Re: [PATCH i386 2/8] [AVX512] Add mask registers.
Hello, On 10 Sep 09:17, Richard Henderson wrote: > On 09/10/2013 05:57 AM, Kirill Yukhin wrote: > > + { OPTION_MASK_ISA_AVX512F, CODE_FOR_andhi_1, "__builtin_ia32_kandhi", > > IX86_BUILTIN_KAND16, UNKNOWN, (int) HI_FTYPE_HI_HI }, > > Alternately, why not use the standard CODE_FOR_andhi3 expander? Great point! Thanks, fixed. gcc.target/i386/avx512f-k* tests still pass. Bootstrap pass. Is it ok now? Thanks, K --- gcc/config/i386/constraints.md | 8 +- gcc/config/i386/i386.c | 38 -- gcc/config/i386/i386.h | 38 -- gcc/config/i386/i386.md| 286 ++--- gcc/config/i386/predicates.md | 9 ++ 5 files changed, 317 insertions(+), 62 deletions(-) diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md index 28e626f..92e0c05 100644 --- a/gcc/config/i386/constraints.md +++ b/gcc/config/i386/constraints.md @@ -19,7 +19,7 @@ ;;; Unused letters: ;;; B H T -;;; h jk +;;; h j ;; Integer register constraints. ;; It is not necessary to define 'r' here. @@ -78,6 +78,12 @@ "TARGET_80387 || TARGET_FLOAT_RETURNS_IN_80387 ? FP_SECOND_REG : NO_REGS" "Second from top of 80387 floating-point stack (@code{%st(1)}).") +(define_register_constraint "k" "TARGET_AVX512F ? MASK_EVEX_REGS : NO_REGS" +"@internal Any mask register that can be used as predicate, i.e. k1-k7.") + +(define_register_constraint "Yk" "TARGET_AVX512F ? MASK_REGS : NO_REGS" +"@internal Any mask register.") + ;; Vector registers (also used for plain floating point nowadays). (define_register_constraint "y" "TARGET_MMX ? MMX_REGS : NO_REGS" "Any MMX register.") diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index fe9a714..72549e9 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2032,6 +2032,9 @@ enum reg_class const regclass_map[FIRST_PSEUDO_REGISTER] = EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, + /* Mask registers. */ + MASK_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, + MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, }; /* The "default" register map used in 32bit mode. */ @@ -2047,6 +2050,7 @@ int const dbx_register_map[FIRST_PSEUDO_REGISTER] = -1, -1, -1, -1, -1, -1, -1, -1, /* extended SSE registers */ -1, -1, -1, -1, -1, -1, -1, -1, /* AVX-512 registers 16-23*/ -1, -1, -1, -1, -1, -1, -1, -1, /* AVX-512 registers 24-31*/ + 93, 94, 95, 96, 97, 98, 99, 100, /* Mask registers */ }; /* The "default" register map used in 64bit mode. */ @@ -2062,6 +2066,7 @@ int const dbx64_register_map[FIRST_PSEUDO_REGISTER] = 25, 26, 27, 28, 29, 30, 31, 32, /* extended SSE registers */ 67, 68, 69, 70, 71, 72, 73, 74, /* AVX-512 registers 16-23 */ 75, 76, 77, 78, 79, 80, 81, 82, /* AVX-512 registers 24-31 */ + 118, 119, 120, 121, 122, 123, 124, 125, /* Mask registers */ }; /* Define the register numbers to be used in Dwarf debugging information. @@ -2129,6 +2134,7 @@ int const svr4_dbx_register_map[FIRST_PSEUDO_REGISTER] = -1, -1, -1, -1, -1, -1, -1, -1, /* extended SSE registers */ -1, -1, -1, -1, -1, -1, -1, -1, /* AVX-512 registers 16-23*/ -1, -1, -1, -1, -1, -1, -1, -1, /* AVX-512 registers 24-31*/ + 93, 94, 95, 96, 97, 98, 99, 100, /* Mask registers */ }; /* Define parameter passing and return registers. */ @@ -4224,8 +4230,13 @@ ix86_conditional_register_usage (void) /* If AVX512F is disabled, squash the registers. */ if (! TARGET_AVX512F) -for (i = FIRST_EXT_REX_SSE_REG; i <= LAST_EXT_REX_SSE_REG; i++) - fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = ""; +{ + for (i = FIRST_EXT_REX_SSE_REG; i <= LAST_EXT_REX_SSE_REG; i++) + fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = ""; + + for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++) + fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = ""; +} } @@ -33918,10 +33929,12 @@ ix86_preferred_reload_class (rtx x, reg_class_t regclass) return regclass; /* Force constants into memory if we are loading a (nonzero) constant into - an MMX or SSE register. This is because there are no MMX/SSE instructions - to load from a constant. */ + an MMX, SSE or MASK register. This is because there are no MMX/SSE/MASK + instructions to load from a constant. */ if (CONSTANT_P (x) - && (MAYBE_MMX_CLASS_P (regclass) || MAYBE_SSE_CLASS_P (regclass))) + && (MAYBE_MMX_CLASS_P (regclass) + || MAYBE_SSE_CLASS_P (regclass) + || MAYBE_MASK_CLASS_P (r
Re: [i386, doc] Add documentation for fxsr, xsave, xsaveopt
Hello, On 10 Sep 19:42, Uros Bizjak wrote: > The patch is OK for mainline. Checked into main trunk: http://gcc.gnu.org/ml/gcc-cvs/2013-09/msg00353.html -- Thanks, K
Re: [PATCH i386 2/8] [AVX512] Add mask registers.
Hello, On 10 Sep 11:51, Richard Henderson wrote: > On 09/10/2013 11:25 AM, Kirill Yukhin wrote: > > Is it ok now? > > > Yes. Thanks a lot! Checked into main trunk: http://gcc.gnu.org/ml/gcc-cvs/2013-09/msg00354.html -- Thanks, K
Re: [x86,PATCH] Simple performance tuning for SLM.
Hello, On 10 Sep 19:32, Uros Bizjak wrote: > On Tue, Sep 10, 2013 at 4:56 PM, Yuri Rumyantsev wrote: > > Is it OK for trunk? > OK. Checked into main trunk: http://gcc.gnu.org/ml/gcc-cvs/2013-09/msg00383.html -- Thanks, K
Re: [PATCH i386 3/8] [AVX512] [1/n] Add AVX-512 patterns: VF iterator extended.
Hello, On 09 Sep 15:11, Kirill Yukhin wrote: > Hello, > On 06 Sep 17:41, Kirill Yukhin wrote: > > Hello, > > > > PING. > PING. PING. -- Thanks, K
Re: [libvtv] Remove Android from supported targets
Hello, On 12 Sep 08:19, Caroline Tice wrote: > Yes, that patch is ok. > > -- Caroline Tice > cmt...@google.com > > On Thu, Sep 12, 2013 at 3:56 AM, Alexander Ivchenko > wrote: > > > > Is following patch OK? Checked into main trunk: http://gcc.gnu.org/ml/gcc-cvs/2013-09/msg00425.html -- Thanks, K
Re: [x86,PATCH] Simple fix for Atom LEA splitting.
Hello, On 16 Sep 16:36, Uros Bizjak wrote: > The patch with a fixed comment is OK otherwise. Checked into main trunk: http://gcc.gnu.org/ml/gcc-cvs/2013-09/msg00512.html -- Thanks, K
Re: [PATCH i386 3/8] [AVX512] [1/n] Add AVX-512 patterns: VF iterator extended.
Hello, On 13 Sep 14:28, Kirill Yukhin wrote: > Hello, > On 09 Sep 15:11, Kirill Yukhin wrote: > > Hello, > > On 06 Sep 17:41, Kirill Yukhin wrote: > > > Hello, > > > > > > PING. > > PING. > PING. PING -- Thanks, K
Re: [PATCH i386 3/8] [AVX512] [1/n] Add AVX-512 patterns: VF iterator extended.
Hello, On 18 Sep 11:17, Kirill Yukhin wrote: > Hello, > On 13 Sep 14:28, Kirill Yukhin wrote: > > Hello, > > On 09 Sep 15:11, Kirill Yukhin wrote: > > > Hello, > > > On 06 Sep 17:41, Kirill Yukhin wrote: > > > > Hello, > > > > > > > > PING. > > > PING. > > PING. > PING PING. -- Thanks, K
Re: [PATCH] Fix instability of -fschedule-insn for x86
Hi, > Based on this opinion, the patch is OK for mainline, if there are no ... Checked in: http://gcc.gnu.org/ml/gcc-cvs/2012-10/msg00187.html K
Re: [PATCH, libstdc++] Fix missing gthr-default.h issue on libstdc++ configure
Hi guys, >> Is it ok for release it into trunk and 4.7? > > Yes, please do so. Checked into trunk: http://gcc.gnu.org/ml/gcc-cvs/2012-10/msg00419.html and 4.7: http://gcc.gnu.org/ml/gcc-cvs/2012-10/msg00431.html Thanks, K
Re: [PATCH, libstdc++] Fix missing gthr-default.h issue on libstdc++ configure
Reverted. Trunk: http://gcc.gnu.org/ml/gcc-cvs/2012-10/msg00516.html 4.7: http://gcc.gnu.org/ml/gcc-cvs/2012-10/msg00517.html K
Re: [PATCH, libstdc++] Fix missing gthr-default.h issue on libstdc++ configure
>> Looks Ok. If David can test is successfully on AIX I can approve it. > > I was able to bootstrap successfully with the patch. Checked in: http://gcc.gnu.org/ml/gcc-cvs/2012-10/msg00581.html Thanks, K
Re: [PATCH] Intrinsics for fxsave[,64], xsave[,64], xsaveopt[,64]
Hi, > So, the patch is OK for mainline (with -mxsave removed from sse-X tests). > > Please commit the patch to mainline SVN. > Checked in: http://gcc.gnu.org/ml/gcc-cvs/2012-10/msg00963.html Thanks, K
Re: Additional fix for pre-reload schedule on x86 targets.
> If it were not approved yet by an insn scheduler maintainer, it is ok for > me. As Uros wrote that he rubberstamps the patch if a scheduler maintainer > approves it, so you can commit it into the mainline. Checked in: http://gcc.gnu.org/ml/gcc-cvs/2012-10/msg00965.html K
Re: [PATCH i386 3/8] [AVX512] [2/n] Add AVX-512 patterns: Fix missing `v' constraint.
Hello, > This patch is still far too large. > > I think you should split it up based on every single mode iterator that > you need to add or change. Here's 2nd subpatch. It fixes missing `v' constraints. Is it Ok? Testing: 1. Bootstrap pass. 2. make check shows no regressions. 3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f option. 4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f option. -- Thanks, K PS. If it is ok - I am going to strip out ChangeLog lines from big patch --- gcc/config/i386/sse.md | 34 +- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 10637cc..2f2fb38 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -880,7 +880,7 @@ (define_insn "_movnt" [(set (match_operand:VI8 0 "memory_operand" "=m") - (unspec:VI8 [(match_operand:VI8 1 "register_operand" "x")] + (unspec:VI8 [(match_operand:VI8 1 "register_operand" "v")] UNSPEC_MOVNT))] "TARGET_SSE2" "%vmovntdq\t{%1, %0|%0, %1}" @@ -1764,10 +1764,10 @@ [(set (reg:CCFP FLAGS_REG) (compare:CCFP (vec_select:MODEF - (match_operand: 0 "register_operand" "x") + (match_operand: 0 "register_operand" "v") (parallel [(const_int 0)])) (vec_select:MODEF - (match_operand: 1 "nonimmediate_operand" "xm") + (match_operand: 1 "nonimmediate_operand" "vm") (parallel [(const_int 0)]] "SSE_FLOAT_MODE_P (mode)" "%vcomi\t{%1, %0|%0, %1}" @@ -1784,10 +1784,10 @@ [(set (reg:CCFPU FLAGS_REG) (compare:CCFPU (vec_select:MODEF - (match_operand: 0 "register_operand" "x") + (match_operand: 0 "register_operand" "v") (parallel [(const_int 0)])) (vec_select:MODEF - (match_operand: 1 "nonimmediate_operand" "xm") + (match_operand: 1 "nonimmediate_operand" "vm") (parallel [(const_int 0)]] "SSE_FLOAT_MODE_P (mode)" "%vucomi\t{%1, %0|%0, %1}" @@ -2594,7 +2594,7 @@ (set_attr "amdfam10_decode" "vector,double,*") (set_attr "bdver1_decode" "double,direct,*") (set_attr "btver2_decode" "double,double,double") - (set_attr "prefix" "orig,orig,vex") + (set_attr "prefix" "orig,orig,maybe_evex") (set_attr "mode" "SF")]) (define_insn "sse_cvtsi2ssq" @@ -2617,7 +2617,7 @@ (set_attr "btver2_decode" "double,double,double") (set_attr "length_vex" "*,*,4") (set_attr "prefix_rex" "1,1,*") - (set_attr "prefix" "orig,orig,vex") + (set_attr "prefix" "orig,orig,maybe_evex") (set_attr "mode" "SF")]) (define_insn "sse_cvtss2si" @@ -2668,7 +2668,7 @@ (define_insn "sse_cvtss2siq_2" [(set (match_operand:DI 0 "register_operand" "=r,r") - (unspec:DI [(match_operand:SF 1 "nonimmediate_operand" "x,m")] + (unspec:DI [(match_operand:SF 1 "nonimmediate_operand" "v,m")] UNSPEC_FIX_NOTRUNC))] "TARGET_SSE && TARGET_64BIT" "%vcvtss2si{q}\t{%1, %0|%0, %k1}" @@ -2860,11 +2860,11 @@ (set_attr "mode" "DF")]) (define_insn "sse2_cvtsi2sdq" - [(set (match_operand:V2DF 0 "register_operand" "=x,x,x") + [(set (match_operand:V2DF 0 "register_operand" "=x,x,v") (vec_merge:V2DF (vec_duplicate:V2DF (float:DF (match_operand:DI 2 "nonimmediate_operand" "r,m,rm"))) - (match_operand:V2DF 1 "register_operand" "0,0,x") + (match_operand:V2DF 1 "register_operand" "0,0,v") (const_int 1)))] "TARGET_SSE2 && TARGET_64BIT" "@ @@ -2878,14 +2878,14 @@ (set_attr "bdver1_decode" "double,direct,*") (set_attr "length_vex" "*,*,4") (set_attr "prefix_rex" "1,1,*") - (set_attr "prefix" "orig,orig,vex") + (set_attr "prefix" "orig,orig,maybe_evex") (set_attr "mode" "DF")]) (define_insn "sse2_cvtsd2si" [(set (match_operand:SI 0 "register_operand" "=r,r") (unspec:SI [(vec_select:DF -(match_operand:V2DF 1 "nonimmediate_operand" "x,m") +(match_operand:V2DF 1 "nonimmediate_operand" "v,m") (parallel [(const_int 0)]))] UNSPEC_FIX_NOTRUNC))] "TARGET_SSE2" @@ -2916,7 +2916,7 @@ [(set (match_operand:DI 0 "register_operand" "=r,r") (unspec:DI [(vec_select:DF -(match_operand:V2DF 1 "nonimmediate_operand" "x,m") +(match_operand:V2DF 1 "nonimmediate_operand" "v,m") (parallel [(const_int 0)]))] UNSPEC_FIX_NOTRUNC))] "TARGET_SSE2 && TARGET_64BIT" @@ -2946,7 +2946,7 @@ [(set (match_operand:SI 0 "register_operand" "=r,r") (fix:SI (vec_select:DF - (match_operand:V2DF 1 "nonimmediate_operand" "x,m") + (match_operand:V2DF 1 "nonimmediate_operand" "v,m") (parallel [(const_int 0)]] "TARGET_SSE2" "%vcvttsd2si\t{%1, %0|%0, %q1}" @@ -2963,7 +2963,7 @@ [(set (match_o
Re: [PATCH i386 3/8] [AVX512] [3/n] Add AVX-512 patterns: VF1 and VI iterators.
Hello, > This patch is still far too large. > > I think you should split it up based on every single mode iterator that > you need to add or change. Here's 3rd subpatch. It extends VF1 and VI iterators. Is it Ok? Testing: 1. Bootstrap pass. 2. make check shows no regressions. 3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f option. 4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f option. -- Thanks, K PS. If it is Ok - I am going to strip out ChangeLog lines from big patch. --- gcc/config/i386/i386.md | 4 ++ gcc/config/i386/sse.md | 117 +++- 2 files changed, 79 insertions(+), 42 deletions(-) diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 03b3842..cc332ea 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -827,6 +827,10 @@ (define_code_attr s [(sign_extend "s") (zero_extend "u")]) (define_code_attr u_bool [(sign_extend "false") (zero_extend "true")]) +;; Used in signed and unsigned fix. +(define_code_iterator any_fix [fix unsigned_fix]) +(define_code_attr fixsuffix [(fix "") (unsigned_fix "u")]) + ;; All integer modes. (define_mode_iterator SWI1248x [QI HI SI DI]) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 2f2fb38..aa9f1d1 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -134,6 +134,10 @@ ;; All SFmode vector float modes (define_mode_iterator VF1 + [(V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF]) + +;; 128- and 256-bit SF vector modes +(define_mode_iterator VF1_128_256 [(V8SF "TARGET_AVX") V4SF]) ;; All DFmode vector float modes @@ -154,7 +158,8 @@ ;; All vector integer modes (define_mode_iterator VI - [(V32QI "TARGET_AVX") V16QI + [(V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F") + (V32QI "TARGET_AVX") V16QI (V16HI "TARGET_AVX") V8HI (V8SI "TARGET_AVX") V4SI (V4DI "TARGET_AVX") V2DI]) @@ -162,8 +167,8 @@ (define_mode_iterator VI_AVX2 [(V32QI "TARGET_AVX2") V16QI (V16HI "TARGET_AVX2") V8HI - (V8SI "TARGET_AVX2") V4SI - (V4DI "TARGET_AVX2") V2DI]) + (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX2") V4SI + (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX2") V2DI]) ;; All QImode vector integer modes (define_mode_iterator VI1 @@ -175,7 +180,7 @@ ;; All DImode vector integer modes (define_mode_iterator VI8 - [(V4DI "TARGET_AVX") V2DI]) + [(V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI]) (define_mode_iterator VI1_AVX2 [(V32QI "TARGET_AVX2") V16QI]) @@ -358,7 +363,8 @@ (V32QI "V32QI") (V16QI "V16QI")]) (define_mode_attr sseintvecmodelower - [(V8SF "v8si") (V4DF "v4di") + [(V16SF "v16si") + (V8SF "v8si") (V4DF "v4di") (V4SF "v4si") (V2DF "v2di") (V8SI "v8si") (V4DI "v4di") (V4SI "v4si") (V2DI "v2di") @@ -393,10 +399,21 @@ ;; Mapping of vector modes back to the scalar modes (define_mode_attr ssescalarmode - [(V32QI "QI") (V16HI "HI") (V8SI "SI") (V4DI "DI") - (V16QI "QI") (V8HI "HI") (V4SI "SI") (V2DI "DI") - (V8SF "SF") (V4DF "DF") - (V4SF "SF") (V2DF "DF")]) + [(V64QI "QI") (V32QI "QI") (V16QI "QI") + (V32HI "HI") (V16HI "HI") (V8HI "HI") + (V16SI "SI") (V8SI "SI") (V4SI "SI") + (V8DI "DI") (V4DI "DI") (V2DI "DI") + (V16SF "SF") (V8SF "SF") (V4SF "SF") + (V8DF "DF") (V4DF "DF") (V2DF "DF")]) + +;; Mapping of vector modes to the 128bit modes +(define_mode_attr ssexmmmode + [(V64QI "V16QI") (V32QI "V16QI") (V16QI "V16QI") + (V32HI "V8HI") (V16HI "V8HI") (V8HI "V8HI") + (V16SI "V4SI") (V8SI "V4SI") (V4SI "V4SI") + (V8DI "V2DI") (V4DI "V2DI") (V2DI "V2DI") + (V16SF "V4SF") (V8SF "V4SF") (V4SF "V4SF") + (V8DF "V2DF") (V4DF "V2DF") (V2DF "V2DF")]) ;; Pointer size override for scalar modes (Intel asm dialect) (define_mode_attr iptr @@ -408,8 +425,10 @@ ;; Number of scalar elements in each vector type (define_mode_attr ssescalarnum - [(V32QI "32") (V16HI "16") (V8SI "8") (V4DI "4") + [(V64QI "64") (V16SI "16") (V8DI "8") + (V32QI "32") (V16HI "16") (V8SI "8") (V4DI "4") (V16QI "16") (V8HI "8") (V4SI "4") (V2DI "2") + (V16SF "16") (V8DF "8") (V8SF "8") (V4DF "4") (V4SF "4") (V2DF "2")]) @@ -1101,9 +1120,9 @@ (set_attr "mode" "")]) (define_insn "_rcp2" - [(set (match_operand:VF1 0 "register_operand" "=x") - (unspec:VF1 - [(match_operand:VF1 1 "nonimmediate_operand" "xm")] UNSPEC_RCP))] + [(set (match_operand:VF1_128_256 0 "register_operand" "=x") + (unspec:VF1_128_256 + [(match_operand:VF1_128_256 1 "nonimmediate_operand" "xm")] UNSPEC_RCP))] "TARGET_SSE" "%vrcpps\t{%1, %0|%0, %1}" [(set_attr "type" "sse") @@ -1181,9 +1200,9 @@ (set_attr "mode" "")]) (define_expand "rsqrt2" - [(set (match_operand:VF1 0 "register_operand") - (unspec:VF1 - [(match_operand:VF1 1 "nonimmediate_operand")] UNSPEC_RSQRT))] + [(set (match_operand:VF1_128_256 0 "register_operand") + (unspec:VF1_128_256 +
Re: [PATCH i386 3/8] [AVX512] [4/n] Add AVX-512 patterns: V iterator.
Hello, > This patch is still far too large. > > I think you should split it up based on every single mode iterator that > you need to add or change. Here's 4th subpatch. It extends V iterator. Is it Ok? Testing: 1. Bootstrap pass. 2. make check shows no regressions. 3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f option. 4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f option. -- Thanks, K PS. If it is Ok - I am going to strip out ChangeLog lines from big patch. --- gcc/config/i386/sse.md | 175 - 1 file changed, 131 insertions(+), 44 deletions(-) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index aa9f1d1..cdb9ae0 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -109,10 +109,10 @@ (define_mode_iterator V [(V32QI "TARGET_AVX") V16QI (V16HI "TARGET_AVX") V8HI - (V8SI "TARGET_AVX") V4SI - (V4DI "TARGET_AVX") V2DI - (V8SF "TARGET_AVX") V4SF - (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")]) + (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX") V4SI + (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX") V2DI + (V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF + (V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") (V2DF "TARGET_SSE2")]) ;; All 128bit vector modes (define_mode_iterator V_128 @@ -122,6 +122,12 @@ (define_mode_iterator V_256 [V32QI V16HI V8SI V4DI V8SF V4DF]) +;; All 256bit and 512bit vector modes +(define_mode_iterator V_256_512 + [V32QI V16HI V8SI V4DI V8SF V4DF + (V64QI "TARGET_AVX512F") (V32HI "TARGET_AVX512F") (V16SI "TARGET_AVX512F") + (V8DI "TARGET_AVX512F") (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")]) + ;; All vector float modes (define_mode_iterator VF [(V16SF "TARGET_AVX512F") (V8SF "TARGET_AVX") V4SF @@ -142,8 +148,15 @@ ;; All DFmode vector float modes (define_mode_iterator VF2 + [(V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX") V2DF]) + +;; 128- and 256-bit DF vector modes +(define_mode_iterator VF2_128_256 [(V4DF "TARGET_AVX") V2DF]) +(define_mode_iterator VF2_512_256 + [(V8DF "TARGET_AVX512F") (V4DF "TARGET_AVX")]) + ;; All 128bit vector float modes (define_mode_iterator VF_128 [V4SF (V2DF "TARGET_SSE2")]) @@ -380,10 +393,12 @@ ;; Mapping of vector modes to a vector mode of half size (define_mode_attr ssehalfvecmode - [(V32QI "V16QI") (V16HI "V8HI") (V8SI "V4SI") (V4DI "V2DI") - (V16QI "V8QI") (V8HI "V4HI") (V4SI "V2SI") - (V8SF "V4SF") (V4DF "V2DF") - (V4SF "V2SF")]) + [(V64QI "V32QI") (V32HI "V16HI") (V16SI "V8SI") (V8DI "V4DI") + (V32QI "V16QI") (V16HI "V8HI") (V8SI "V4SI") (V4DI "V2DI") + (V16QI "V8QI") (V8HI "V4HI") (V4SI "V2SI") + (V16SF "V8SF") (V8DF "V4DF") + (V8SF "V4SF") (V4DF "V2DF") + (V4SF "V2SF")]) ;; Mapping of vector modes ti packed single mode of the same size (define_mode_attr ssePSmode @@ -474,9 +489,11 @@ (define_code_attr extsuffix [(sign_extend "sx") (zero_extend "zx")]) ;; i128 for integer vectors and TARGET_AVX2, f128 otherwise. +;; i64x4 or f64x4 for 512bit modes. (define_mode_attr i128 - [(V8SF "f128") (V4DF "f128") (V32QI "%~128") (V16HI "%~128") - (V8SI "%~128") (V4DI "%~128")]) + [(V16SF "f64x4") (V8SF "f128") (V8DF "f64x4") (V4DF "f128") + (V64QI "i64x4") (V32QI "%~128") (V32HI "i64x4") (V16HI "%~128") + (V16SI "i64x4") (V8SI "%~128") (V8DI "i64x4") (V4DI "%~128")]) ;; Mix-n-match (define_mode_iterator AVX256MODE2P [V8SI V8SF V4DF]) @@ -3004,14 +3021,20 @@ (set_attr "prefix" "maybe_vex") (set_attr "mode" "DI")]) -(define_insn "floatv4siv4df2" - [(set (match_operand:V4DF 0 "register_operand" "=x") - (float:V4DF (match_operand:V4SI 1 "nonimmediate_operand" "xm")))] +;; For float2 insn pattern +(define_mode_attr si2dfmode + [(V8DF "V8SI") (V4DF "V4SI")]) +(define_mode_attr si2dfmodelower + [(V8DF "v8si") (V4DF "v4si")]) + +(define_insn "float2" + [(set (match_operand:VF2_512_256 0 "register_operand" "=v") + (float:VF2_512_256 (match_operand: 1 "nonimmediate_operand" "vm")))] "TARGET_AVX" "vcvtdq2pd\t{%1, %0|%0, %1}" [(set_attr "type" "ssecvt") - (set_attr "prefix" "vex") - (set_attr "mode" "V4DF")]) + (set_attr "prefix" "maybe_vex") + (set_attr "mode" "")]) (define_insn "avx_cvtdq2pd256_2" [(set (match_operand:V4DF 0 "register_operand" "=x") @@ -3101,6 +3124,15 @@ (set_attr "athlon_decode" "vector") (set_attr "bdver1_decode" "double")]) +(define_insn "fix_truncv8dfv8si2" + [(set (match_operand:V8SI 0 "register_operand" "=v") + (any_fix:V8SI (match_operand:V8DF 1 "nonimmediate_operand" "vm")))] + "TARGET_AVX512F" + "vcvttpd2dq\t{%1, %0|%0, %1}" + [(set_attr "type" "ssecvt") + (set_attr "prefix" "evex") + (set_attr "mode" "OI")]) + (define_insn "fix_truncv4dfv4si2" [(set (match_operand:V4SI 0 "register_operand" "=x") (fix:V4SI (match_operand:V4DF 1 "nonimmediate_operand" "xm")))] @@ -3243,15 +3275,19 @@ (set_attr "prefix" "maybe_vex")
Re: [PATCH i386 3/8] [AVX512] [6/n] Add AVX-512 patterns: VI2 and VI124 iterators.
Hello, > This patch is still far too large. > > I think you should split it up based on every single mode iterator that > you need to add or change. Here's 6th subpatch. It extends VI2 and VI124 iterators. Is it Ok? Testing: 1. Bootstrap pass. 2. make check shows no regressions. 3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f option. 4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f option. -- Thanks, K PS. If it is Ok - I am going to strip out ChangeLog lines from big patch. --- gcc/config/i386/sse.md | 30 -- 1 file changed, 20 insertions(+), 10 deletions(-) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 89c31c5..351f5bb 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -201,6 +201,9 @@ (define_mode_iterator VI2_AVX2 [(V16HI "TARGET_AVX2") V8HI]) +(define_mode_iterator VI2_AVX512F + [(V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX2") V8HI]) + (define_mode_iterator VI4_AVX2 [(V8SI "TARGET_AVX2") V4SI]) @@ -223,6 +226,11 @@ [(V16HI "TARGET_AVX2") V8HI (V8SI "TARGET_AVX2") V4SI]) +(define_mode_iterator VI124_AVX512F + [(V32QI "TARGET_AVX2") V16QI + (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX2") V8HI + (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX2") V4SI]) + (define_mode_iterator VI124_AVX2 [(V32QI "TARGET_AVX2") V16QI (V16HI "TARGET_AVX2") V8HI @@ -472,7 +480,8 @@ ;; Pack/unpack vector modes (define_mode_attr sseunpackmode [(V16QI "V8HI") (V8HI "V4SI") (V4SI "V2DI") - (V32QI "V16HI") (V16HI "V8SI") (V8SI "V4DI")]) + (V32QI "V16HI") (V16HI "V8SI") (V8SI "V4DI") + (V32HI "V16SI") (V64QI "V32HI") (V16SI "V8DI")]) (define_mode_attr ssepackmode [(V8HI "V16QI") (V4SI "V8HI") (V2DI "V4SI") @@ -3347,11 +3356,12 @@ "TARGET_AVX") (define_mode_attr sseunpackfltmode - [(V8HI "V4SF") (V4SI "V2DF") (V16HI "V8SF") (V8SI "V4DF")]) + [(V8HI "V4SF") (V4SI "V2DF") (V16HI "V8SF") + (V8SI "V4DF") (V32HI "V16SF") (V16SI "V8DF")]) (define_expand "vec_unpacks_float_hi_" [(match_operand: 0 "register_operand") - (match_operand:VI2_AVX2 1 "register_operand")] + (match_operand:VI2_AVX512F 1 "register_operand")] "TARGET_SSE2" { rtx tmp = gen_reg_rtx (mode); @@ -3364,7 +3374,7 @@ (define_expand "vec_unpacks_float_lo_" [(match_operand: 0 "register_operand") - (match_operand:VI2_AVX2 1 "register_operand")] + (match_operand:VI2_AVX512F 1 "register_operand")] "TARGET_SSE2" { rtx tmp = gen_reg_rtx (mode); @@ -3377,7 +3387,7 @@ (define_expand "vec_unpacku_float_hi_" [(match_operand: 0 "register_operand") - (match_operand:VI2_AVX2 1 "register_operand")] + (match_operand:VI2_AVX512F 1 "register_operand")] "TARGET_SSE2" { rtx tmp = gen_reg_rtx (mode); @@ -3390,7 +3400,7 @@ (define_expand "vec_unpacku_float_lo_" [(match_operand: 0 "register_operand") - (match_operand:VI2_AVX2 1 "register_operand")] + (match_operand:VI2_AVX512F 1 "register_operand")] "TARGET_SSE2" { rtx tmp = gen_reg_rtx (mode); @@ -7835,25 +7845,25 @@ (define_expand "vec_unpacks_lo_" [(match_operand: 0 "register_operand") - (match_operand:VI124_AVX2 1 "register_operand")] + (match_operand:VI124_AVX512F 1 "register_operand")] "TARGET_SSE2" "ix86_expand_sse_unpack (operands[0], operands[1], false, false); DONE;") (define_expand "vec_unpacks_hi_" [(match_operand: 0 "register_operand") - (match_operand:VI124_AVX2 1 "register_operand")] + (match_operand:VI124_AVX512F 1 "register_operand")] "TARGET_SSE2" "ix86_expand_sse_unpack (operands[0], operands[1], false, true); DONE;") (define_expand "vec_unpacku_lo_" [(match_operand: 0 "register_operand") - (match_operand:VI124_AVX2 1 "register_operand")] + (match_operand:VI124_AVX512F 1 "register_operand")] "TARGET_SSE2" "ix86_expand_sse_unpack (operands[0], operands[1], true, false); DONE;") (define_expand "vec_unpacku_hi_" [(match_operand: 0 "register_operand") - (match_operand:VI124_AVX2 1 "register_operand")] + (match_operand:VI124_AVX512F 1 "register_operand")] "TARGET_SSE2" "ix86_expand_sse_unpack (operands[0], operands[1], true, true); DONE;") -- 1.7.11.7
Re: [PATCH i386 3/8] [AVX512] [5/n] Add AVX-512 patterns: Introduce `multdiv' code iterator.
Hello, > This patch is still far too large. > > I think you should split it up based on every single mode iterator that > you need to add or change. Here's 5th subpatch. It introduces `multdiv' code iterator. Is it Ok? Testing: 1. Bootstrap pass. 2. make check shows no regressions. 3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f option. 4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f option. -- Thanks, K PS. If it is Ok - I am going to strip out ChangeLog lines from big patch. --- gcc/config/i386/i386.md | 4 gcc/config/i386/sse.md | 31 +++ 2 files changed, 11 insertions(+), 24 deletions(-) diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index cc332ea..10ca6cb 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -746,6 +746,8 @@ (define_code_iterator sat_plusminus [ss_plus us_plus ss_minus us_minus]) +(define_code_iterator multdiv [mult div]) + ;; Base name for define_insn (define_code_attr plusminus_insn [(plus "add") (ss_plus "ssadd") (us_plus "usadd") @@ -757,6 +759,8 @@ (minus "sub") (ss_minus "subs") (us_minus "subus")]) (define_code_attr plusminus_carry_mnemonic [(plus "adc") (minus "sbb")]) +(define_code_attr multdiv_mnemonic + [(mult "mul") (div "div")]) ;; Mark commutative operators as such in constraints. (define_code_attr comm [(plus "%") (ss_plus "%") (us_plus "%") diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index cdb9ae0..89c31c5 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -1061,21 +1061,22 @@ (set_attr "btver2_decode" "direct,double") (set_attr "mode" "")]) -(define_insn "_vmmul3" +(define_insn "_vm3" [(set (match_operand:VF_128 0 "register_operand" "=x,v") (vec_merge:VF_128 - (mult:VF_128 + (multdiv:VF_128 (match_operand:VF_128 1 "register_operand" "0,v") (match_operand:VF_128 2 "nonimmediate_operand" "xm,vm")) (match_dup 1) (const_int 1)))] "TARGET_SSE" "@ - mul\t{%2, %0|%0, %2} - vmul\t{%2, %1, %0|%0, %1, %2}" + \t{%2, %0|%0, %2} + v\t{%2, %1, %0|%0, %1, %2}" [(set_attr "isa" "noavx,avx") - (set_attr "type" "ssemul") - (set_attr "prefix" "orig,vex") + (set_attr "type" "sse") + (set_attr "prefix" "orig,maybe_evex") + (set_attr "btver2_decode" "direct,double") (set_attr "mode" "")]) (define_expand "div3" @@ -1118,24 +1119,6 @@ (set_attr "prefix" "orig,vex") (set_attr "mode" "")]) -(define_insn "_vmdiv3" - [(set (match_operand:VF_128 0 "register_operand" "=x,v") - (vec_merge:VF_128 - (div:VF_128 - (match_operand:VF_128 1 "register_operand" "0,v") - (match_operand:VF_128 2 "nonimmediate_operand" "xm,vm")) - (match_dup 1) - (const_int 1)))] - "TARGET_SSE" - "@ - div\t{%2, %0|%0, %2} - vdiv\t{%2, %1, %0|%0, %1, %2}" - [(set_attr "isa" "noavx,avx") - (set_attr "type" "ssediv") - (set_attr "prefix" "orig,vex") - (set_attr "btver2_decode" "direct,double") - (set_attr "mode" "")]) - (define_insn "_rcp2" [(set (match_operand:VF1_128_256 0 "register_operand" "=x") (unspec:VF1_128_256 -- 1.7.11.7
Re: [PATCH i386 3/8] [AVX512] [7/n] Add AVX-512 patterns: VI4 and VI8 iterators.
Hello, > This patch is still far too large. > > I think you should split it up based on every single mode iterator that > you need to add or change. Here's 7th subpatch. It extends VI4 and VI8 iterators. Is it Ok? Testing: 1. Bootstrap pass. 2. make check shows no regressions. 3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f option. 4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f option. -- Thanks, K PS. If it is Ok - I am going to strip out ChangeLog lines from big patch. --- gcc/config/i386/sse.md | 33 - 1 file changed, 20 insertions(+), 13 deletions(-) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 351f5bb..127ecf2 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -207,9 +207,15 @@ (define_mode_iterator VI4_AVX2 [(V8SI "TARGET_AVX2") V4SI]) +(define_mode_iterator VI4_AVX512F + [(V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX2") V4SI]) + (define_mode_iterator VI8_AVX2 [(V4DI "TARGET_AVX2") V2DI]) +(define_mode_iterator VI8_AVX2_AVX512F + [(V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX2") V2DI]) + ;; ??? We should probably use TImode instead. (define_mode_iterator VIMAX_AVX2 [(V2TI "TARGET_AVX2") V1TI]) @@ -5854,10 +5860,10 @@ (set_attr "mode" "TI")]) (define_expand "mul3" - [(set (match_operand:VI4_AVX2 0 "register_operand") - (mult:VI4_AVX2 - (match_operand:VI4_AVX2 1 "general_vector_operand") - (match_operand:VI4_AVX2 2 "general_vector_operand")))] + [(set (match_operand:VI4_AVX512F 0 "register_operand") + (mult:VI4_AVX512F + (match_operand:VI4_AVX512F 1 "general_vector_operand") + (match_operand:VI4_AVX512F 2 "general_vector_operand")))] "TARGET_SSE2" { if (TARGET_SSE4_1) @@ -5876,10 +5882,10 @@ }) (define_insn "*_mul3" - [(set (match_operand:VI4_AVX2 0 "register_operand" "=x,v") - (mult:VI4_AVX2 - (match_operand:VI4_AVX2 1 "nonimmediate_operand" "%0,v") - (match_operand:VI4_AVX2 2 "nonimmediate_operand" "xm,vm")))] + [(set (match_operand:VI4_AVX512F 0 "register_operand" "=x,v") + (mult:VI4_AVX512F + (match_operand:VI4_AVX512F 1 "nonimmediate_operand" "%0,v") + (match_operand:VI4_AVX512F 2 "nonimmediate_operand" "xm,vm")))] "TARGET_SSE4_1 && ix86_binary_operator_ok (MULT, mode, operands)" "@ pmulld\t{%2, %0|%0, %2} @@ -5892,9 +5898,10 @@ (set_attr "mode" "")]) (define_expand "mul3" - [(set (match_operand:VI8_AVX2 0 "register_operand") - (mult:VI8_AVX2 (match_operand:VI8_AVX2 1 "register_operand") - (match_operand:VI8_AVX2 2 "register_operand")))] + [(set (match_operand:VI8_AVX2_AVX512F 0 "register_operand") + (mult:VI8_AVX2_AVX512F + (match_operand:VI8_AVX2_AVX512F 1 "register_operand") + (match_operand:VI8_AVX2_AVX512F 2 "register_operand")))] "TARGET_SSE2" { ix86_expand_sse2_mulvxdi3 (operands[0], operands[1], operands[2]); @@ -5941,8 +5948,8 @@ (define_expand "vec_widen_mult_odd_" [(match_operand: 0 "register_operand") (any_extend: - (match_operand:VI4_AVX2 1 "general_vector_operand")) - (match_operand:VI4_AVX2 2 "general_vector_operand")] + (match_operand:VI4_AVX512F 1 "general_vector_operand")) + (match_operand:VI4_AVX512F 2 "general_vector_operand")] "TARGET_SSE2" { ix86_expand_mul_widen_evenodd (operands[0], operands[1], operands[2], -- 1.7.11.7
Re: [PATCH i386 3/8] [AVX512] [8/n] Add AVX-512 patterns: VI48 and VI48_AVX2 iterators.
Hello, > This patch is still far too large. > > I think you should split it up based on every single mode iterator that > you need to add or change. Here's 8th subpatch. It extends VI48 and VI48_AVX2 iterators. Is it Ok? Testing: 1. Bootstrap pass. 2. make check shows no regressions. 3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f option. 4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f option. -- Thanks, K PS. If it is Ok - I am going to strip out ChangeLog lines from big patch. --- gcc/config/i386/sse.md | 36 1 file changed, 20 insertions(+), 16 deletions(-) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 127ecf2..49124ba 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -210,6 +210,10 @@ (define_mode_iterator VI4_AVX512F [(V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX2") V4SI]) +(define_mode_iterator VI48_AVX512F + [(V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX2") V4SI + (V8DI "TARGET_AVX512F")]) + (define_mode_iterator VI8_AVX2 [(V4DI "TARGET_AVX2") V2DI]) @@ -247,9 +251,9 @@ (V8SI "TARGET_AVX2") V4SI (V4DI "TARGET_AVX2") V2DI]) -(define_mode_iterator VI48_AVX2 - [(V8SI "TARGET_AVX2") V4SI - (V4DI "TARGET_AVX2") V2DI]) +(define_mode_iterator VI48_AVX2_48_AVX512F + [(V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX2") V4SI + (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX2") V2DI]) (define_mode_iterator V48_AVX2 [V4SF V2DF @@ -11404,26 +11408,26 @@ DONE; }) -(define_insn "avx2_ashrv" - [(set (match_operand:VI4_AVX2 0 "register_operand" "=v") - (ashiftrt:VI4_AVX2 - (match_operand:VI4_AVX2 1 "register_operand" "v") - (match_operand:VI4_AVX2 2 "nonimmediate_operand" "vm")))] +(define_insn "_ashrv" + [(set (match_operand:VI48_AVX512F 0 "register_operand" "=v") + (ashiftrt:VI48_AVX512F + (match_operand:VI48_AVX512F 1 "register_operand" "v") + (match_operand:VI48_AVX512F 2 "nonimmediate_operand" "vm")))] "TARGET_AVX2" - "vpsravd\t{%2, %1, %0|%0, %1, %2}" + "vpsrav\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sseishft") - (set_attr "prefix" "vex") + (set_attr "prefix" "maybe_evex") (set_attr "mode" "")]) -(define_insn "avx2_v" - [(set (match_operand:VI48_AVX2 0 "register_operand" "=v") - (any_lshift:VI48_AVX2 - (match_operand:VI48_AVX2 1 "register_operand" "v") - (match_operand:VI48_AVX2 2 "nonimmediate_operand" "vm")))] +(define_insn "_v" + [(set (match_operand:VI48_AVX2_48_AVX512F 0 "register_operand" "=v") + (any_lshift:VI48_AVX2_48_AVX512F + (match_operand:VI48_AVX2_48_AVX512F 1 "register_operand" "v") + (match_operand:VI48_AVX2_48_AVX512F 2 "nonimmediate_operand" "vm")))] "TARGET_AVX2" "vpv\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sseishft") - (set_attr "prefix" "vex") + (set_attr "prefix" "maybe_evex") (set_attr "mode" "")]) ;; For avx_vec_concat insn pattern -- 1.7.11.7
Re: [PATCH i386 3/8] [AVX512] [10/n] Add AVX-512 patterns: VI248_AVX2_8_AVX512F and VI124_256_48_AVX512F iterators.
Hello, > This patch is still far too large. > > I think you should split it up based on every single mode iterator that > you need to add or change. Here's 10th subpatch. It introduces VI248_AVX2_8_AVX512F and VI124_256_48_512 iterators. Is it Ok? Testing: 1. Bootstrap pass. 2. make check shows no regressions. 3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f option. 4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f option. -- Thanks, K PS. If it is Ok - I am going to strip out ChangeLog lines from big patch. --- gcc/config/i386/sse.md | 35 +-- 1 file changed, 21 insertions(+), 14 deletions(-) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index dd310b5..a380690 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -257,6 +257,11 @@ (V8SI "TARGET_AVX2") V4SI (V4DI "TARGET_AVX2") V2DI]) +(define_mode_iterator VI248_AVX2_8_AVX512F + [(V16HI "TARGET_AVX2") V8HI + (V8SI "TARGET_AVX2") V4SI + (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX2") V2DI]) + (define_mode_iterator VI48_AVX2_48_AVX512F [(V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX2") V4SI (V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX2") V2DI]) @@ -341,8 +346,9 @@ (define_mode_iterator VI248_128 [V8HI V4SI V2DI]) (define_mode_iterator VI48_128 [V4SI V2DI]) -;; Random 256bit vector integer mode combinations -(define_mode_iterator VI124_256 [V32QI V16HI V8SI]) +;; Various 256bit and 512 vector integer mode combinations +(define_mode_iterator VI124_256_48_512 + [V32QI V16HI V8SI (V8DI "TARGET_AVX512F") (V16SI "TARGET_AVX512F")]) (define_mode_iterator VI48_256 [V8SI V4DI]) ;; Int-float size matches @@ -503,7 +509,8 @@ (define_mode_attr ssepackmode [(V8HI "V16QI") (V4SI "V8HI") (V2DI "V4SI") - (V16HI "V32QI") (V8SI "V16HI") (V4DI "V8SI")]) + (V16HI "V32QI") (V8SI "V16HI") (V4DI "V8SI") + (V32HI "V64QI") (V16SI "V32HI") (V8DI "V16SI")]) ;; Mapping of the max integer size for xop rotate immediate constraint (define_mode_attr sserotatemax @@ -6114,23 +6121,23 @@ (define_expand "3" - [(set (match_operand:VI124_256 0 "register_operand") - (maxmin:VI124_256 - (match_operand:VI124_256 1 "nonimmediate_operand") - (match_operand:VI124_256 2 "nonimmediate_operand")))] + [(set (match_operand:VI124_256_48_512 0 "register_operand") + (maxmin:VI124_256_48_512 + (match_operand:VI124_256_48_512 1 "nonimmediate_operand") + (match_operand:VI124_256_48_512 2 "nonimmediate_operand")))] "TARGET_AVX2" "ix86_fixup_binary_operands_no_copy (, mode, operands);") (define_insn "*avx2_3" - [(set (match_operand:VI124_256 0 "register_operand" "=v") - (maxmin:VI124_256 - (match_operand:VI124_256 1 "nonimmediate_operand" "%v") - (match_operand:VI124_256 2 "nonimmediate_operand" "vm")))] + [(set (match_operand:VI124_256_48_512 0 "register_operand" "=v") + (maxmin:VI124_256_48_512 + (match_operand:VI124_256_48_512 1 "nonimmediate_operand" "%v") + (match_operand:VI124_256_48_512 2 "nonimmediate_operand" "vm")))] "TARGET_AVX2 && ix86_binary_operator_ok (, mode, operands)" "vp\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sseiadd") (set_attr "prefix_extra" "1") - (set_attr "prefix" "vex") + (set_attr "prefix" "maybe_evex") (set_attr "mode" "OI")]) (define_expand "3" @@ -6777,8 +6784,8 @@ (define_expand "vec_pack_trunc_" [(match_operand: 0 "register_operand") - (match_operand:VI248_AVX2 1 "register_operand") - (match_operand:VI248_AVX2 2 "register_operand")] + (match_operand:VI248_AVX2_8_AVX512F 1 "register_operand") + (match_operand:VI248_AVX2_8_AVX512F 2 "register_operand")] "TARGET_SSE2" { rtx op1 = gen_lowpart (mode, operands[1]); -- 1.7.11.7
Re: [PATCH i386 3/8] [AVX512] [9/n] Add AVX-512 patterns: VI124_AVX2, VI8F iterators.
Hello, > This patch is still far too large. > > I think you should split it up based on every single mode iterator that > you need to add or change. Here's 9th subpatch. It extends VI124_AVX2_48 and VI8F iterators. Is it Ok? Testing: 1. Bootstrap pass. 2. make check shows no regressions. 3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f option. 4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f option. -- Thanks, K PS. If it is Ok - I am going to strip out ChangeLog lines from big patch. --- gcc/config/i386/sse.md | 38 +++--- 1 file changed, 23 insertions(+), 15 deletions(-) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 49124ba..dd310b5 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -236,6 +236,12 @@ [(V16HI "TARGET_AVX2") V8HI (V8SI "TARGET_AVX2") V4SI]) +(define_mode_iterator VI124_AVX2_48_AVX512F + [(V32QI "TARGET_AVX2") V16QI + (V16HI "TARGET_AVX2") V8HI + (V16SI "TARGET_AVX512F") (V8SI "TARGET_AVX2") V4SI + (V8DI "TARGET_AVX512F")]) + (define_mode_iterator VI124_AVX512F [(V32QI "TARGET_AVX2") V16QI (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX2") V8HI @@ -344,6 +350,8 @@ (define_mode_iterator VI8F_128 [V2DI V2DF]) (define_mode_iterator VI4F_256 [V8SI V8SF]) (define_mode_iterator VI8F_256 [V4DI V4DF]) +(define_mode_iterator VI8F_256_512 + [V4DI V4DF (V8DI "TARGET_AVX512F") (V8DF "TARGET_AVX512F")]) ;; Mapping from float mode to required SSE level (define_mode_attr sse @@ -8627,9 +8635,9 @@ (set_attr "mode" "DI")]) (define_insn "abs2" - [(set (match_operand:VI124_AVX2 0 "register_operand" "=v") - (abs:VI124_AVX2 - (match_operand:VI124_AVX2 1 "nonimmediate_operand" "vm")))] + [(set (match_operand:VI124_AVX2_48_AVX512F 0 "register_operand" "=v") + (abs:VI124_AVX2_48_AVX512F + (match_operand:VI124_AVX2_48_AVX512F 1 "nonimmediate_operand" "vm")))] "TARGET_SSSE3" "%vpabs\t{%1, %0|%0, %1}" [(set_attr "type" "sselog1") @@ -10755,25 +10763,25 @@ (set_attr "prefix" "vex") (set_attr "mode" "OI")]) -(define_expand "avx2_perm" - [(match_operand:VI8F_256 0 "register_operand") - (match_operand:VI8F_256 1 "nonimmediate_operand") +(define_expand "_perm" + [(match_operand:VI8F_256_512 0 "register_operand") + (match_operand:VI8F_256_512 1 "nonimmediate_operand") (match_operand:SI 2 "const_0_to_255_operand")] "TARGET_AVX2" { int mask = INTVAL (operands[2]); - emit_insn (gen_avx2_perm_1 (operands[0], operands[1], - GEN_INT ((mask >> 0) & 3), - GEN_INT ((mask >> 2) & 3), - GEN_INT ((mask >> 4) & 3), - GEN_INT ((mask >> 6) & 3))); + emit_insn (gen__perm_1 (operands[0], operands[1], + GEN_INT ((mask >> 0) & 3), + GEN_INT ((mask >> 2) & 3), + GEN_INT ((mask >> 4) & 3), + GEN_INT ((mask >> 6) & 3))); DONE; }) -(define_insn "avx2_perm_1" - [(set (match_operand:VI8F_256 0 "register_operand" "=v") - (vec_select:VI8F_256 - (match_operand:VI8F_256 1 "nonimmediate_operand" "vm") +(define_insn "_perm_1" + [(set (match_operand:VI8F_256_512 0 "register_operand" "=v") + (vec_select:VI8F_256_512 + (match_operand:VI8F_256_512 1 "nonimmediate_operand" "vm") (parallel [(match_operand 2 "const_0_to_3_operand") (match_operand 3 "const_0_to_3_operand") (match_operand 4 "const_0_to_3_operand") -- 1.7.11.7
Re: [PATCH i386 3/8] [AVX512] [11/n] Add AVX-512 patterns: FMA.
Hello, > This patch is still far too large. > > I think you should split it up based on every single mode iterator that > you need to add or change. Here's 11th subpatch. It introduces AVX-512 FMA instructions. Is it Ok? Testing: 1. Bootstrap pass. 2. make check shows no regressions. 3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f option. 4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f option. -- Thanks, K PS. If it is Ok - I am going to strip out ChangeLog lines from big patch. --- gcc/config/i386/i386.c | 2 +- gcc/config/i386/sse.md | 60 -- 2 files changed, 39 insertions(+), 23 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index f10113f..5908383 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -34785,7 +34785,7 @@ ix86_rtx_costs (rtx x, int code_i, int outer_code_i, int opno, int *total, rtx sub; gcc_assert (FLOAT_MODE_P (mode)); -gcc_assert (TARGET_FMA || TARGET_FMA4); +gcc_assert (TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F); /* ??? SSE scalar/vector cost should be used here. */ /* ??? Bald assumption that fma has the same cost as fmul. */ diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index a380690..6adcdd3 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -2254,9 +2254,18 @@ ; ;; The standard names for scalar FMA are only available with SSE math enabled. -(define_mode_iterator FMAMODEM [(SF "TARGET_SSE_MATH") - (DF "TARGET_SSE_MATH") - V4SF V2DF V8SF V4DF]) +;; CPUID bit AVX512F enables evex encoded scalar and 512-bit fma. It doesn't +;; care about FMA bit, so we enable fma for TARGET_AVX512F even when TARGET_FMA +;; and TARGET_FMA4 are both false. +(define_mode_iterator FMAMODEM + [(SF "TARGET_SSE_MATH && (TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F)") + (DF "TARGET_SSE_MATH && (TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F)") + (V4SF "TARGET_FMA || TARGET_FMA4") + (V2DF "TARGET_FMA || TARGET_FMA4") + (V8SF "TARGET_FMA || TARGET_FMA4") + (V4DF "TARGET_FMA || TARGET_FMA4") + (V16SF "TARGET_AVX512F") + (V8DF "TARGET_AVX512F")]) (define_expand "fma4" [(set (match_operand:FMAMODEM 0 "register_operand") @@ -2264,7 +2273,7 @@ (match_operand:FMAMODEM 1 "nonimmediate_operand") (match_operand:FMAMODEM 2 "nonimmediate_operand") (match_operand:FMAMODEM 3 "nonimmediate_operand")))] - "TARGET_FMA || TARGET_FMA4") + "") (define_expand "fms4" [(set (match_operand:FMAMODEM 0 "register_operand") @@ -2272,7 +2281,7 @@ (match_operand:FMAMODEM 1 "nonimmediate_operand") (match_operand:FMAMODEM 2 "nonimmediate_operand") (neg:FMAMODEM (match_operand:FMAMODEM 3 "nonimmediate_operand"] - "TARGET_FMA || TARGET_FMA4") + "") (define_expand "fnma4" [(set (match_operand:FMAMODEM 0 "register_operand") @@ -2280,7 +2289,7 @@ (neg:FMAMODEM (match_operand:FMAMODEM 1 "nonimmediate_operand")) (match_operand:FMAMODEM 2 "nonimmediate_operand") (match_operand:FMAMODEM 3 "nonimmediate_operand")))] - "TARGET_FMA || TARGET_FMA4") + "") (define_expand "fnms4" [(set (match_operand:FMAMODEM 0 "register_operand") @@ -2288,10 +2297,17 @@ (neg:FMAMODEM (match_operand:FMAMODEM 1 "nonimmediate_operand")) (match_operand:FMAMODEM 2 "nonimmediate_operand") (neg:FMAMODEM (match_operand:FMAMODEM 3 "nonimmediate_operand"] - "TARGET_FMA || TARGET_FMA4") + "") ;; The builtins for intrinsics are not constrained by SSE math enabled. -(define_mode_iterator FMAMODE [SF DF V4SF V2DF V8SF V4DF]) +(define_mode_iterator FMAMODE [(SF "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F") + (DF "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F") + (V4SF "TARGET_FMA || TARGET_FMA4") + (V2DF "TARGET_FMA || TARGET_FMA4") + (V8SF "TARGET_FMA || TARGET_FMA4") + (V4DF "TARGET_FMA || TARGET_FMA4") + (V16SF "TARGET_AVX512F") + (V8DF "TARGET_AVX512F")]) (define_expand "fma4i_fmadd_" [(set (match_operand:FMAMODE 0 "register_operand") @@ -2299,7 +2315,7 @@ (match_operand:FMAMODE 1 "nonimmediate_operand") (match_operand:FMAMODE 2 "nonimmediate_operand") (match_operand:FMAMODE 3 "nonimmediate_operand")))] - "TARGET_FMA || TARGET_FMA4") + "") (define_insn "*fma_fmadd_" [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x") @@ -2307,7 +2323,7 @@ (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0, v, x,x") (match_operand:FMAMODE 2 "nonimmediate_operand" "vm,
Re: [PATCH i386 3/8] [AVX512] [13/n] Add AVX-512 patterns: VI4_AVX iterator.
Hello, > This patch is still far too large. > > I think you should split it up based on every single mode iterator that > you need to add or change. Here's 13th subpatch. It introduces VI4_AVX iterator. Is it Ok? Testing: 1. Bootstrap pass. 2. make check shows no regressions. 3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f option. 4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f option. -- Thanks, K PS. If it is Ok - I am going to strip out ChangeLog lines from big patch. --- gcc/config/i386/i386.c | 4 ++-- gcc/config/i386/sse.md | 27 +-- 2 files changed, 15 insertions(+), 16 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 5908383..febceca 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -27751,7 +27751,7 @@ static const struct builtin_description bdesc_args[] = { OPTION_MASK_ISA_SSE2 | OPTION_MASK_ISA_64BIT, CODE_FOR_sse2_cvtsd2siq, "__builtin_ia32_cvtsd2si64", IX86_BUILTIN_CVTSD2SI64, UNKNOWN, (int) INT64_FTYPE_V2DF }, { OPTION_MASK_ISA_SSE2 | OPTION_MASK_ISA_64BIT, CODE_FOR_sse2_cvttsd2siq, "__builtin_ia32_cvttsd2si64", IX86_BUILTIN_CVTTSD2SI64, UNKNOWN, (int) INT64_FTYPE_V2DF }, - { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_cvtps2dq, "__builtin_ia32_cvtps2dq", IX86_BUILTIN_CVTPS2DQ, UNKNOWN, (int) V4SI_FTYPE_V4SF }, + { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_fix_notruncv4sfv4si, "__builtin_ia32_cvtps2dq", IX86_BUILTIN_CVTPS2DQ, UNKNOWN, (int) V4SI_FTYPE_V4SF }, { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_cvtps2pd, "__builtin_ia32_cvtps2pd", IX86_BUILTIN_CVTPS2PD, UNKNOWN, (int) V2DF_FTYPE_V4SF }, { OPTION_MASK_ISA_SSE2, CODE_FOR_fix_truncv4sfv4si2, "__builtin_ia32_cvttps2dq", IX86_BUILTIN_CVTTPS2DQ, UNKNOWN, (int) V4SI_FTYPE_V4SF }, @@ -28099,7 +28099,7 @@ static const struct builtin_description bdesc_args[] = { OPTION_MASK_ISA_AVX, CODE_FOR_floatv4siv4df2, "__builtin_ia32_cvtdq2pd256", IX86_BUILTIN_CVTDQ2PD256, UNKNOWN, (int) V4DF_FTYPE_V4SI }, { OPTION_MASK_ISA_AVX, CODE_FOR_floatv8siv8sf2, "__builtin_ia32_cvtdq2ps256", IX86_BUILTIN_CVTDQ2PS256, UNKNOWN, (int) V8SF_FTYPE_V8SI }, { OPTION_MASK_ISA_AVX, CODE_FOR_avx_cvtpd2ps256, "__builtin_ia32_cvtpd2ps256", IX86_BUILTIN_CVTPD2PS256, UNKNOWN, (int) V4SF_FTYPE_V4DF }, - { OPTION_MASK_ISA_AVX, CODE_FOR_avx_cvtps2dq256, "__builtin_ia32_cvtps2dq256", IX86_BUILTIN_CVTPS2DQ256, UNKNOWN, (int) V8SI_FTYPE_V8SF }, + { OPTION_MASK_ISA_AVX, CODE_FOR_avx_fix_notruncv8sfv8si, "__builtin_ia32_cvtps2dq256", IX86_BUILTIN_CVTPS2DQ256, UNKNOWN, (int) V8SI_FTYPE_V8SF }, { OPTION_MASK_ISA_AVX, CODE_FOR_avx_cvtps2pd256, "__builtin_ia32_cvtps2pd256", IX86_BUILTIN_CVTPS2PD256, UNKNOWN, (int) V4DF_FTYPE_V4SF }, { OPTION_MASK_ISA_AVX, CODE_FOR_fix_truncv4dfv4si2, "__builtin_ia32_cvttpd2dq256", IX86_BUILTIN_CVTTPD2DQ256, UNKNOWN, (int) V4SI_FTYPE_V4DF }, { OPTION_MASK_ISA_AVX, CODE_FOR_avx_cvtpd2dq256, "__builtin_ia32_cvtpd2dq256", IX86_BUILTIN_CVTPD2DQ256, UNKNOWN, (int) V4SI_FTYPE_V4DF }, diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 0ba1670..40030cf 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -207,6 +207,9 @@ (define_mode_iterator VI2_AVX512F [(V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX2") V8HI]) +(define_mode_iterator VI4_AVX + [(V8SI "TARGET_AVX") V4SI]) + (define_mode_iterator VI4_AVX2 [(V8SI "TARGET_AVX2") V4SI]) @@ -2823,20 +2826,16 @@ DONE; }) -(define_insn "avx_cvtps2dq256" - [(set (match_operand:V8SI 0 "register_operand" "=x") - (unspec:V8SI [(match_operand:V8SF 1 "nonimmediate_operand" "xm")] -UNSPEC_FIX_NOTRUNC))] - "TARGET_AVX" - "vcvtps2dq\t{%1, %0|%0, %1}" - [(set_attr "type" "ssecvt") - (set_attr "prefix" "vex") - (set_attr "mode" "OI")]) -(define_insn "sse2_cvtps2dq" - [(set (match_operand:V4SI 0 "register_operand" "=x") - (unspec:V4SI [(match_operand:V4SF 1 "nonimmediate_operand" "xm")] -UNSPEC_FIX_NOTRUNC))] +;; For _fix_notrunc insn pattern +(define_mode_attr sf2simodelower + [(V16SI "v16sf") (V8SI "v8sf") (V4SI "v4sf")]) + +(define_insn "_fix_notrunc" + [(set (match_operand:VI4_AVX 0 "register_operand" "=v") + (unspec:VI4_AVX + [(match_operand: 1 "nonimmediate_operand" "vm")] + UNSPEC_FIX_NOTRUNC))] "TARGET_SSE2" "%vcvtps2dq\t{%1, %0|%0, %1}" [(set_attr "type" "ssecvt") @@ -2846,7 +2845,7 @@ (const_string "*") (const_string "1"))) (set_attr "prefix" "maybe_vex") - (set_attr "mode" "TI")]) + (set_attr "mode" "")]) (define_insn "fix_truncv16sfv16si2" [(set (match_operand:V16SI 0 "register_operand" "=v") -- 1.7.11.7
Re: [PATCH i386 3/8] [AVX512] [12/n] Add AVX-512 patterns: V_512 and VI_512 iterators.
Hello, > This patch is still far too large. > > I think you should split it up based on every single mode iterator that > you need to add or change. Here's 12th subpatch. It introduces VF_512 and VI_512 iterators. Is it Ok? Testing: 1. Bootstrap pass. 2. make check shows no regressions. 3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f option. 4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f option. -- Thanks, K PS. If it is Ok - I am going to strip out ChangeLog lines from big patch. --- gcc/config/i386/sse.md | 59 +- 1 file changed, 58 insertions(+), 1 deletion(-) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 6adcdd3..0ba1670 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -122,6 +122,9 @@ (define_mode_iterator V_256 [V32QI V16HI V8SI V4DI V8SF V4DF]) +;; All 512bit vector modes +(define_mode_iterator V_512 [V64QI V32HI V16SI V8DI V16SF V8DF]) + ;; All 256bit and 512bit vector modes (define_mode_iterator V_256_512 [V32QI V16HI V8SI V4DI V8SF V4DF @@ -337,7 +340,10 @@ ;; All 256bit vector integer modes (define_mode_iterator VI_256 [V32QI V16HI V8SI V4DI]) -;; Random 128bit vector integer mode combinations +;; All 512bit vector integer modes +(define_mode_iterator VI_512 [V64QI V32HI V16SI V8DI]) + +;; Various 128bit vector integer mode combinations (define_mode_iterator VI12_128 [V16QI V8HI]) (define_mode_iterator VI14_128 [V16QI V4SI]) (define_mode_iterator VI124_128 [V16QI V8HI V4SI]) @@ -1853,6 +1859,23 @@ (const_string "0"))) (set_attr "mode" "")]) +(define_expand "vcond" + [(set (match_operand:V_512 0 "register_operand") + (if_then_else:V_512 + (match_operator 3 "" + [(match_operand:VF_512 4 "nonimmediate_operand") +(match_operand:VF_512 5 "nonimmediate_operand")]) + (match_operand:V_512 1 "general_operand") + (match_operand:V_512 2 "general_operand")))] + "TARGET_AVX512F + && (GET_MODE_NUNITS (mode) + == GET_MODE_NUNITS (mode))" +{ + bool ok = ix86_expand_fp_vcond (operands); + gcc_assert (ok); + DONE; +}) + (define_expand "vcond" [(set (match_operand:V_256 0 "register_operand") (if_then_else:V_256 @@ -6457,6 +6480,23 @@ (set_attr "prefix" "orig,vex") (set_attr "mode" "TI")]) +(define_expand "vcond" + [(set (match_operand:V_512 0 "register_operand") + (if_then_else:V_512 + (match_operator 3 "" + [(match_operand:VI_512 4 "nonimmediate_operand") +(match_operand:VI_512 5 "general_operand")]) + (match_operand:V_512 1) + (match_operand:V_512 2)))] + "TARGET_AVX512F + && (GET_MODE_NUNITS (mode) + == GET_MODE_NUNITS (mode))" +{ + bool ok = ix86_expand_int_vcond (operands); + gcc_assert (ok); + DONE; +}) + (define_expand "vcond" [(set (match_operand:V_256 0 "register_operand") (if_then_else:V_256 @@ -6506,6 +6546,23 @@ DONE; }) +(define_expand "vcondu" + [(set (match_operand:V_512 0 "register_operand") + (if_then_else:V_512 + (match_operator 3 "" + [(match_operand:VI_512 4 "nonimmediate_operand") +(match_operand:VI_512 5 "nonimmediate_operand")]) + (match_operand:V_512 1 "general_operand") + (match_operand:V_512 2 "general_operand")))] + "TARGET_AVX512F + && (GET_MODE_NUNITS (mode) + == GET_MODE_NUNITS (mode))" +{ + bool ok = ix86_expand_int_vcond (operands); + gcc_assert (ok); + DONE; +}) + (define_expand "vcondu" [(set (match_operand:V_256 0 "register_operand") (if_then_else:V_256 -- 1.7.11.7
Re: [PATCH i386 3/8] [AVX512] [15/n] Add AVX-512 patterns: VI48F_512 iterator.
Hello, > This patch is still far too large. > > I think you should split it up based on every single mode iterator that > you need to add or change. Here's 15th subpatch. It introduces VI48F_512 iterator. Is it Ok? Testing: 1. Bootstrap pass. 2. make check shows no regressions. 3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f option. 4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f option. -- Thanks, K PS. If it is Ok - I am going to strip out ChangeLog lines from big patch. --- gcc/config/i386/sse.md | 332 ++--- 1 file changed, 316 insertions(+), 16 deletions(-) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index bfaa3a1..2364ccc 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -83,6 +83,11 @@ UNSPEC_VPERMTI UNSPEC_GATHER UNSPEC_VSIBADDR + + ;; For AVX512F support + UNSPEC_VPERMI2 + UNSPEC_VPERMT2 + UNSPEC_SCATTER ]) (define_c_enum "unspecv" [ @@ -371,6 +376,7 @@ [V8SI V8SF (V16SI "TARGET_AVX512F") (V16SF "TARGET_AVX512F") (V8DI "TARGET_AVX512F") (V8DF "TARGET_AVX512F")]) +(define_mode_iterator VI48F_512 [V16SI V16SF V8DI V8DF]) ;; Mapping from float mode to required SSE level (define_mode_attr sse @@ -409,6 +415,15 @@ (V4SF "V4SF") (V2DF "V2DF") (TI "TI")]) +;; Mapping of vector modes to corresponding mask size +(define_mode_attr avx512fmaskmode + [(V16QI "HI") + (V16HI "HI") (V8HI "QI") + (V16SI "HI") (V8SI "QI") (V4SI "QI") + (V8DI "QI") (V4DI "QI") (V2DI "QI") + (V16SF "HI") (V8SF "QI") (V4SF "QI") + (V8DF "QI") (V4DF "QI") (V2DF "QI")]) + ;; Mapping of vector float modes to an integer mode of the same size (define_mode_attr sseintvecmode [(V16SF "V16SI") (V8DF "V8DI") @@ -501,10 +516,12 @@ ;; SSE prefix for integer vector modes (define_mode_attr sseintprefix - [(V2DI "p") (V2DF "") - (V4DI "p") (V4DF "") - (V4SI "p") (V4SF "") - (V8SI "p") (V8SF "")]) + [(V2DI "p") (V2DF "") + (V4DI "p") (V4DF "") + (V8DI "p") (V8DF "") + (V4SI "p") (V4SF "") + (V8SI "p") (V8SF "") + (V16SI "p") (V16SF "")]) ;; SSE scalar suffix for vector modes (define_mode_attr ssescalarmodesuffix @@ -549,6 +566,10 @@ (define_mode_attr blendbits [(V8SF "255") (V4SF "15") (V4DF "15") (V2DF "3")]) +;; Mapping suffixes for broadcast +(define_mode_attr bcstscalarsuff + [(V16SI "d") (V16SF "ss") (V8DI "q") (V8DF "sd")]) + ;; Patterns whose name begins with "sse{,2,3}_" are invoked by intrinsics. ; @@ -688,6 +709,18 @@ ] (const_string "")))]) +(define_insn "avx512f_blendm" + [(set (match_operand:VI48F_512 0 "register_operand" "=v") + (vec_merge:VI48F_512 + (match_operand:VI48F_512 2 "nonimmediate_operand" "vm") + (match_operand:VI48F_512 1 "register_operand" "v") + (match_operand: 3 "register_operand" "k")))] + "TARGET_AVX512F" + "vblendm\t{%2, %1, %0%{%3%}|%0%{%3%}, %1, %2}" + [(set_attr "type" "ssemov") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + (define_insn "sse2_movq128" [(set (match_operand:V2DI 0 "register_operand" "=x") (vec_concat:V2DI @@ -1826,6 +1859,24 @@ (set_attr "prefix" "orig,vex") (set_attr "mode" "")]) +(define_mode_attr cmp_imm_predicate + [(V16SF "const_0_to_31_operand") (V8DF "const_0_to_31_operand") + (V16SI "const_0_to_7_operand") (V8DI "const_0_to_7_operand")]) + +(define_insn "avx512f_cmp3" + [(set (match_operand: 0 "register_operand" "=k") + (unspec: + [(match_operand:VI48F_512 1 "register_operand" "v") + (match_operand:VI48F_512 2 "nonimmediate_operand" "vm") + (match_operand:SI 3 "" "n")] + UNSPEC_PCMP))] + "TARGET_AVX512F" + "vcmp\t{%3, %2, %1, %0|%0, %1, %2, %3}" + [(set_attr "type" "ssecmp") + (set_attr "length_immediate" "1") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + (define_insn "_comi" [(set (reg:CCFP FLAGS_REG) (compare:CCFP @@ -10927,6 +10978,28 @@ (set_attr "isa" "*,avx2,noavx2") (set_attr "mode" "V8SF")]) +(define_insn "avx512f_vec_dup" + [(set (match_operand:VI48F_512 0 "register_operand" "=v") + (vec_duplicate:VI48F_512 + (vec_select: + (match_operand: 1 "nonimmediate_operand" "vm") + (parallel [(const_int 0)]] + "TARGET_AVX512F" + "vbroadcast\t{%1, %0|%0, %1}" + [(set_attr "type" "ssemov") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_insn "avx512f_vec_dup_mem" + [(set (match_operand:VI48F_512 0 "register_operand" "=x") + (vec_duplicate:VI48F_512 + (match_operand: 1 "nonimmediate_operand" "xm")))] + "TARGET_AVX512F" + "vbroadcast\t{%1, %0|%0, %1}" + [(set_attr "type" "ssemov") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + (define_insn "avx2_vbroadcasti128_" [(set (match_operand:VI_2
Re: [PATCH i386 3/8] [AVX512] [14/n] Add AVX-512 patterns: VI48F_256_512 iterator.
Hello, > This patch is still far too large. > > I think you should split it up based on every single mode iterator that > you need to add or change. Here's 14th subpatch. It introduces VI48F_256_512 iterator. Is it Ok? Testing: 1. Bootstrap pass. 2. make check shows no regressions. 3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f option. 4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f option. -- Thanks, K PS. If it is Ok - I am going to strip out ChangeLog lines from big patch. --- gcc/config/i386/sse.md | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 40030cf..bfaa3a1 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -367,6 +367,10 @@ (define_mode_iterator VI8F_256 [V4DI V4DF]) (define_mode_iterator VI8F_256_512 [V4DI V4DF (V8DI "TARGET_AVX512F") (V8DF "TARGET_AVX512F")]) +(define_mode_iterator VI48F_256_512 + [V8SI V8SF + (V16SI "TARGET_AVX512F") (V16SF "TARGET_AVX512F") + (V8DI "TARGET_AVX512F") (V8DF "TARGET_AVX512F")]) ;; Mapping from float mode to required SSE level (define_mode_attr sse @@ -10830,17 +10834,17 @@ (set_attr "prefix" "vex") (set_attr "mode" "")]) -(define_insn "avx2_permvar" - [(set (match_operand:VI4F_256 0 "register_operand" "=v") - (unspec:VI4F_256 - [(match_operand:VI4F_256 1 "nonimmediate_operand" "vm") - (match_operand:V8SI 2 "register_operand" "v")] +(define_insn "_permvar" + [(set (match_operand:VI48F_256_512 0 "register_operand" "=v") + (unspec:VI48F_256_512 + [(match_operand:VI48F_256_512 1 "nonimmediate_operand" "vm") + (match_operand: 2 "register_operand" "v")] UNSPEC_VPERMVAR))] "TARGET_AVX2" "vperm\t{%1, %2, %0|%0, %2, %1}" [(set_attr "type" "sselog") (set_attr "prefix" "vex") - (set_attr "mode" "OI")]) + (set_attr "mode" "")]) (define_expand "_perm" [(match_operand:VI8F_256_512 0 "register_operand") -- 1.7.11.7
Re: [PATCH i386 3/8] [AVX512] [17/n] Add AVX-512 patterns: V8FI and V16FI iterators.
Hello, > This patch is still far too large. > > I think you should split it up based on every single mode iterator that > you need to add or change. Here's 17th subpatch. It introduces V8FI and V16FI iterators. Is it Ok? Testing: 1. Bootstrap pass. 2. make check shows no regressions. 3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f option. 4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f option. -- Thanks, K PS. If it is Ok - I am going to strip out ChangeLog lines from big patch. --- gcc/config/i386/predicates.md | 10 ++ gcc/config/i386/sse.md| 367 +- 2 files changed, 376 insertions(+), 1 deletion(-) diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md index eff82eb..e1670f3 100644 --- a/gcc/config/i386/predicates.md +++ b/gcc/config/i386/predicates.md @@ -757,11 +757,21 @@ (and (match_code "const_int") (match_test "IN_RANGE (INTVAL (op), 8, 11)"))) +;; Match 8 to 15. +(define_predicate "const_8_to_15_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 8, 15)"))) + ;; Match 12 to 15. (define_predicate "const_12_to_15_operand" (and (match_code "const_int") (match_test "IN_RANGE (INTVAL (op), 12, 15)"))) +;; Match 16 to 31. +(define_predicate "const_16_to_31_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 16, 31)"))) + ;; True if this is a constant appropriate for an increment or decrement. (define_predicate "incdec_operand" (match_code "const_int") diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 8221d61..2b27649f 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -243,6 +243,14 @@ (define_mode_iterator VI8_AVX2_AVX512F [(V8DI "TARGET_AVX512F") (V4DI "TARGET_AVX2") V2DI]) +;; All V8D* modes +(define_mode_iterator V8FI + [V8DF V8DI]) + +;; All V16S* modes +(define_mode_iterator V16FI + [V16SF V16SI]) + ;; ??? We should probably use TImode instead. (define_mode_iterator VIMAX_AVX2 [(V2TI "TARGET_AVX2") V1TI]) @@ -347,8 +355,12 @@ (V32QI "i") (V16HI "u") (V16QI "i") (V8HI "i") (V64QI "i") (V1TI "i") (V2TI "i")]) +(define_mode_attr ssequartermode + [(V16SF "V4SF") (V8DF "V2DF") (V16SI "V4SI") (V8DI "V2DI")]) + (define_mode_attr ssedoublemode - [(V16HI "V16SI") (V8HI "V8SI") (V4HI "V4SI") + [(V16SF "V32SF") (V16SI "V32SI") (V8DI "V16DI") (V8DF "V16DF") + (V16HI "V16SI") (V8HI "V8SI") (V4HI "V4SI") (V32QI "V32HI") (V16QI "V16HI")]) (define_mode_attr ssebytemode @@ -1697,6 +1709,15 @@ (set_attr "prefix_rep" "1,*") (set_attr "mode" "V4SF")]) +(define_expand "reduc_splus_v8df" + [(match_operand:V8DF 0 "register_operand") + (match_operand:V8DF 1 "register_operand")] + "TARGET_AVX512F" +{ + ix86_expand_reduc (gen_addv8df3, operands[0], operands[1]); + DONE; +}) + (define_expand "reduc_splus_v4df" [(match_operand:V4DF 0 "register_operand") (match_operand:V4DF 1 "register_operand")] @@ -1719,6 +1740,15 @@ DONE; }) +(define_expand "reduc_splus_v16sf" + [(match_operand:V16SF 0 "register_operand") + (match_operand:V16SF 1 "register_operand")] + "TARGET_AVX512F" +{ + ix86_expand_reduc (gen_addv16sf3, operands[0], operands[1]); + DONE; +}) + (define_expand "reduc_splus_v8sf" [(match_operand:V8SF 0 "register_operand") (match_operand:V8SF 1 "register_operand")] @@ -4748,6 +4778,86 @@ operands[1] = adjust_address (operands[1], SFmode, INTVAL (operands[2]) * 4); }) +(define_insn "avx512f_vextract32x4_1" + [(set (match_operand: 0 "nonimmediate_operand" "=vm") + (vec_select: + (match_operand:V16FI 1 "register_operand" "v") + (parallel [(match_operand 2 "const_0_to_15_operand") +(match_operand 3 "const_0_to_15_operand") +(match_operand 4 "const_0_to_15_operand") +(match_operand 5 "const_0_to_15_operand")])))] + "TARGET_AVX512F && (INTVAL (operands[2]) = INTVAL (operands[3]) - 1) + && (INTVAL (operands[3]) = INTVAL (operands[4]) - 1) + && (INTVAL (operands[4]) = INTVAL (operands[5]) - 1)" +{ + operands[2] = GEN_INT ((INTVAL (operands[2])) >> 2); + return "vextract32x4\t{%2, %1, %0|%0, %1, %2}"; +} + [(set_attr "type" "sselog") + (set_attr "prefix_extra" "1") + (set_attr "length_immediate" "1") + (set (attr "memory") + (if_then_else (match_test "MEM_P (operands[0])") + (const_string "store") + (const_string "none"))) + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_split + [(set (match_operand: 0 "nonimmediate_operand") + (vec_select: + (match_operand:V8FI 1 "nonimmediate_operand") + (parallel [(const_int 0) (const_int 1) +(const_int 2) (const_int 3)])))] + "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1])) + && reload_completed" + [(const_int 0)] +{ + rtx op1 = operands[1]; + if (REG_P (op1)) +op1 = gen
Re: [PATCH i386 3/8] [AVX512] [18/n] Add AVX-512 patterns: various RCPs and SQRTs.
Hello, > This patch is still far too large. > > I think you should split it up based on every single mode iterator that > you need to add or change. Here's 18th subpatch. It introduces various new insns. Is it Ok? Testing: 1. Bootstrap pass. 2. make check shows no regressions. 3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f option. 4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f option. -- Thanks, K PS. If it is Ok - I am going to strip out ChangeLog lines from big patch. --- gcc/config/i386/sse.md | 220 - 1 file changed, 216 insertions(+), 4 deletions(-) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 2b27649f..3ab35a7 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -91,7 +91,13 @@ UNSPEC_TESTM UNSPEC_TESTNM UNSPEC_SCATTER + UNSPEC_RCP14 + UNSPEC_RSQRT14 + UNSPEC_FIXUPIMM + UNSPEC_SCALEF UNSPEC_VTERNLOG + UNSPEC_GETEXP + UNSPEC_GETMANT UNSPEC_ALIGN UNSPEC_CONFLICT UNSPEC_MASKED_EQ @@ -100,6 +106,11 @@ ;; For AVX512PF support UNSPEC_GATHER_PREFETCH UNSPEC_SCATTER_PREFETCH + + ;; For AVX512ER support + UNSPEC_EXP2 + UNSPEC_RCP28 + UNSPEC_RSQRT28 ]) (define_c_enum "unspecv" [ @@ -363,6 +374,9 @@ (V16HI "V16SI") (V8HI "V8SI") (V4HI "V4SI") (V32QI "V32HI") (V16QI "V16HI")]) +(define_mode_attr ssefixupmode + [(V16SF "V16SI") (V4SF "V4SI") (V8DF "V8DI") (V2DF "V2DI")]) + (define_mode_attr ssebytemode [(V4DI "V32QI") (V2DI "V16QI")]) @@ -1254,6 +1268,32 @@ (set_attr "prefix" "orig,vex") (set_attr "mode" "SF")]) +(define_insn "rcp14" + [(set (match_operand:VF_512 0 "register_operand" "=v") + (unspec:VF_512 + [(match_operand:VF_512 1 "nonimmediate_operand" "vm")] + UNSPEC_RCP14))] + "TARGET_AVX512F" + "vrcp14\t{%1, %0|%0, %1}" + [(set_attr "type" "sse") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_insn "srcp14" + [(set (match_operand:VF_128 0 "register_operand" "=v") + (vec_merge:VF_128 + (unspec:VF_128 + [(match_operand:VF_128 1 "register_operand" "v") +(match_operand:VF_128 2 "nonimmediate_operand" "vm")] + UNSPEC_RCP14) + (match_dup 1) + (const_int 1)))] + "TARGET_AVX512F" + "vrcp14\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "type" "sse") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + (define_expand "sqrt2" [(set (match_operand:VF2 0 "register_operand") (sqrt:VF2 (match_operand:VF2 1 "nonimmediate_operand")))] @@ -1324,6 +1364,32 @@ (set_attr "prefix" "maybe_vex") (set_attr "mode" "")]) +(define_insn "rsqrt14" + [(set (match_operand:VF_512 0 "register_operand" "=v") + (unspec:VF_512 + [(match_operand:VF_512 1 "nonimmediate_operand" "vm")] + UNSPEC_RSQRT14))] + "TARGET_AVX512F" + "vrsqrt14\t{%1, %0|%0, %1}" + [(set_attr "type" "sse") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_insn "rsqrt14" + [(set (match_operand:VF_128 0 "register_operand" "=v") + (vec_merge:VF_128 + (unspec:VF_128 + [(match_operand:VF_128 1 "register_operand" "v") +(match_operand:VF_128 2 "nonimmediate_operand" "vm")] + UNSPEC_RSQRT14) + (match_dup 1) + (const_int 1)))] + "TARGET_AVX512F" + "vrsqrt14\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "type" "sse") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + (define_insn "sse_vmrsqrtv4sf2" [(set (match_operand:V4SF 0 "register_operand" "=x,x") (vec_merge:V4SF @@ -5301,6 +5367,29 @@ operands[1] = adjust_address (operands[1], DFmode, INTVAL (operands[2]) * 8); }) +(define_insn "avx512f_vmscalef" + [(set (match_operand:VF_128 0 "register_operand" "=v") + (vec_merge:VF_128 + (unspec:VF_128 [(match_operand:VF_128 1 "register_operand" "v") + (match_operand:VF_128 2 "nonimmediate_operand" "vm")] +UNSPEC_SCALEF) + (match_dup 1) + (const_int 1)))] + "TARGET_AVX512F" + "%vscalef\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_insn "avx512f_scalef" + [(set (match_operand:VF_512 0 "register_operand" "=v") + (unspec:VF_512 [(match_operand:VF_512 1 "register_operand" "v") + (match_operand:VF_512 2 "nonimmediate_operand" "vm")] + UNSPEC_SCALEF))] + "TARGET_AVX512F" + "%vscalef\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "prefix" "evex") + (set_attr "mode" "")]) + (define_insn "avx512f_vternlog" [(set (match_operand:VI48_512 0 "register_operand" "=v") (unspec:VI48_512 @@ -5315,6 +5404,28 @@ (set_attr "prefix" "evex") (set_attr "mode" "")]) +(define_insn "avx512f_getexp" + [(set (match_operand:VF_512 0 "register_operand" "=v") +(unspec:VF_512 [(match_operand:VF_512 1 "nonimmediate_operand" "vm")
Re: [PATCH i386 3/8] [AVX512] [20/n] Add AVX-512 patterns: Misc.
Hello, > This patch is still far too large. > > I think you should split it up based on every single mode iterator that > you need to add or change. Here's 20th subpatch. It introduces last insns of AVX-512F. Is it Ok? Testing: 1. Bootstrap pass. 2. make check shows no regressions. 3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f option. 4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f option. This patch finalize 3/8 series. -- Thanks, K PS. If it is Ok - I am going to strip out ChangeLog lines from big patch. --- gcc/config/i386/i386.md | 16 +++- gcc/config/i386/sse.md | 34 ++ 2 files changed, 49 insertions(+), 1 deletion(-) diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index e7e9f2d..91be1ce 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -17531,7 +17531,7 @@ [(prefetch (match_operand 0 "address_operand") (match_operand:SI 1 "const_int_operand") (match_operand:SI 2 "const_int_operand"))] - "TARGET_PREFETCH_SSE || TARGET_PRFCHW" + "TARGET_PREFETCH_SSE || TARGET_PRFCHW || TARGET_AVX512PF" { bool write = INTVAL (operands[1]) != 0; int locality = INTVAL (operands[2]); @@ -17544,6 +17544,8 @@ of locality. */ if (TARGET_PRFCHW && (write || !TARGET_PREFETCH_SSE)) operands[2] = GEN_INT (3); + else if (TARGET_AVX512PF && (write || !TARGET_PREFETCH_SSE)) +operands[2] = GEN_INT (1); else operands[1] = const0_rtx; }) @@ -17585,6 +17587,18 @@ (symbol_ref "memory_address_length (operands[0], false)")) (set_attr "memory" "none")]) +(define_insn "*prefetch_avx512pf_" + [(prefetch (match_operand:P 0 "address_operand" "p") +(const_int 1) +(const_int 1))] + "TARGET_AVX512PF" + "prefetchwt1\t%a0"; + [(set_attr "type" "sse") + (set_attr "prefix" "evex") + (set (attr "length_address") + (symbol_ref "memory_address_length (operands[0], false)")) + (set_attr "memory" "none")]) + (define_expand "stack_protect_set" [(match_operand 0 "memory_operand") (match_operand 1 "memory_operand")] diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index f7e9fd5..939cc33 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -2013,6 +2013,34 @@ (set_attr "prefix" "evex") (set_attr "mode" "")]) +(define_insn "avx512f_vmcmp3" + [(set (match_operand: 0 "register_operand" "=k") + (and: + (unspec: + [(match_operand:VF_128 1 "register_operand" "v") +(match_operand:VF_128 2 "nonimmediate_operand" "vm") +(match_operand:SI 3 "const_0_to_31_operand" "n")] + UNSPEC_PCMP) + (const_int 1)))] + "TARGET_AVX512F" + "vcmp\t{%3, %2, %1, %0|%0, %1, %2, %3}" + [(set_attr "type" "ssecmp") + (set_attr "length_immediate" "1") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_insn "avx512f_maskcmp3" + [(set (match_operand: 0 "register_operand" "=k") + (match_operator: 3 "sse_comparison_operator" + [(match_operand:VF 1 "register_operand" "v") + (match_operand:VF 2 "nonimmediate_operand" "vm")]))] + "TARGET_SSE" + "vcmp%D3\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "type" "ssecmp") + (set_attr "length_immediate" "1") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + (define_insn "_comi" [(set (reg:CCFP FLAGS_REG) (compare:CCFP @@ -12154,6 +12182,12 @@ } }) +(define_expand "vashrv16si3" + [(set (match_operand:V16SI 0 "register_operand") + (ashiftrt:V16SI (match_operand:V16SI 1 "register_operand") + (match_operand:V16SI 2 "nonimmediate_operand")))] + "TARGET_AVX512F") + (define_expand "vashrv8si3" [(set (match_operand:V8SI 0 "register_operand") (ashiftrt:V8SI (match_operand:V8SI 1 "register_operand") -- 1.7.11.7
Re: [PATCH i386 3/8] [AVX512] [19/n] Add AVX-512 patterns: Extracts and converts.
Hello, > This patch is still far too large. > > I think you should split it up based on every single mode iterator that > you need to add or change. Here's 19th subpatch. It extends extract and convert insn patterns. Is it Ok? Testing: 1. Bootstrap pass. 2. make check shows no regressions. 3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f option. 4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f option. -- Thanks, K PS. If it is Ok - I am going to strip out ChangeLog lines from big patch. --- gcc/config/i386/i386.md | 5 + gcc/config/i386/predicates.md | 40 ++ gcc/config/i386/sse.md| 938 +- 3 files changed, 977 insertions(+), 6 deletions(-) diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 10ca6cb..e7e9f2d 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -831,6 +831,11 @@ (define_code_attr s [(sign_extend "s") (zero_extend "u")]) (define_code_attr u_bool [(sign_extend "false") (zero_extend "true")]) +;; Used in signed and unsigned truncations. +(define_code_iterator any_truncate [ss_truncate truncate us_truncate]) +;; Instruction suffix for truncations. +(define_code_attr trunsuffix [(ss_truncate "s") (truncate "") (us_truncate "us")]) + ;; Used in signed and unsigned fix. (define_code_iterator any_fix [fix unsigned_fix]) (define_code_attr fixsuffix [(fix "") (unsigned_fix "u")]) diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md index e1670f3..261335d 100644 --- a/gcc/config/i386/predicates.md +++ b/gcc/config/i386/predicates.md @@ -752,6 +752,11 @@ (and (match_code "const_int") (match_test "IN_RANGE (INTVAL (op), 6, 7)"))) +;; Match 8 to 9. +(define_predicate "const_8_to_9_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 8, 9)"))) + ;; Match 8 to 11. (define_predicate "const_8_to_11_operand" (and (match_code "const_int") @@ -762,16 +767,51 @@ (and (match_code "const_int") (match_test "IN_RANGE (INTVAL (op), 8, 15)"))) +;; Match 10 to 11. +(define_predicate "const_10_to_11_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 10, 11)"))) + +;; Match 12 to 13. +(define_predicate "const_12_to_13_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 12, 13)"))) + ;; Match 12 to 15. (define_predicate "const_12_to_15_operand" (and (match_code "const_int") (match_test "IN_RANGE (INTVAL (op), 12, 15)"))) +;; Match 14 to 15. +(define_predicate "const_14_to_15_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 14, 15)"))) + +;; Match 16 to 19. +(define_predicate "const_16_to_19_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 16, 19)"))) + ;; Match 16 to 31. (define_predicate "const_16_to_31_operand" (and (match_code "const_int") (match_test "IN_RANGE (INTVAL (op), 16, 31)"))) +;; Match 20 to 23. +(define_predicate "const_20_to_23_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 20, 23)"))) + +;; Match 24 to 27. +(define_predicate "const_24_to_27_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 24, 27)"))) + +;; Match 28 to 31. +(define_predicate "const_28_to_31_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 28, 31)"))) + ;; True if this is a constant appropriate for an increment or decrement. (define_predicate "incdec_operand" (match_code "const_int") diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 3ab35a7..f7e9fd5 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -87,6 +87,7 @@ ;; For AVX512F support UNSPEC_VPERMI2 UNSPEC_VPERMT2 + UNSPEC_UNSIGNED_FIX_NOTRUNC UNSPEC_UNSIGNED_PCMP UNSPEC_TESTM UNSPEC_TESTNM @@ -2997,6 +2998,34 @@ (set_attr "prefix" "maybe_vex") (set_attr "mode" "DI")]) +(define_insn "cvtusi232" + [(set (match_operand:VF_128 0 "register_operand" "=v") + (vec_merge:VF_128 + (vec_duplicate:VF_128 + (unsigned_float: + (match_operand:SI 2 "nonimmediate_operand" "rm"))) + (match_operand:VF_128 1 "register_operand" "v") + (const_int 1)))] + "TARGET_AVX512F" + "vcvtusi2\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "type" "sseicvt") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_insn "cvtusi264" + [(set (match_operand:VF_128 0 "register_operand" "=v") + (vec_merge:VF_128 + (vec_duplicate:VF_128 + (unsigned_float: + (match_operand:DI 2 "nonimmediate_operand" "rm"))) + (match_operand:VF_128 1 "register_operand" "v") + (const_int 1)))] + "TARGET_AVX512F && TARGET_64BIT" + "vcvtusi2\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "type" "sseicvt") + (set_attr "prefix" "evex") + (se
Re: [PATCH i386 3/8] [AVX512] [16/n] Add AVX-512 patterns: VI48_512 and VI4F_128 iterators.
Hello, > This patch is still far too large. > > I think you should split it up based on every single mode iterator that > you need to add or change. Here's 1st subpatch. It extends VI4F_128 and introduces VI48_512 iterator. Is it Ok? Testing: 1. Bootstrap pass. 2. make check shows no regressions. 3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f option. 4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f option. -- Thanks, K PS. If it is Ok - I am going to strip out ChangeLog lines from big patch. --- gcc/config/i386/predicates.md | 5 + gcc/config/i386/sse.md| 344 +- 2 files changed, 348 insertions(+), 1 deletion(-) diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md index 18f425c..eff82eb 100644 --- a/gcc/config/i386/predicates.md +++ b/gcc/config/i386/predicates.md @@ -1332,3 +1332,8 @@ (define_predicate "general_vector_operand" (ior (match_operand 0 "nonimmediate_operand") (match_code "const_vector"))) + +;; Return true if OP is either -1 constant or stored in register. +(define_predicate "register_or_constm1_operand" + (ior (match_operand 0 "register_operand") + (match_test "op == constm1_rtx"))) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 2364ccc..8221d61 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -87,7 +87,19 @@ ;; For AVX512F support UNSPEC_VPERMI2 UNSPEC_VPERMT2 + UNSPEC_UNSIGNED_PCMP + UNSPEC_TESTM + UNSPEC_TESTNM UNSPEC_SCATTER + UNSPEC_VTERNLOG + UNSPEC_ALIGN + UNSPEC_CONFLICT + UNSPEC_MASKED_EQ + UNSPEC_MASKED_GT + + ;; For AVX512PF support + UNSPEC_GATHER_PREFETCH + UNSPEC_SCATTER_PREFETCH ]) (define_c_enum "unspecv" [ @@ -364,6 +376,7 @@ (define_mode_iterator VI124_256_48_512 [V32QI V16HI V8SI (V8DI "TARGET_AVX512F") (V16SI "TARGET_AVX512F")]) (define_mode_iterator VI48_256 [V8SI V4DI]) +(define_mode_iterator VI48_512 [V16SI V8DI]) ;; Int-float size matches (define_mode_iterator VI4F_128 [V4SI V4SF]) @@ -1741,7 +1754,9 @@ [(V32QI "TARGET_AVX2") (V16HI "TARGET_AVX2") (V8SI "TARGET_AVX2") (V4DI "TARGET_AVX2") (V8SF "TARGET_AVX") (V4DF "TARGET_AVX") - (V4SF "TARGET_SSE")]) + (V4SF "TARGET_SSE") (V16SI "TARGET_AVX512F") + (V8DI "TARGET_AVX512F") (V16SF "TARGET_AVX512F") + (V8DF "TARGET_AVX512F")]) (define_expand "reduc__" [(smaxmin:REDUC_SMINMAX_MODE @@ -1754,6 +1769,16 @@ }) (define_expand "reduc__" + [(umaxmin:VI48_512 + (match_operand:VI48_512 0 "register_operand") + (match_operand:VI48_512 1 "register_operand"))] + "TARGET_AVX512F" +{ + ix86_expand_reduc (gen_3, operands[0], operands[1]); + DONE; +}) + +(define_expand "reduc__" [(umaxmin:VI_256 (match_operand:VI_256 0 "register_operand") (match_operand:VI_256 1 "register_operand"))] @@ -1877,6 +1902,20 @@ (set_attr "prefix" "evex") (set_attr "mode" "")]) +(define_insn "avx512f_ucmp3" + [(set (match_operand: 0 "register_operand" "=k") + (unspec: + [(match_operand:VI48_512 1 "register_operand" "v") + (match_operand:VI48_512 2 "nonimmediate_operand" "vm") + (match_operand:SI 3 "const_0_to_7_operand" "n")] + UNSPEC_UNSIGNED_PCMP))] + "TARGET_AVX512F" + "vpcmpu\t{%3, %2, %1, %0|%0, %1, %2, %3}" + [(set_attr "type" "ssecmp") + (set_attr "length_immediate" "1") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + (define_insn "_comi" [(set (reg:CCFP FLAGS_REG) (compare:CCFP @@ -5113,6 +5152,31 @@ operands[1] = adjust_address (operands[1], DFmode, INTVAL (operands[2]) * 8); }) +(define_insn "avx512f_vternlog" + [(set (match_operand:VI48_512 0 "register_operand" "=v") + (unspec:VI48_512 + [(match_operand:VI48_512 1 "register_operand" "0") + (match_operand:VI48_512 2 "register_operand" "v") + (match_operand:VI48_512 3 "nonimmediate_operand" "vm") + (match_operand:SI 4 "const_0_to_255_operand")] + UNSPEC_VTERNLOG))] + "TARGET_AVX512F" + "vpternlog\t{%4, %3, %2, %0|%0, %2, %3, %4}" + [(set_attr "type" "sselog") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_insn "avx512f_align" + [(set (match_operand:VI48_512 0 "register_operand" "=v") +(unspec:VI48_512 [(match_operand:VI48_512 1 "register_operand" "v") + (match_operand:VI48_512 2 "nonimmediate_operand" "vm") + (match_operand:SI 3 "const_0_to_255_operand")] +UNSPEC_ALIGN))] + "TARGET_AVX512F" + "valign\t{%3, %2, %1, %0|%0, %1, %2, %3}"; + [(set_attr "prefix" "evex") + (set_attr "mode" "")]) + (define_insn "avx512f_rndscale" [(set (match_operand:VF_512 0 "register_operand" "=v") (unspec:VF_512 @@ -6137,6 +6201,22 @@ (set_attr "prefix" "orig,vex") (set_attr "mode" "")]) +(define_insn "3" + [(set (match_operand:VI48_512 0 "register_operand" "=v,v") +
Re: [PATCH i386 3/8] [AVX512] [11/n] Add AVX-512 patterns: FMA.
Hello, Thanks a lot for all reviews! On 09 Oct 14:55, Richard Henderson wrote: > On 10/09/2013 03:28 AM, Kirill Yukhin wrote: > > +;; CPUID bit AVX512F enables evex encoded scalar and 512-bit fma. It > > doesn't > > +;; care about FMA bit, so we enable fma for TARGET_AVX512F even when > > TARGET_FMA > > +;; and TARGET_FMA4 are both false. > > How do you force an evex encoding of the instruction? > > Do you really mean that cpuid AVX512F, !FMA will not #OP > for a vex (but not evex) encoded version of the same insn? Your concern is correct, but I believe it relates more to Binutils, since it is GAS who cannot force EVEX encoding for such: vfnmsub132ss %xmm1,%xmm2,%xmm3 Currently, from HW point of view, there're no CPUs which feature AVX-512, but not AVX2. So, I believe we may put a `TODO` in comment, like this: +;; CPUID bit AVX512F enables evex encoded scalar and 512-bit fma. It doesn't +;; care about FMA bit, so we enable fma for TARGET_AVX512F even when TARGET_FMA +;; and TARGET_FMA4 are both false. +;; TODO: if (AVX512F && !FMA && (we don't use regnos in 16..31 range) then for +;; scalar FMA we'll got VEX encoded variant. We need somewhat improve +;; GAS to allow forcing of EVEX encoding and then force it here. Do you think it is acceptable? -- Thanks, K
Re: [PATCH i386 3/8] [AVX512] [2/n] Add AVX-512 patterns: Fix missing `v' constraint.
Hello, On 09 Oct 14:20, Richard Henderson wrote: > On 10/09/2013 03:24 AM, Kirill Yukhin wrote: > > Here's 2nd subpatch. It fixes missing `v' constraints. > > And one v constraint that shouldn't have been. Exactly! > Ok. Checked into main trunk: http://gcc.gnu.org/ml/gcc-cvs/2013-10/msg00379.html Thanks, K
Re: [PATCH i386 3/8] [AVX512] [3/n] Add AVX-512 patterns: VF1 and VI iterators.
Hello, On 09 Oct 14:25, Richard Henderson wrote: > On 10/09/2013 03:24 AM, Kirill Yukhin wrote: > > Here's 3rd subpatch. It extends VF1 and VI iterators. > > Ok. Thanks, checked into main trunk: http://gcc.gnu.org/ml/gcc-cvs/2013-10/msg00380.html K
Re: [PATCH i386 3/8] [AVX512] [4/n] Add AVX-512 patterns: V iterator.
Hello, On 09 Oct 14:32, Richard Henderson wrote: > On 10/09/2013 03:25 AM, Kirill Yukhin wrote: > > Here's 4th subpatch. It extends V iterator. > Ok. Thanks, checked into main trunk: http://gcc.gnu.org/ml/gcc-cvs/2013-10/msg00382.html K
Re: [PATCH i386 3/8] [AVX512] [5/n] Add AVX-512 patterns: Introduce `multdiv' code iterator.
Hello, On 09 Oct 14:34, Richard Henderson wrote: > On 10/09/2013 03:25 AM, Kirill Yukhin wrote: > > Here's 5th subpatch. It introduces `multdiv' code iterator. > > This is the sort of patch I like to see. It's the first one > you've sent that's done exactly one thing. Congratulations. > > Ok. Thanks, checked into main trunk: http://gcc.gnu.org/ml/gcc-cvs/2013-10/msg00383.html K
Re: [PATCH i386 3/8] [AVX512] [6/n] Add AVX-512 patterns: VI2 and VI124 iterators.
Hello On 09 Oct 14:35, Richard Henderson wrote: > On 10/09/2013 03:26 AM, Kirill Yukhin wrote: > > Here's 6th subpatch. It extends VI2 and VI124 iterators. > > Ok. Thanks, checked into main trunk: http://gcc.gnu.org/ml/gcc-cvs/2013-10/msg00384.html K
Re: [PATCH i386 3/8] [AVX512] [7/n] Add AVX-512 patterns: VI4 and VI8 iterators.
Hello, On 09 Oct 14:37, Richard Henderson wrote: > On 10/09/2013 03:26 AM, Kirill Yukhin wrote: > > Here's 7th subpatch. It extends VI4 and VI8 iterators. > > Ok. Thanks, checked into main trunk: http://gcc.gnu.org/ml/gcc-cvs/2013-10/msg00385.html K
Re: [PATCH i386 3/8] [AVX512] [5/n] Add AVX-512 patterns: Introduce `multdiv' code iterator.
On 11 Oct 15:43, Jakub Jelinek wrote: > On Fri, Oct 11, 2013 at 05:39:05PM +0400, Kirill Yukhin wrote: > > Thanks, checked into main trunk: > > http://gcc.gnu.org/ml/gcc-cvs/2013-10/msg00383.html > > Everybody can read gcc-cvs mailing list, it's archives or svn log > or git log, there is no need to duplicate this info to gcc-patches > mailing list. Okay, I'll stop sending these updates! Thanks, K
Re: [PATCH i386 3/8] [AVX512] [15/n] Add AVX-512 patterns: VI48F_512 iterator.
Hello, On 11 Oct 10:30, Richard Henderson wrote: > On 10/09/2013 03:29 AM, Kirill Yukhin wrote: > > +(define_insn "avx512f_vec_dup_mem" > > + [(set (match_operand:VI48F_512 0 "register_operand" "=x") > > + (vec_duplicate:VI48F_512 > > + (match_operand: 1 "nonimmediate_operand" "xm")))] > > + "TARGET_AVX512F" > > + "vbroadcast\t{%1, %0|%0, %1}" > > + [(set_attr "type" "ssemov") > > + (set_attr "prefix" "evex") > > + (set_attr "mode" "")]) > > Ought these be 'v' not 'x'? Good catch! Fixed and checked in. -- Thanks, K
Re: [PATCH i386 3/8] [AVX512] [16/n] Add AVX-512 patterns: VI48_512 and VI4F_128 iterators.
Hello, On 11 Oct 11:21, Richard Henderson wrote: > On 10/09/2013 03:30 AM, Kirill Yukhin wrote: > > +;; Return true if OP is either -1 constant or stored in register. > > +(define_predicate "register_or_constm1_operand" > > + (ior (match_operand 0 "register_operand") > > + (match_test "op == constm1_rtx"))) > > This won't do the right thing, because you're not exposing > that const_int is a valid input. You need a match_code too. Thanks, fixed and checked in. -- Thanks, K
Re: [PATCH i386 3/8] [AVX512] [18/n] Add AVX-512 patterns: various RCPs and SQRTs.
Hello, On 14 Oct 13:10, Richard Henderson wrote: > On 10/09/2013 03:31 AM, Kirill Yukhin wrote: > > +(define_mode_attr ssefixupmode > > + [(V16SF "V16SI") (V4SF "V4SI") (V8DF "V8DI") (V2DF "V2DI")]) > > + > > Oh, I forgot. How is this different from sseintvecmode? It is definetely a bug. Ok with corresponding replacement and removal of redundant mode attr? -- Thanks, K
[PATCH i386 AVX2] Remove redundant expands.
Hello, It seems that gang of AVX* patterns were copy and pasted from SSE, however as far as they are NDD, we may remove corresponding expands which sort operands. ChangeLog: * config/i386/sse.md (vec_widen_umult_even_v8si): Remove expand, make insn visible, remove redundant check. (vec_widen_smult_even_v8si): Ditto. (avx2_pmaddwd): Ditto. (avx2_eq3): Ditto. (avx512f_eq3): Ditto. Bootrstrap pass. All AVX* tests pass. Is it ok to commit to main trunk? -- Thanks, K --- gcc/ChangeLog | 9 gcc/config/i386/sse.md | 119 ++--- 2 files changed, 23 insertions(+), 105 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index ebaa3e0..b25d8eb 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,12 @@ +2013-10-16 Kirill Yukhin + + * config/i386/sse.md (vec_widen_umult_even_v8si): Remove expand, + make insn visible, remove redundant check. + (vec_widen_smult_even_v8si): Ditto. + (avx2_pmaddwd): Ditto. + (avx2_eq3): Ditto. + (avx512f_eq3): Ditto. + 2013-10-16 Yvan Roux * config/arm/arm.opt (mlra): New option. diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 2046dd5..57e4c2b 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -6067,23 +6067,7 @@ (set_attr "prefix" "orig,vex") (set_attr "mode" "")]) -(define_expand "vec_widen_umult_even_v8si" - [(set (match_operand:V4DI 0 "register_operand") - (mult:V4DI - (zero_extend:V4DI - (vec_select:V4SI - (match_operand:V8SI 1 "nonimmediate_operand") - (parallel [(const_int 0) (const_int 2) -(const_int 4) (const_int 6)]))) - (zero_extend:V4DI - (vec_select:V4SI - (match_operand:V8SI 2 "nonimmediate_operand") - (parallel [(const_int 0) (const_int 2) -(const_int 4) (const_int 6)])] - "TARGET_AVX2" - "ix86_fixup_binary_operands_no_copy (MULT, V8SImode, operands);") - -(define_insn "*vec_widen_umult_even_v8si" +(define_insn "vec_widen_umult_even_v8si" [(set (match_operand:V4DI 0 "register_operand" "=x") (mult:V4DI (zero_extend:V4DI @@ -6096,7 +6080,7 @@ (match_operand:V8SI 2 "nonimmediate_operand" "xm") (parallel [(const_int 0) (const_int 2) (const_int 4) (const_int 6)])] - "TARGET_AVX2 && ix86_binary_operator_ok (MULT, V8SImode, operands)" + "TARGET_AVX2" "vpmuludq\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sseimul") (set_attr "prefix" "vex") @@ -6137,28 +6121,12 @@ (set_attr "prefix" "orig,vex") (set_attr "mode" "TI")]) -(define_expand "vec_widen_smult_even_v8si" - [(set (match_operand:V4DI 0 "register_operand") - (mult:V4DI - (sign_extend:V4DI - (vec_select:V4SI - (match_operand:V8SI 1 "nonimmediate_operand") - (parallel [(const_int 0) (const_int 2) -(const_int 4) (const_int 6)]))) - (sign_extend:V4DI - (vec_select:V4SI - (match_operand:V8SI 2 "nonimmediate_operand") - (parallel [(const_int 0) (const_int 2) -(const_int 4) (const_int 6)])] - "TARGET_AVX2" - "ix86_fixup_binary_operands_no_copy (MULT, V8SImode, operands);") - -(define_insn "*vec_widen_smult_even_v8si" +(define_insn "vec_widen_smult_even_v8si" [(set (match_operand:V4DI 0 "register_operand" "=x") (mult:V4DI (sign_extend:V4DI (vec_select:V4SI - (match_operand:V8SI 1 "nonimmediate_operand" "x") + (match_operand:V8SI 1 "nonimmediate_operand" "%x") (parallel [(const_int 0) (const_int 2) (const_int 4) (const_int 6)]))) (sign_extend:V4DI @@ -6166,7 +6134,7 @@ (match_operand:V8SI 2 "nonimmediate_operand" "xm") (parallel [(const_int 0) (const_int 2) (const_int 4) (const_int 6)])] - "TARGET_AVX2 && ix86_binary_operator_ok (MULT, V8SImode, operands)" + "TARGET_AVX2" "vpmuldq\t{%2, %1, %0|%0, %1, %2}" [(set_attr "isa" "avx") (set_attr "type" "sseimul") @@ -6210,41 +6178,7 @@ (set_attr "prefix" "orig,vex") (set_attr "mode" "TI")]) -(define_expand "avx2_pmaddwd" - [(set (match_operand:V
Re: [PATCH i386 3/8] [AVX512] [19/n] Add AVX-512 patterns: Extracts and converts.
Hello, On 15 Oct 08:46, Richard Henderson wrote: > On 10/09/2013 03:31 AM, Kirill Yukhin wrote: > > + rtx op1 = operands[1]; > > + if (REG_P (op1)) > > +op1 = gen_rtx_REG (V16HImode, REGNO (op1)); > > + else > > +op1 = gen_lowpart (V16HImode, op1); > > The IF case is incorrect. You need to use gen_lowpart always. I suspect gen_lowpart is bad turn when reload is completed, as far as it can create new pseudo. gen_lowpart () may call gen_reg_rtx (), which contain corresponging gcc_assert (). I've rewrote this pattern w/o explicit insn emit: (define_insn_and_split "vec_extract_lo_v32hi" [(set (match_operand:V16HI 0 "nonimmediate_operand" "=v,m") (vec_select:V16HI (match_operand:V32HI 1 "nonimmediate_operand" "vm,v") (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3) (const_int 4) (const_int 5) (const_int 6) (const_int 7) (const_int 8) (const_int 9) (const_int 10) (const_int 11) (const_int 12) (const_int 13) (const_int 14) (const_int 15)])))] "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))" "#" "&& reload_completed" [(set (match_dup 0) (match_dup 1))] { if (REG_P (operands[1])) operands[1] = gen_rtx_REG (V16HImode, REGNO (operands[1])); else operands[1] = adjust_address (operands[1], V16HImode, 0); }) > The second alternative only matches for v/m/1. While I imagine that it > doesn't > really matter, it might be better to swap the two so that vmovddup gets used > for the v/v/1 case too. Done. > Why do you have separate define_expand and define_insn for this pattern? Expand removed. > And is this really the best description for this insn? It's a store to > memory. > Surely it's better to say that we store V8QI. I believe it is a bug. We store 8 bytes, no zeroes. Fixed. > We don't need to use ix86_fixup_binary_operands for any three-operand insn. > That function is in order to help SSE's two-operand insns. You can drop the > define-expand and just keep the define_insn. Fixed. > You want a "%v" for operand 1, to make the operands commutative. Done. > > +(define_insn "*vec_widen_smult_even_v16si" > > + [(set (match_operand:V8DI 0 "register_operand" "=x") > > +(mult:V8DI > > Similarly, plus errant "x". Done. Whole patch below. Is it ok now? Bootstrap pass, all AVX* tests pass. -- Thanks, K gcc/config/i386/i386.md | 5 + gcc/config/i386/predicates.md | 40 ++ gcc/config/i386/sse.md| 873 +- 3 files changed, 912 insertions(+), 6 deletions(-) diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 10ca6cb..e7e9f2d 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -831,6 +831,11 @@ (define_code_attr s [(sign_extend "s") (zero_extend "u")]) (define_code_attr u_bool [(sign_extend "false") (zero_extend "true")]) +;; Used in signed and unsigned truncations. +(define_code_iterator any_truncate [ss_truncate truncate us_truncate]) +;; Instruction suffix for truncations. +(define_code_attr trunsuffix [(ss_truncate "s") (truncate "") (us_truncate "us")]) + ;; Used in signed and unsigned fix. (define_code_iterator any_fix [fix unsigned_fix]) (define_code_attr fixsuffix [(fix "") (unsigned_fix "u")]) diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md index 06b2914..999d8ab 100644 --- a/gcc/config/i386/predicates.md +++ b/gcc/config/i386/predicates.md @@ -752,6 +752,11 @@ (and (match_code "const_int") (match_test "IN_RANGE (INTVAL (op), 6, 7)"))) +;; Match 8 to 9. +(define_predicate "const_8_to_9_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 8, 9)"))) + ;; Match 8 to 11. (define_predicate "const_8_to_11_operand" (and (match_code "const_int") @@ -762,16 +767,51 @@ (and (match_code "const_int") (match_test "IN_RANGE (INTVAL (op), 8, 15)"))) +;; Match 10 to 11. +(define_predicate "const_10_to_11_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 10, 11)"))) + +;; Match 12 to 13. +(define_predicate "const_12_to_13_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 12, 13)"))) + ;; Match 12 to 15. (define_predicate "const_12_to_15_operand" (and (match_code "const_int"
Re: [PATCH i386 3/8] [AVX512] [19/n] Add AVX-512 patterns: Extracts and converts.
Hello, On 16 Oct 09:59, Richard Henderson wrote: > On 10/16/2013 09:07 AM, Kirill Yukhin wrote: > > I suspect gen_lowpart is bad turn when reload is completed, as > > far as it can create new pseudo. gen_lowpart () may call > > gen_reg_rtx (), which contain corresponging gcc_assert (). > > False. gen_lowpart is perfectly safe post-reload. > Indeed, taking the subreg of a hard register should arrive > > x = gen_rtx_REG_offset (op, outermode, final_regno, final_offset); > > in simplify_subreg. > > Have you encountered some specific problem with gen_lowpart? Yes. Patch [8/8] contains testsuite for AVX-512. This pattern is covered as well. When trying to do so: (define_insn_and_split "vec_extract_lo_v32hi" [(set (match_operand:V16HI 0 "nonimmediate_operand" "=v,m") (vec_select:V16HI (match_operand:V32HI 1 "nonimmediate_operand" "vm,v") (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3) (const_int 4) (const_int 5) (const_int 6) (const_int 7) (const_int 8) (const_int 9) (const_int 10) (const_int 11) (const_int 12) (const_int 13) (const_int 14) (const_int 15)])))] "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))" "#" "&& reload_completed" [(const_int 0)] { rtx op1 = operands[1]; op1 = gen_lowpart (V16HImode, op1); emit_move_insn (operands[0], op1); DONE; }) I've got ICE, with following bt: #1 0x006f28d6 in gen_reg_rtx (mode=V32HImode) at /export/users/kyukhin/gcc/git/gcc/gcc/emit-rtl.c:866 #2 0x0070243a in copy_to_reg (x=(reg:V32HI 21 xmm0 [163])) at /export/users/kyukhin/gcc/git/gcc/gcc/explow.c\ :606 #3 0x0091dfb8 in gen_lowpart_general (mode=V16HImode, x=) at /export/users/kyukhin/gcc/git/gcc/gcc/rtlhooks.c:50 #4 0x00ce16e8 in gen_split_4943 (curr_insn=, operands=0x16f6320) at /export/users/kyukhin/gcc/git/gcc/gcc/config/i386/sse.md:6329 #5 0x006f7865 in try_split (pat=(set (reg:V16HI 23 xmm2 [164]) (vec_select:V16HI (reg:V32HI 21 xmm0 [163]) (parallel [ (const_int 0 [0]) (const_int 1 [0x1]) (const_int 2 [0x2]) (const_int 3 [0x3]) (const_int 4 [0x4]) (const_int 5 [0x5]) (const_int 6 [0x6]) (const_int 7 [0x7]) (const_int 8 [0x8]) (const_int 9 [0x9]) (const_int 10 [0xa]) (const_int 11 [0xb]) (const_int 12 [0xc]) (const_int 13 [0xd]) (const_int 14 [0xe]) (const_int 15 [0xf]) ]))), trial=(insn 48 46 49 6 (set (reg:V16HI 23 xmm2 [164]) (vec_select:V16HI (reg:V32HI 21 xmm0 [163]) (parallel [ (const_int 0 [0]) (const_int 1 [0x1]) (const_int 2 [0x2]) (const_int 3 [0x3]) (const_int 4 [0x4]) (const_int 5 [0x5]) (const_int 6 [0x6]) (const_int 7 [0x7]) (const_int 8 [0x8]) (const_int 9 [0x9]) (const_int 10 [0xa]) (const_int 11 [0xb]) (const_int 12 [0xc]) (const_int 13 [0xd]) (const_int 14 [0xe]) (const_int 15 [0xf]) ]))) /export/users/kyukhin/gcc/git/gcc/gcc/testsuite/gcc.target/i386/avx512f-vec-unpack.c:24 2151 {ve\ c_extract_lo_v32hi} (nil)), last=) at /export/users/kyukhin/gcc/git/gcc/gcc/emit-rtl.c:3467 So, we have: [rtlhooks.c:50]gen_lowpart_general () -> [explow.c:606]copy_to_reg () -> [emit-rtl.c:866]gen_reg_rtx (): gcc_assert (can_create_pseudo_p ()); Maybe the code in the pattern is buggy? Or is it a gen_lowpart? -- Thanks, K
Re: [PATCH i386 3/8] [AVX512] [19/n] Add AVX-512 patterns: Extracts and converts.
Hello, On 17 Oct 13:14, Uros Bizjak wrote: > On Thu, Oct 17, 2013 at 12:47 PM, Kirill Yukhin > wrote: > > > >> > I suspect gen_lowpart is bad turn when reload is completed, as > >> > far as it can create new pseudo. gen_lowpart () may call > >> > gen_reg_rtx (), which contain corresponging gcc_assert (). > >> > >> False. gen_lowpart is perfectly safe post-reload. > >> Indeed, taking the subreg of a hard register should arrive > >> > >> x = gen_rtx_REG_offset (op, outermode, final_regno, > >> final_offset); > >> > >> in simplify_subreg. > >> > >> Have you encountered some specific problem with gen_lowpart? > > Maybe the code in the pattern is buggy? Or is it a gen_lowpart? > > I think that original approach with gen_rtx_REG is correct and follows > established practice in sse.md (please grep for gen_reg_RTX in > sse.md). If this approach is necessary due to the deficiency of > gen_lowpart, then the fix to gen_lowpart should be proposed in a > follow-up patch. So, I've reverted changes in mult_vect patterns and added "%" to constraints. I've also reverted vec_extract_* (with slight update): ... + "&& reload_completed" + [(set (match_dup 0) (match_dup 1))] +{ + if (REG_P (operands[1])) +operands[1] = gen_rtx_REG (V16HImode, REGNO (operands[1])); + else +operands[1] = adjust_address (operands[1], V16HImode, 0); +}) Bootastrapped. AVX* tests pass (including new AVX-512) Is it ok now? -- Thanks, K --- gcc/config/i386/i386.md | 5 + gcc/config/i386/predicates.md | 40 ++ gcc/config/i386/sse.md| 873 +- 3 files changed, 912 insertions(+), 6 deletions(-) diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 10ca6cb..e7e9f2d 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -831,6 +831,11 @@ (define_code_attr s [(sign_extend "s") (zero_extend "u")]) (define_code_attr u_bool [(sign_extend "false") (zero_extend "true")]) +;; Used in signed and unsigned truncations. +(define_code_iterator any_truncate [ss_truncate truncate us_truncate]) +;; Instruction suffix for truncations. +(define_code_attr trunsuffix [(ss_truncate "s") (truncate "") (us_truncate "us")]) + ;; Used in signed and unsigned fix. (define_code_iterator any_fix [fix unsigned_fix]) (define_code_attr fixsuffix [(fix "") (unsigned_fix "u")]) diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md index 06b2914..999d8ab 100644 --- a/gcc/config/i386/predicates.md +++ b/gcc/config/i386/predicates.md @@ -752,6 +752,11 @@ (and (match_code "const_int") (match_test "IN_RANGE (INTVAL (op), 6, 7)"))) +;; Match 8 to 9. +(define_predicate "const_8_to_9_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 8, 9)"))) + ;; Match 8 to 11. (define_predicate "const_8_to_11_operand" (and (match_code "const_int") @@ -762,16 +767,51 @@ (and (match_code "const_int") (match_test "IN_RANGE (INTVAL (op), 8, 15)"))) +;; Match 10 to 11. +(define_predicate "const_10_to_11_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 10, 11)"))) + +;; Match 12 to 13. +(define_predicate "const_12_to_13_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 12, 13)"))) + ;; Match 12 to 15. (define_predicate "const_12_to_15_operand" (and (match_code "const_int") (match_test "IN_RANGE (INTVAL (op), 12, 15)"))) +;; Match 14 to 15. +(define_predicate "const_14_to_15_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 14, 15)"))) + +;; Match 16 to 19. +(define_predicate "const_16_to_19_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 16, 19)"))) + ;; Match 16 to 31. (define_predicate "const_16_to_31_operand" (and (match_code "const_int") (match_test "IN_RANGE (INTVAL (op), 16, 31)"))) +;; Match 20 to 23. +(define_predicate "const_20_to_23_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 20, 23)"))) + +;; Match 24 to 27. +(define_predicate "const_24_to_27_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 24, 27)"))) + +;; Match 28 to 31. +(define_predicate "const_28_to_31_operand"
Re: [PATCH i386 3/8] [AVX512] [19/n] Add AVX-512 patterns: Extracts and converts.
Hello, On 20 Oct 11:55, Uros Bizjak wrote: > Please also add back expanders with operand fixups and insn > constraints, as is the case with other commutative operators. They are > needed to hoist operand loads out of the loops (reload and later > passes won't hoist memory loads out of the loops when fixing up > operands). Whoops. I didn't know how git diff works for set of patches. Updated patch in the bottom. > The patch is OK with this change, but please wait for rths final approval. Richard, are you ok? Bootstrap pass. -- Thanks, K --- gcc/config/i386/i386.md | 5 + gcc/config/i386/predicates.md | 40 ++ gcc/config/i386/sse.md| 932 +- 3 files changed, 971 insertions(+), 6 deletions(-) diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 10ca6cb..e7e9f2d 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -831,6 +831,11 @@ (define_code_attr s [(sign_extend "s") (zero_extend "u")]) (define_code_attr u_bool [(sign_extend "false") (zero_extend "true")]) +;; Used in signed and unsigned truncations. +(define_code_iterator any_truncate [ss_truncate truncate us_truncate]) +;; Instruction suffix for truncations. +(define_code_attr trunsuffix [(ss_truncate "s") (truncate "") (us_truncate "us")]) + ;; Used in signed and unsigned fix. (define_code_iterator any_fix [fix unsigned_fix]) (define_code_attr fixsuffix [(fix "") (unsigned_fix "u")]) diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md index 06b2914..999d8ab 100644 --- a/gcc/config/i386/predicates.md +++ b/gcc/config/i386/predicates.md @@ -752,6 +752,11 @@ (and (match_code "const_int") (match_test "IN_RANGE (INTVAL (op), 6, 7)"))) +;; Match 8 to 9. +(define_predicate "const_8_to_9_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 8, 9)"))) + ;; Match 8 to 11. (define_predicate "const_8_to_11_operand" (and (match_code "const_int") @@ -762,16 +767,51 @@ (and (match_code "const_int") (match_test "IN_RANGE (INTVAL (op), 8, 15)"))) +;; Match 10 to 11. +(define_predicate "const_10_to_11_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 10, 11)"))) + +;; Match 12 to 13. +(define_predicate "const_12_to_13_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 12, 13)"))) + ;; Match 12 to 15. (define_predicate "const_12_to_15_operand" (and (match_code "const_int") (match_test "IN_RANGE (INTVAL (op), 12, 15)"))) +;; Match 14 to 15. +(define_predicate "const_14_to_15_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 14, 15)"))) + +;; Match 16 to 19. +(define_predicate "const_16_to_19_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 16, 19)"))) + ;; Match 16 to 31. (define_predicate "const_16_to_31_operand" (and (match_code "const_int") (match_test "IN_RANGE (INTVAL (op), 16, 31)"))) +;; Match 20 to 23. +(define_predicate "const_20_to_23_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 20, 23)"))) + +;; Match 24 to 27. +(define_predicate "const_24_to_27_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 24, 27)"))) + +;; Match 28 to 31. +(define_predicate "const_28_to_31_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 28, 31)"))) + ;; True if this is a constant appropriate for an increment or decrement. (define_predicate "incdec_operand" (match_code "const_int") diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 623e919..c429855 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -87,6 +87,7 @@ ;; For AVX512F support UNSPEC_VPERMI2 UNSPEC_VPERMT2 + UNSPEC_UNSIGNED_FIX_NOTRUNC UNSPEC_UNSIGNED_PCMP UNSPEC_TESTM UNSPEC_TESTNM @@ -2994,6 +2995,34 @@ (set_attr "prefix" "maybe_vex") (set_attr "mode" "DI")]) +(define_insn "cvtusi232" + [(set (match_operand:VF_128 0 "register_operand" "=v") + (vec_merge:VF_128 + (vec_duplicate:VF_128 + (unsigned_float: + (match_operand:SI 2 "nonimmediate_operand" "rm"))) + (match_operand:VF_128 1 "register_operand" "v") + (const_int 1)))] + "TARGET_AVX512F" + "vcvtusi2\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "type" "sseicvt") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_insn "cvtusi264" + [(set (match_operand:VF_128 0 "register_operand" "=v") + (vec_merge:VF_128 + (vec_duplicate:VF_128 + (unsigned_float: + (match_operand:DI 2 "nonimmediate_operand" "rm"))) + (match_operand:VF_128 1 "register_operand" "v") + (const_int 1)))] + "TARGET_AVX512F && TARGET_64BIT" + "vcvtusi2\t{%2, %1, %0|%0, %1, %2}" + [(set_attr "type" "sseicvt") + (set_attr "prefix" "evex") + (set_attr "mode" "
Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
Hello Richard, Thanks for remarks, they all seems reasonable. One question On 21 Oct 16:01, Richard Henderson wrote: > > +(define_insn "avx512f_moves_mask" > > + [(set (match_operand:VF_128 0 "register_operand" "=v") > > + (vec_merge:VF_128 > > + (vec_merge:VF_128 > > + (match_operand:VF_128 2 "register_operand" "v") > > + (match_operand:VF_128 3 "vector_move_operand" "0C") > > + (match_operand: 4 "register_operand" "k")) > > + (match_operand:VF_128 1 "register_operand" "v") > > + (const_int 1)))] > > + "TARGET_AVX512F" > > + "vmov\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2}" > > + [(set_attr "type" "ssemov") > > + (set_attr "prefix" "evex") > > + (set_attr "mode" "")]) > > Nested vec_merge? That seems... odd to say the least. > How in the world does this get matched? This is generic approach for all scalar `masked' instructions. Reason is that we must save higher bits of vector (outer vec_merge) and apply single-bit mask (inner vec_merge). We may do it with unspecs though... But is it really better? What do you think? -- Thanks, K
Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
Hello Richard, On 22 Oct 08:16, Richard Henderson wrote: > On 10/22/2013 07:42 AM, Kirill Yukhin wrote: > > Hello Richard, > > Thanks for remarks, they all seems reasonable. > > > > One question > > > > On 21 Oct 16:01, Richard Henderson wrote: > >>> +(define_insn "avx512f_moves_mask" > >>> + [(set (match_operand:VF_128 0 "register_operand" "=v") > >>> + (vec_merge:VF_128 > >>> + (vec_merge:VF_128 > >>> + (match_operand:VF_128 2 "register_operand" "v") > >>> + (match_operand:VF_128 3 "vector_move_operand" "0C") > >>> + (match_operand: 4 "register_operand" "k")) > >>> + (match_operand:VF_128 1 "register_operand" "v") > >>> + (const_int 1)))] > >>> + "TARGET_AVX512F" > >>> + "vmov\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2}" > >>> + [(set_attr "type" "ssemov") > >>> + (set_attr "prefix" "evex") > >>> + (set_attr "mode" "")]) > >> > >> Nested vec_merge? That seems... odd to say the least. > >> How in the world does this get matched? > > > > This is generic approach for all scalar `masked' instructions. > > > > Reason is that we must save higher bits of vector (outer vec_merge) > > and apply single-bit mask (inner vec_merge). > > > > > > We may do it with unspecs though... But is it really better? > > > > What do you think? > > What I think is that while it's an instruction that exists in the ISA, > does that mean we must model it in the compiler? > > How would this pattern be used? When we have all-1 mask then simplifier may reduce such pattern to simpler form with single vec_merge. This will be impossible if we put unspec there. So, for example for thise code: __m128d foo (__m128d x, __m128d y) { return _mm_maskz_add_sd (-1, x, y); } With unspec we will have: foo: .LFB2328: movl$-1, %eax # 10*movqi_internal/2 [length = 5] kmovw %eax, %k1 # 24*movqi_internal/8 [length = 4] vaddsd %xmm1, %xmm0, %xmm0{%k1}{z} # 11sse2_vmaddv2df3_mask/2 [length = 6] ret # 27simple_return_internal [length = 1] While for `semantic' version it will be simplified to: foo: .LFB2329: vaddsd %xmm1, %xmm0, %xmm0 # 11sse2_vmaddv2df3/2 [length = 4] ret # 26simple_return_internal [length = 1] So, we have short VEX insn vs. long EVEX one + mask creation insns. That is why we want to expose semantics of such operations. Thanks, K
Re: [PATCH i386 4/8] [AVX512] [2/n] Add substed patterns: mask scalar subst.
Hello, This patch introduces mask scalar subst. Is it ok to commit to main trunk? Testing pass. -- Thanks, K --- gcc/config/i386/sse.md | 104 --- gcc/config/i386/subst.md | 23 +++ 2 files changed, 95 insertions(+), 32 deletions(-) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index bf0e1ed..1f0d6fa 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -1307,7 +1307,7 @@ (set_attr "prefix" "") (set_attr "mode" "")]) -(define_insn "_vm3" +(define_insn "_vm3" [(set (match_operand:VF_128 0 "register_operand" "=x,v") (vec_merge:VF_128 (plusminus:VF_128 @@ -1318,10 +1318,10 @@ "TARGET_SSE" "@ \t{%2, %0|%0, %2} - v\t{%2, %1, %0|%0, %1, %2}" + v\t{%2, %1, %0|%0, %1, %2}" [(set_attr "isa" "noavx,avx") (set_attr "type" "sseadd") - (set_attr "prefix" "orig,vex") + (set_attr "prefix" "") (set_attr "mode" "")]) (define_expand "mul3" @@ -1347,7 +1347,7 @@ (set_attr "btver2_decode" "direct,double") (set_attr "mode" "")]) -(define_insn "_vm3" +(define_insn "_vm3" [(set (match_operand:VF_128 0 "register_operand" "=x,v") (vec_merge:VF_128 (multdiv:VF_128 @@ -1358,10 +1358,10 @@ "TARGET_SSE" "@ \t{%2, %0|%0, %2} - v\t{%2, %1, %0|%0, %1, %2}" + v\t{%2, %1, %0|%0, %1, %2}" [(set_attr "isa" "noavx,avx") (set_attr "type" "sse") - (set_attr "prefix" "orig,vex") + (set_attr "prefix" "") (set_attr "btver2_decode" "direct,double") (set_attr "mode" "")]) @@ -1446,7 +1446,7 @@ (set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "*srcp14" +(define_insn "srcp14" [(set (match_operand:VF_128 0 "register_operand" "=v") (vec_merge:VF_128 (unspec:VF_128 @@ -1456,7 +1456,7 @@ (match_dup 1) (const_int 1)))] "TARGET_AVX512F" - "vrcp14\t{%2, %1, %0|%0, %1, %2}" + "vrcp14\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sse") (set_attr "prefix" "evex") (set_attr "mode" "")]) @@ -1493,7 +1493,7 @@ (set_attr "prefix" "maybe_vex") (set_attr "mode" "")]) -(define_insn "_vmsqrt2" +(define_insn "_vmsqrt2" [(set (match_operand:VF_128 0 "register_operand" "=x,v") (vec_merge:VF_128 (sqrt:VF_128 @@ -1503,11 +1503,11 @@ "TARGET_SSE" "@ sqrt\t{%1, %0|%0, %1} - vsqrt\t{%1, %2, %0|%0, %2, %1}" + vsqrt\t{%1, %2, %0|%0, %2, %1}" [(set_attr "isa" "noavx,avx") (set_attr "type" "sse") (set_attr "atom_sse_attr" "sqrt") - (set_attr "prefix" "orig,vex") + (set_attr "prefix" "") (set_attr "btver2_sse_attr" "sqrt") (set_attr "mode" "")]) @@ -1542,7 +1542,7 @@ (set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "*rsqrt14" +(define_insn "rsqrt14" [(set (match_operand:VF_128 0 "register_operand" "=v") (vec_merge:VF_128 (unspec:VF_128 @@ -1552,7 +1552,7 @@ (match_dup 1) (const_int 1)))] "TARGET_AVX512F" - "vrsqrt14\t{%2, %1, %0|%0, %1, %2}" + "vrsqrt14\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "sse") (set_attr "prefix" "evex") (set_attr "mode" "")]) @@ -1622,7 +1622,7 @@ (set_attr "prefix" "") (set_attr "mode" "")]) -(define_insn "_vm3" +(define_insn "_vm3" [(set (match_operand:VF_128 0 "register_operand" "=x,v") (vec_merge:VF_128 (smaxmin:VF_128 @@ -1633,11 +1633,11 @@ "TARGET_SSE" "@ \t{%2, %0|%0, %2} - v\t{%2, %1, %0|%0, %1, %2}" + v\t{%2, %1, %0|%0, %1, %2}" [(set_attr "isa" "noavx,avx") (set_attr "type" "sse") (set_attr "btver2_sse_attr" "maxmin") - (set_attr "prefix" "orig,vex") + (set_attr "prefix" "") (set_attr "mode" "")]) ;; These versions of the min/max patterns implement exactly the operations @@ -2748,7 +2748,7 @@ (match_operand:FMAMODE 3 "nonimmediate_operand")))] "") -(define_insn "*fma_fmadd_" +(define_insn "fma_fmadd_" [(set (match_operand:FMAMODE 0 "register_operand" "=v,v,v,x,x") (fma:FMAMODE (match_operand:FMAMODE 1 "nonimmediate_operand" "%0,0,v,x,x") @@ -2976,7 +2976,7 @@ UNSPEC_FMADDSUB))] "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F") -(define_insn "*fma_fmaddsub_" +(define_insn "fma_fmaddsub_" [(set (match_operand:VF 0 "register_operand" "=v,v,v,x,x") (unspec:VF [(match_operand:VF 1 "nonimmediate_operand" "%0,0,v,x,x") @@ -3241,6 +3241,46 @@ [(set_attr "type" "ssemuladd") (set_attr "mode" "")]) +(define_insn "*fmai_fmsub__mask" + [(set (match_operand:VF_128 0 "register_operand" "=v,v") + (vec_merge:VF_128 + (vec_merge:VF_128 + (fma:VF_128 + (match_operand:VF_128 1 "nonimmediate_operand" "0,0") + (match_operand:VF_128 2 "nonimmediate_operand" "vm,v") + (neg:VF_128 + (match_operand:VF_128 3 "nonimmediate_operand" "v,vm"))) + (match_dup 1) + (match_operand:QI 4 "register_
Re: [PATCH i386 4/8] [AVX512] Add substed patterns: mask_scalar_merge subst.
Hello, This patch introduces "mask_scalar_merge" subst. Is it ok to commit to main trunk? Testing pass. -- Thanks, K --- gcc/config/i386/sse.md | 26 +- gcc/config/i386/subst.md | 16 2 files changed, 29 insertions(+), 13 deletions(-) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 1f0d6fa..f3cca59 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -2153,7 +2153,7 @@ [(V16SF "const_0_to_31_operand") (V8DF "const_0_to_31_operand") (V16SI "const_0_to_7_operand") (V8DI "const_0_to_7_operand")]) -(define_insn "avx512f_cmp3" +(define_insn "avx512f_cmp3" [(set (match_operand: 0 "register_operand" "=k") (unspec: [(match_operand:VI48F_512 1 "register_operand" "v") @@ -2161,13 +2161,13 @@ (match_operand:SI 3 "" "n")] UNSPEC_PCMP))] "TARGET_AVX512F" - "vcmp\t{%3, %2, %1, %0|%0, %1, %2, %3}" + "vcmp\t{%3, %2, %1, %0|%0, %1, %2, %3}" [(set_attr "type" "ssecmp") (set_attr "length_immediate" "1") (set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "avx512f_ucmp3" +(define_insn "avx512f_ucmp3" [(set (match_operand: 0 "register_operand" "=k") (unspec: [(match_operand:VI48_512 1 "register_operand" "v") @@ -2175,7 +2175,7 @@ (match_operand:SI 3 "const_0_to_7_operand" "n")] UNSPEC_UNSIGNED_PCMP))] "TARGET_AVX512F" - "vpcmpu\t{%3, %2, %1, %0|%0, %1, %2, %3}" + "vpcmpu\t{%3, %2, %1, %0|%0, %1, %2, %3}" [(set_attr "type" "ssecmp") (set_attr "length_immediate" "1") (set_attr "prefix" "evex") @@ -8712,7 +8712,7 @@ (set_attr "prefix" "vex") (set_attr "mode" "OI")]) -(define_expand "avx512f_eq3" +(define_expand "avx512f_eq3" [(set (match_operand: 0 "register_operand") (unspec: [(match_operand:VI48_512 1 "register_operand") @@ -8721,14 +8721,14 @@ "TARGET_AVX512F" "ix86_fixup_binary_operands_no_copy (EQ, mode, operands);") -(define_insn "avx512f_eq3_1" +(define_insn "avx512f_eq3_1" [(set (match_operand: 0 "register_operand" "=k") (unspec: [(match_operand:VI48_512 1 "register_operand" "%v") (match_operand:VI48_512 2 "nonimmediate_operand" "vm")] UNSPEC_MASKED_EQ))] "TARGET_AVX512F && ix86_binary_operator_ok (EQ, mode, operands)" - "vpcmpeq\t{%2, %1, %0|%0, %1, %2}" + "vpcmpeq\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "ssecmp") (set_attr "prefix_extra" "1") (set_attr "prefix" "evex") @@ -8808,13 +8808,13 @@ (set_attr "prefix" "vex") (set_attr "mode" "OI")]) -(define_insn "avx512f_gt3" +(define_insn "avx512f_gt3" [(set (match_operand: 0 "register_operand" "=k") (unspec: [(match_operand:VI48_512 1 "register_operand" "v") (match_operand:VI48_512 2 "nonimmediate_operand" "vm")] UNSPEC_MASKED_GT))] "TARGET_AVX512F" - "vpcmpgt\t{%2, %1, %0|%0, %1, %2}" + "vpcmpgt\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "ssecmp") (set_attr "prefix_extra" "1") (set_attr "prefix" "evex") @@ -9208,25 +9208,25 @@ ] (const_string "")))]) -(define_insn "avx512f_testm3" +(define_insn "avx512f_testm3" [(set (match_operand: 0 "register_operand" "=k") (unspec: [(match_operand:VI48_512 1 "register_operand" "v") (match_operand:VI48_512 2 "nonimmediate_operand" "vm")] UNSPEC_TESTM))] "TARGET_AVX512F" - "vptestm\t{%2, %1, %0|%0, %1, %2}" + "vptestm\t{%2, %1, %0|%0, %1, %2}" [(set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "avx512f_testnm3" +(define_insn "avx512f_testnm3" [(set (match_operand: 0 "register_operand" "=k") (unspec: [(match_operand:VI48_512 1 "register_operand" "v") (match_operand:VI48_512 2 "nonimmediate_operand" "vm")] UNSPEC_TESTNM))] "TARGET_AVX512CD" - "%vptestnm\t{%2, %1, %0|%0, %1, %2}" + "%vptestnm\t{%2, %1, %0|%0, %1, %2}" [(set_attr "prefix" "evex") (set_attr "mode" "")]) diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md index 532a3a1..b537c5e 100644 --- a/gcc/config/i386/subst.md +++ b/gcc/config/i386/subst.md @@ -27,6 +27,9 @@ V16SF V8SF V4SF V8DF V4DF V2DF]) +(define_mode_iterator SUBST_S + [QI HI SI DI]) + (define_subst_attr "mask_name" "mask" "" "_mask") (define_subst_attr "mask_applied" "mask" "false" "true") (define_subst_attr "mask_operand2" "mask" "" "%{%3%}%N2") @@ -77,3 +80,16 @@ (match_operand: 5 "register_operand" "k")) (match_dup 2) (const_int 1)))]) + +(define_subst_attr "mask_scalar_merge_name" "mask_scalar_merge" "" "_mask") +(define_subst_attr "mask_scalar_merge_operand3" "mask_scalar_merge" "" "%{%3%}") +(define_subst_attr "mask_scalar_merge_operand4" "mask_scalar_merge" "" "%{%4%}") + +(define_subst "mask_scalar_merge" + [(set (match_operand:SUBST_S 0) +(match_operand:SUBST_S 1))] + "TARGET_AVX512F" + [(set (match_
Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
Hello Richard, On 28 Oct 08:20, Richard Henderson wrote: > Why is a masked *scalar* operation useful? The reason the instructions exist is so that you can do fully fault correct predicated scalar algorithms. I example. In fact, with some hacky tricks, you can fully predicate normal C code in the SIMD registers. One might want to have such region as fast as possible while staying scalar: (all vars are integers): if ( a[i] ) b[i] += c[i]; Definetely to have max performace we want to have the region fully predicated. This code cannot be predicated correctly in IA pre-AVX-512: vmovd a[i], %xmm0 vptestm %zmm0, %zmm0, %k1 // hack because we didn’t have masking for VPMOVD/Q vmovss b[i], %xmm0 {%k1}{z} // no scalar int add, hack, 128-bit works fine // because mask is sawed off in the right places vpaddd c[i], %zmm0, %zmm0 {%k1}{z} // vmpmovd/w hack again vmovss %xmm0, b[i] {%k1} So, having such masked scalar insns allows us to have non-branching scalar code. II Example. Perhaps one interesting case of scalar and mask, though not to do with predication (and really narrow), is as an idiom to generate a write mask value of 0x1: vcmpss k1, xmm0, xmm0, {sae}, 0xf Currently we do: mov 1, %eax kmovw %eax, %k1 But we sometimes have to spill a GPR in order to introduce this sequence. In that case the vcmpss idiom seems like a better choice (though you might worry about the additional false dependency on xmm0). And finally. It’s kind of strange not to have complete ISA support. What if someone would want to have equivalent code for the vector version and the scalar remainder version? -- Thanks, K
Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
Hello Richard, On 28 Oct 14:45, Richard Henderson wrote: > On 10/28/2013 01:58 PM, Kirill Yukhin wrote: > > Hello Richard, > > On 28 Oct 08:20, Richard Henderson wrote: > >> Why is a masked *scalar* operation useful? > > > > The reason the instructions exist is so that > > you can do fully fault correct predicated scalar algorithms. > > Using VEC_MERGE isn't the proper representation for that. > > If that's your real goal, then COND_EXEC is the only way to let > rtl know that faults are suppressed in the false condition. I believe cond_exec approach supposed to look like this: (define_subst "mask_scalar" [(set (match_operand:SUBST_V 0) (vec_merge:SUBST_V (match_operand:SUBST_V 1) (match_operand:SUBST_V 2) (const_int 1)))] "TARGET_AVX512F" [(cond_exec (eq:CC (match_operand: 3 "register_operand" "k") (const_int 1)) (set (match_dup 0) (vec_merge:SUBST_V (match_dup 1) (match_dup 2) (const_int 1]) But this only will describe merge-masking in incorrect way. We will need to add a clobber to signal that even for false condition we will zero higher part of register. Preferable zerro-masking will be indistinguishable from merge- masking and will need to choose which mask mode to enable. Bad turn. IMHO, we have 3 options to implement scalar masked insns: 1. `vec_merge' over vec_merge (current approach). Pro. 1. Precise semantic description 2. Unified approach with vector patterns 3. Freedom for simplifier to reduce EVEX to VEX for certain const masks Cons. 1. Too precise semantic description and as a consequence complicated code in md-file 2. `cond_exec' approach Pro. 1. Look useful for compiler when trying to generate predicated code Cons. 1. Not precise. Extra clobbers (?) needed: to signal that we're changing the register even for false condition in cond_exec 2. Unable to describe zero masking nicely 3. Code still complicated as for option #1 4. Simplifier won't work (clobber is always clobber) 3. Make all masked scalar insns to be unspecs Pro. 1. Straight-forward, not overweighted. Enough for intrinsics to work Cons. 1. Since every unspec needs a code: substs won't be applied directly: huge volume of similar code 2. Simplifier won't work 3. Generation of predicated code become hard Am I missing some options, or that’s all we have? If so, what option would you prefer? Thanks, K
Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
Hello, On 01 Nov 16:19, Kirill Yukhin wrote: > Coould you pls take a look? PING. -- Thanks, K
Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
Hello, Small correction. On 01 Nov 16:19, Kirill Yukhin wrote: > +(define_insn "avx512f_store_mask" > + [(set (match_operand:VI48F_512 0 "memory_operand" "=m") > + (vec_merge:VI48F_512 > + (match_operand:VI48F_512 1 "register_operand" "v") > + (match_dup 0) > + (match_operand: 2 "register_operand" "k")))] > + "TARGET_AVX512F" > +{ > + switch (mode) Need to be MODE_. Same for load. Is it ok with that change? -- Thanks, K
[PATCH, i386, COMMITTED] Fix PR69118.
Hello, As proposed in PR69118 - fixed condition of compare pattern. Bootstrapped, regtested & comitted to main trunk & gcc-5-branch. gcc/ PR target/69118 * config/i386/sse.md (define_insn "avx512f_maskcmp3"): Fix target. -- Thanks, K commit 7fa978b9b80a6d50a81065755be81acc2923b0e2 Author: Kirill Yukhin Date: Wed Feb 3 12:37:13 2016 +0300 AVX512. Fix PR69118 - wrong target for compare pattern. diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 7f89679..045a85f 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -2788,7 +2788,7 @@ (match_operator: 3 "sse_comparison_operator" [(match_operand:VF 1 "register_operand" "v") (match_operand:VF 2 "nonimmediate_operand" "vm")]))] - "TARGET_SSE" + "TARGET_AVX512F" "vcmp%D3\t{%2, %1, %0|%0, %1, %2}" [(set_attr "type" "ssecmp") (set_attr "length_immediate" "1")
Re: [PATCH, i386, AVX512] Adjust expected result for kunpackb intrinsic in avx512f-klogic-2 test.
Hi Sasha, On 04 Feb 17:59, Alexander Fomin wrote: > OK for trunk and 5-branch? Patch is OK for main trunk and release branches. (IMHO, pretty much obvious). -- Thanks, K
Re: [PATCH] Fix up avx512* regressions caused by the cse.c one-liner change (PR target/69671)
Hello Jakub, On 17 Feb 17:46, Jakub Jelinek wrote: > Hi! > > As I wrote in the PR, fwprop is able to forward CONST0_RTX back into > instructions even if CSE optimized them, but the problem in that case is > that for vector_move_operand "0C" operands if they appear inside of > (vec_select ... (parallel [(const_int 0) ... ])) the result is also > simplified, so one gets instead another CONST0_RTX (in the mode of > the VEC_SELECT). Because the patterns expect a vec_select and "C" operand > inside of it, it is therefore not matched, it maybe attached as REG_EQUAL > note. I went through other vector_move_operand "0C" and "0C,0" operands > and I don't think they suffer from similar problem, if fwprop or cprop etc. > attempts to propagate a constant into them, it shouldn't be possible it will > be simplified into something different. > > Anyway, the fix IMHO is to just duplicate the affected 8 define_insns > with the simplification applied. IMHO once we know it is {z}, it is worth > to keep it as {z}, there is no benefit to allow the RA to use "0" > operand instead. > > Bootstrapped/regtested on x86_64-linux and i686-linux, on both fixes > the testcases that started failing with r233133, ok for trunk? Patch is ok for trunk. Thanks a lot for fixing this! -- Regards, K
Re: [PATCH] Fix ICE in vcond expansion with -mavx512f -mno-avx512bw (PR target/69820)
Hi Jakub! On 15 Feb 22:00, Jakub Jelinek wrote: > Hi! > > We ICE on the following testcase, because vcondv32hiv32hi pattern > really needs avx512bw, but it is enabled for avx512f. > As VI_512 iterator is only used in vcond* patterns which need the > avx512bw ISA for the V64QI and V32HI modes, I've changed that iterator. > Or do you prefer to keep that iterator as is (so it will be unused) > and another one with these conditions? If yes, how should it be called. > > Bootstrapped/regtested on x86_64-linux and i686-linux. Patch is ok for trunk and branches. > 2016-02-15 Jakub Jelinek > > PR target/69820 > * config/i386/sse.md (VI_512): Only include V64QImode and V32HImode > if TARGET_AVX512BW. > > * gcc.target/i386/pr69820.c: New test. > > --- gcc/config/i386/sse.md.jj 2016-02-03 23:36:39.0 +0100 > +++ gcc/config/i386/sse.md2016-02-15 17:07:40.694352994 +0100 > @@ -522,7 +522,10 @@ (define_mode_iterator VI_128 [V16QI V8HI > (define_mode_iterator VI_256 [V32QI V16HI V8SI V4DI]) > > ;; All 512bit vector integer modes > -(define_mode_iterator VI_512 [V64QI V32HI V16SI V8DI]) > +(define_mode_iterator VI_512 > + [(V64QI "TARGET_AVX512BW") > + (V32HI "TARGET_AVX512BW") > + V16SI V8DI]) > > ;; Various 128bit vector integer mode combinations > (define_mode_iterator VI12_128 [V16QI V8HI]) > --- gcc/testsuite/gcc.target/i386/pr69820.c.jj2016-02-15 > 17:13:57.397220839 +0100 > +++ gcc/testsuite/gcc.target/i386/pr69820.c 2016-02-15 17:13:28.0 > +0100 > @@ -0,0 +1,14 @@ > +/* PR target/69820 */ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -mavx512f -mno-avx512bw" } */ > + > +int a[100], b[100]; > +short c[100]; > + > +void > +foo () > +{ > + int i; > + for (i = 0; i < 100; ++i) > +b[i] = a[i] * (_Bool) c[i]; > +} > > Jakub -- Thanks, K
Re: [PATCH] Fix vec_set_hi* patterns (PR target/70059)
Hi Jakub, On 03 Mar 13:08, Jakub Jelinek wrote: > routine has changed and looks good to me). Can somebody test this please > on real hw or emulator? I'll run testing on the simulator. -- Thanks, K
Re: [PATCH] Fix vec_set_hi* patterns (PR target/70059)
On 03 Mar 21:17, Jakub Jelinek wrote: > On Thu, Mar 03, 2016 at 01:08:41PM +0100, Jakub Jelinek wrote: > > Fixed thusly, unfortunately I don't have access to avx512f (and not even to > > avx512dq) hw, so while I will bootstrap/regtest it on Haswell-E, can't test > > the tests if they now work at runtime (they link and the assembly of the foo > > routine has changed and looks good to me). Can somebody test this please > > on real hw or emulator? > > Ok for trunk if it passes? This is definetely copy-and-paste issue. OK for trunk and branches (although in 4_9 only 1 pattern affected). Thanks for catching this! > FYI, my bootstrap/regtest on Haswell-E (but without trying to run any > AVX512-* code, just link it at most) passed on both x86_64-linux and > i686-linux. Checked on skylake-avx512 simulator: $ ./*-ref/src/gcc/contrib/compare_tests *-ref/bld/ *-exp/bld # Comparing directories ## Dir1=31153-pr70059-ref/bld/: 3 sum files ## Dir2=15951-pr70059-exp/bld: 3 sum files # Comparing 3 common sum files ## /bin/sh ./31153-pr70059-ref/src/gcc/contrib/compare_tests /tmp/gxx-sum1.21498 /tmp/gxx-sum2.21498 New tests that PASS: gcc.target/i386/avx512dq-pr70059.c (test for excess errors) gcc.target/i386/avx512dq-pr70059.c execution test gcc.target/i386/avx512f-pr70059.c (test for excess errors) gcc.target/i386/avx512f-pr70059.c execution test # No differences found in 3 common sum files > > > 2016-03-03 Jakub Jelinek > > > > PR target/70059 > > * config/i386/sse.md (vec_set_lo_, > > _vinsert_mask): Formatting > > fixes. > > (vec_set_hi_): Likewise. Swap VEC_CONCAT operands. > > > > * gcc.target/i386/avx512f-pr70059.c: New test. > > * gcc.target/i386/avx512dq-pr70059.c: New test. > > Jakub -- K
Re: [PATCH, ia64] [PR target/52731] internal compiler error: in ia64_st_address_bypass_p, at config/ia64/ia64.c:9357
Hello, On 20 Nov 18:37, Kirill Yukhin wrote: > Hello, > Patch in the bottom fixes PR52731. > Is it ok for trunk? Ping? -- Thanks, K
Re: Ping Re: [gomp4] Dumping gimple for offload.
Hello Bernd, On 29 Nov 13:17, Bernd Schmidt wrote: > 5. There's a new DECL_TARGET which refers to this list of target > machines. It's set when creating a child function from e.g. "#pragma acc > parallel" Actually, I do not understand, what term `target machine' means here. Are you talking about to target toolchain (target compiler, assembler, linker, libraries etc)? > 6. ipa_write_summaries iterates over DECL_TARGET machines to write out > LTO for each of them. LTO sections for a different target get a separate > prefix encoding the machine name, e.g. ".gnu.tlto_nvptx_...". Why we want separate sections for different targets? As far as I understand this is going to be generic Gimple, which should be identical to PTX, MIC etc. We cannot use target built-ins inside such a common regions, right? I also think it worst saying that currently we're working on passing of omp_target sections to target compiler (we call it `streaming in') so we can produce target objects from lto sections containing IR marked to be `target'. Multiple targets are handled by means of dedicated targets descriptor, containing vector of target compilers which will be executed on given sections one-by-one producing set of objects for every target. This sections are not related on exact target, as I mentioned above. We're also working on generation of dedicated tables which will be needed for host<->target address mapping (see Jakub's mails on the subject). Hope to post initial versions nearest wws. -- Thanks, K
Re: [PATCH i386 4/8] [AVX512] [2/n] Add substed patterns: mask scalar subst.
Hello, On 19 Nov 12:05, Kirill Yukhin wrote: > Hello, > On 15 Nov 20:03, Kirill Yukhin wrote: > > Ping? > Ping? Ping? -- Thanks, K
Re: [PATCH i386 4/8] [AVX512] [5/8] Add substed patterns: rounding subst.
Hello, On 19 Nov 12:08, Kirill Yukhin wrote: > Hello, > On 15 Nov 20:06, Kirill Yukhin wrote: > > Ping. > Ping. Ping. -- Thanks, K
Re: [PATCH i386 4/8] [AVX512] [7/8] Add substed patterns: `round for expand' subst.
Hello, On 19 Nov 12:12, Kirill Yukhin wrote: > Hello, > On 15 Nov 20:08, Kirill Yukhin wrote: > > > Is it ok for trunk? > > Ping. > Ping. Ping. -- Thanks, K
Re: [PATCH i386 4/8] [AVX512] [6/8] Add substed patterns: `sae' subst.
Hello, On 19 Nov 12:11, Kirill Yukhin wrote: > Hello, > On 15 Nov 20:07, Kirill Yukhin wrote: > > > Is it ok for trunk? > > Ping. > Ping. Ping. -- Thanks, K
Re: [PATCH i386 4/8] [AVX512] [8/8] Add substed patterns: `sae-only for expand' subst.
Hello, On 19 Nov 12:14, Kirill Yukhin wrote: > Hello, > On 15 Nov 20:09, Kirill Yukhin wrote: > > > Is it ok for trunk? > > Ping. > Ping. Ping. -- Thanks, K
Re: [PATCH i386 5/8] [AVX-512] Extend vectorizer hooks.
Hello, On 19 Nov 12:14, Kirill Yukhin wrote: > Hello, > On 15 Nov 20:10, Kirill Yukhin wrote: > > > Is it ok to commit to main trunk? > > Ping. > Ping. Ping. -- Thanks, K
Re: [PATCH i386 6/8] [AVX-512] Add builtins/intrinsics.
Hello > Ok for trunk? Ping? -- Thanks, K