On Mon, Aug 18, 2014 at 12:29 PM, Uros Bizjak <ubiz...@gmail.com> wrote: > On Mon, Aug 18, 2014 at 9:16 PM, H.J. Lu <hjl.to...@gmail.com> wrote: > >>> Attached patch fixes the problem with false data dependency on output >>> register for popcnt, lzcnt and tzcnt insns on sandybridge and haswell >>> targets. >>> >>> The new insn pattern shadows existing one, and after reload, the >>> clearing isns is split out of the insn. This way the clearing insn can >>> be scheduled by postreload scheduler. The new pattern takes care to >>> avoid live registers, so the compiler is always able to clear output >>> reg. >>> >>> The testcase from the PR, compiled with -O3 -march=corei7 improves on >>> Ivybridge from: >>> >>> unsigned 209717360000 3.21002 sec 16.3329 GB/s >>> uint64_t 209717360000 4.06517 sec 12.8971 GB/s >>> >>> to (-O3 -march=corei7 -mtune-ctrl=avoid_false_dep_for_bmi): >>> >>> unsigned 209717360000 3.14541 sec 16.6683 GB/s >>> uint64_t 209717360000 2.3663 sec 22.1564 GB/s >>> >>> Due to high impact, the new tune flag is enabled by default for Intel >>> tunes and generic: >>> >>> m_SANDYBRIDGE | m_HASWELL | m_INTEL | m_GENERIC >>> >>> 2014-08-16 Uros Bizjak <ubiz...@gmail.com> >>> >>> PR target/62011 >>> * config/i386/x86-tune.def (X86_TUNE_AVOID_FALSE_DEP_FOR_BMI): >>> New tune flag. >>> * config/i386/i386.h (TARGET_AVOID_FALSE_DEP_FOR_BMI): New define. >>> * config/i386/i386.md (unspec) <UNSPEC_INSN_FALSE_DEP>: New unspec. >>> (ffs<mode>2): Do not expand with tzcnt for >>> TARGET_AVOID_FALSE_DEP_FOR_BMI. >>> (ffssi2_no_cmove): Ditto. >>> (*tzcnt<mode>_1): Disable for TARGET_AVOID_FALSE_DEP_FOR_BMI. >>> (ctz<mode>2): New expander. >>> (*ctz<mode>2_falsedep_1): New insn_and_split pattern. >>> (*ctz<mode>2_falsedep): New insn. >>> (*ctz<mode>2): Rename from ctz<mode>2. >>> (clz<mode>2_lzcnt): New expander. >>> (*clz<mode>2_lzcnt_falsedep_1): New insn_and_split pattern. >>> (*clz<mode>2_lzcnt_falsedep): New insn. >>> (*clz<mode>2): Rename from ctz<mode>2. >>> (popcount<mode>2): New expander. >>> (*popcount<mode>2_falsedep_1): New insn_and_split pattern. >>> (*popcount<mode>2_falsedep): New insn. >>> (*popcount<mode>2): Rename from ctz<mode>2. >>> (*popcount<mode>2_cmp): Remove. >>> (*popcountsi2_cmp_zext): Ditto. >>> >>> The patch was bootstrapped and regression tested on >>> x86_64-pc-linux-gnu {,-m32} and will be committed to mainline SVN >>> after a couple of days. The patch will be also backported to 4.9 >>> branch. >>> >>> Uros. >> >> False dependency happens when destination is only updated by tcnt, >> lzcnt or popcnt. There is no false dependency when destination is >> also used in source. This patch avoids xor when destination is used > > That fact is a (good) news to me. > >> in source. The difference is >> >> @@ -91,15 +91,12 @@ main: >> .p2align 3 >> .L23: >> leal 1(%rdx), %ecx >> - xorl %r9d, %r9d >> - xorl %r10d, %r10d >> - popcntq (%rbx,%rax,8), %r10 >> - popcntq (%rbx,%rcx,8), %r9 >> + popcntq (%rbx,%rax,8), %rax >> leal 2(%rdx), %r8d >> - movq %r9, %rcx >> + popcntq (%rbx,%rcx,8), %rcx >> + addq %rax, %rcx >> xorl %eax, %eax >> leal 3(%rdx), %esi >> - addq %r10, %rcx >> popcntq (%rbx,%r8,8), %rax >> addq %rax, %rcx >> xorl %eax, %eax >> >> and I got >> >> unsigned 41959360000 0.456816 sec 22.954 GB/s >> uint64_t 41959360000 0.408019 sec 25.6992 GB/s >> >> vs >> >> unsigned 41959360000 0.531386 sec 19.7328 GB/s >> uint64_t 41959360000 0.408081 sec 25.6953 GB/s >> >> on Haswell. OK for trunk? >> 2014-08-18 H.J. Lu <hongjiu...@intel.com> >> >> * config/i386/i386.md (*ctz<mode>2_falsedep_1): Don't clear >> destination if it is used in source. >> (*clz<mode>2_lzcnt_falsedep_1): Likewise. >> (*popcount<mode>2_falsedep_1): Likewise. > > OK with a small nit below, if bootstrapped and regression tested > properly (you didn't state that). > > +; False dependency happens when destination is only updated by tcnt, > > tzcnt > > +; lzcnt or popcnt. There is no false dependency when destination is >
There is no regression on Haswell. This is what I checked in. Thanks. -- H.J.
2014-08-18 H.J. Lu <hongjiu...@intel.com> * config/i386/i386.md (*ctz<mode>2_falsedep_1): Don't clear destination if it is used in source. (*clz<mode>2_lzcnt_falsedep_1): Likewise. (*popcount<mode>2_falsedep_1): Likewise. diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 4749b74..8e74eab 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -12269,8 +12269,11 @@ (match_operand:SWI248 1 "nonimmediate_operand"))) (clobber (reg:CC FLAGS_REG))])]) +; False dependency happens when destination is only updated by tzcnt, +; lzcnt or popcnt. There is no false dependency when destination is +; also used in source. (define_insn_and_split "*ctz<mode>2_falsedep_1" - [(set (match_operand:SWI48 0 "register_operand" "=&r") + [(set (match_operand:SWI48 0 "register_operand" "=r") (ctz:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "rm"))) (clobber (reg:CC FLAGS_REG))] @@ -12283,7 +12286,10 @@ (ctz:SWI48 (match_dup 1))) (unspec [(match_dup 0)] UNSPEC_INSN_FALSE_DEP) (clobber (reg:CC FLAGS_REG))])] - "ix86_expand_clear (operands[0]);") +{ + if (!reg_mentioned_p (operands[0], operands[1])) + ix86_expand_clear (operands[0]); +}) (define_insn "*ctz<mode>2_falsedep" [(set (match_operand:SWI48 0 "register_operand" "=r") @@ -12363,7 +12369,7 @@ "TARGET_LZCNT") (define_insn_and_split "*clz<mode>2_lzcnt_falsedep_1" - [(set (match_operand:SWI48 0 "register_operand" "=&r") + [(set (match_operand:SWI48 0 "register_operand" "=r") (clz:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "rm"))) (clobber (reg:CC FLAGS_REG))] @@ -12376,7 +12382,10 @@ (clz:SWI48 (match_dup 1))) (unspec [(match_dup 0)] UNSPEC_INSN_FALSE_DEP) (clobber (reg:CC FLAGS_REG))])] - "ix86_expand_clear (operands[0]);") +{ + if (!reg_mentioned_p (operands[0], operands[1])) + ix86_expand_clear (operands[0]); +}) (define_insn "*clz<mode>2_lzcnt_falsedep" [(set (match_operand:SWI48 0 "register_operand" "=r") @@ -12683,7 +12692,7 @@ "TARGET_POPCNT") (define_insn_and_split "*popcount<mode>2_falsedep_1" - [(set (match_operand:SWI48 0 "register_operand" "=&r") + [(set (match_operand:SWI48 0 "register_operand" "=r") (popcount:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "rm"))) (clobber (reg:CC FLAGS_REG))] @@ -12696,7 +12705,10 @@ (popcount:SWI48 (match_dup 1))) (unspec [(match_dup 0)] UNSPEC_INSN_FALSE_DEP) (clobber (reg:CC FLAGS_REG))])] - "ix86_expand_clear (operands[0]);") +{ + if (!reg_mentioned_p (operands[0], operands[1])) + ix86_expand_clear (operands[0]); +}) (define_insn "*popcount<mode>2_falsedep" [(set (match_operand:SWI48 0 "register_operand" "=r")