https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78794
--- Comment #8 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Uroš Bizjak from comment #7)
> Yes, this is a good idea.

Also, since pandn on non-BMI targets replaces four arith insns with one, the
gain should be raised by 2 * ix86_cost->add, for a total of
3 * ix86_cost->add.

The final patch is thus:

--cut here--
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1cd1cd8..6a746b2 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -3417,7 +3417,11 @@ dimode_scalar_chain::compute_convert_gain ()
               || GET_CODE (src) == AND)
        {
          gain += ix86_cost->add;
-         if (CONST_INT_P (XEXP (src, 0)))
+         /* Additional gain for andnot for targets without BMI.  */
+         if (GET_CODE (XEXP (src, 0)) == NOT
+             && !TARGET_BMI)
+           gain += 2 * ix86_cost->add;
+         else if (CONST_INT_P (XEXP (src, 0)))
            gain -= vector_const_cost (XEXP (src, 0));
          if (CONST_INT_P (XEXP (src, 1)))
            gain -= vector_const_cost (XEXP (src, 1));
--cut here--

Please also note that on BMI targets the attached testcase won't be converted,
which is a good thing. The loop on BMI targets looks like:

.L4:
        movl    4(%eax), %edi
        andn    4(%esp), %edi, %ebx
        movl    (%eax), %esi
        movl    %ebx, %ebp
        andn    (%esp), %esi, %ecx
        orl     %ecx, %ebp
        jne     .L3
        xorl    8(%esp), %esi
        xorl    12(%esp), %edi
        movl    %esi, (%eax)
        movl    %edi, 4(%eax)
.L3:
        addl    $12, %eax
        cmpl    %edx, %eax
        jne     .L4
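
For illustration, the gain arithmetic for the AND case in the patch above can be modelled as a standalone C function. This is only a sketch, not GCC code: and_gain and its parameters are hypothetical names, add_cost stands in for ix86_cost->add, and the two *_const_cost arguments stand in for vector_const_cost () of a constant operand (pass 0 when the operand is not a constant).

```c
#include <stdbool.h>

/* Toy model of the patched AND case in compute_convert_gain ().
   All names here are hypothetical; none of them exist in GCC.

   op0_is_not      - operand 0 is a NOT, i.e. the insn is an andnot
   target_bmi      - models TARGET_BMI
   add_cost        - models ix86_cost->add
   op0_const_cost  - models vector_const_cost (op0), 0 if op0 is not constant
   op1_const_cost  - models vector_const_cost (op1), 0 if op1 is not constant  */
static int
and_gain (bool op0_is_not, bool target_bmi, int add_cost,
          int op0_const_cost, int op1_const_cost)
{
  /* Base gain for converting a logic operation.  */
  int gain = add_cost;

  /* Additional gain for andnot on targets without BMI.  */
  if (op0_is_not && !target_bmi)
    gain += 2 * add_cost;
  else
    gain -= op0_const_cost;

  gain -= op1_const_cost;
  return gain;
}
```

With add_cost = 1 and no constant operands, this model gives a gain of 3 for an andnot without BMI and 1 with BMI, matching the 3 * ix86_cost->add total mentioned above.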