https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118342

--- Comment #7 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Jakub Jelinek from comment #6)
> Yes, so we can use it for a == 0 ? prec : __builtin_ctzll (a); but not say
> (with small middle-end enhancements) for a == 0 ? -1 : __builtin_ctzll (a);
> because on some CPUs that would yield -1LL and on others 0xffffffffULL.

As I read the footnote from Comment #4, the problem would be with:

long long foo (int a)
{
  return a ? __builtin_ctz (a) : -1ll;
}

We declare:

(define_insn_and_split "ctz<mode>2"
  [(set (match_operand:SWI48 0 "register_operand" "=r")
        (ctz:SWI48
          (match_operand:SWI48 1 "nonimmediate_operand" "rm")))
   (clobber (reg:CC FLAGS_REG))]

without strict_low_part, so for ctzsi2 there is no guarantee that the bits
outside the SImode low part of the register will be preserved.

OTOH, we also declare:

(define_insn_and_split "*ctzsidi2_<s>ext"
  [(set (match_operand:DI 0 "register_operand" "=r")
        (any_extend:DI
          (ctz:SI
            (match_operand:SI 1 "nonimmediate_operand" "rm"))))
   (clobber (reg:CC FLAGS_REG))]
  "TARGET_64BIT"
{
  if (TARGET_BMI)
    return "tzcnt{l}\t{%1, %k0|%k0, %1}";
  else if (TARGET_CPU_P (GENERIC)
           && !optimize_function_for_size_p (cfun))
    /* tzcnt expands to 'rep bsf' and we can use it even if !TARGET_BMI.  */
    return "rep%; bsf{l}\t{%1, %k0|%k0, %1}";
  return "bsf{l}\t{%1, %k0|%k0, %1}";
}

which *may* clear the upper 32 bits when the input operand is 0. So,

        movq    $-1, %rax
        rep bsfl        %edi, %eax
        ret

would be risky, because bsfl *may* clobber the highpart of %rax when %edi is 0.
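
To make the two outcomes concrete, here is a minimal C sketch of the
sequence above. It is only a model of the behaviour described in this
thread, not a statement about any particular CPU: the helper names
(bsfl_model_keep, bsfl_model_zext, foo_model) are hypothetical, and the
two models simply encode the "-1LL on some CPUs, 0xffffffffULL on others"
split from comment #6.

```c
#include <stdint.h>

/* Hypothetical model A: with a zero source the destination register is
   left completely untouched, so all 64 bits of the old value survive.  */
static uint64_t
bsfl_model_keep (uint64_t dest, uint32_t src)
{
  if (src == 0)
    return dest;
  return (uint64_t) __builtin_ctz (src);
}

/* Hypothetical model B: with a zero source the low 32 bits keep their old
   value, but they are written back as a 32-bit result, which zero-extends
   into the upper half of the 64-bit register.  */
static uint64_t
bsfl_model_zext (uint64_t dest, uint32_t src)
{
  if (src == 0)
    return (uint32_t) dest;
  return (uint64_t) __builtin_ctz (src);
}

/* Models "movq $-1, %rax; rep bsfl %edi, %eax; ret" under either model.
   ZEXT selects model B when nonzero, model A otherwise.  */
static long long
foo_model (int a, int zext)
{
  uint64_t rax = UINT64_MAX;	/* movq $-1, %rax */
  rax = zext ? bsfl_model_zext (rax, (uint32_t) a)
	     : bsfl_model_keep (rax, (uint32_t) a);
  return (long long) rax;
}
```

Under model A, foo_model (0, 0) gives the -1ll that foo expects; under
model B, foo_model (0, 1) gives 0xffffffff instead. For nonzero inputs
both models agree, which is why only the a == 0 path is at risk.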
