[Bug middle-end/118360] [avr] Expensive shift instead of bit test

law at gcc dot gnu.org via Gcc-bugs Sun, 07 Dec 2025 11:49:34 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118360


Jeffrey A. Law <law at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dbarboza at ventanamicro dot com

--- Comment #6 from Jeffrey A. Law <law at gcc dot gnu.org> ---
Daniel, this one might be another good one for you.

Focus on the second testcase, the one with the inverted test:

long fun_not1 (int a, long b)
{
    if (!(a & 1))
        b ^= 8;
    return b;
}

Which turns into this in the .optimized dump:

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _1 = a_3(D) & 1;
  if (_1 == 0)
    goto <bb 3>; [50.00%]
  else
    goto <bb 4>; [50.00%]
;;    succ:       3
;;                4

;;   basic block 3, loop depth 0
;;    pred:       2
  b_5 = b_4(D) ^ 8;
;;    succ:       4

;;   basic block 4, loop depth 0
;;    pred:       2
;;                3
  # b_2 = PHI <b_4(D)(2), b_5(3)>
  return b_2;


Seems like another case where phiopt should have turned this into branchless
code.  If we look at the original (non-inverted) test, we already get this:

  _1 = a_3(D) & 1;
  _7 = _1 * 8;
  _8 = b_4(D) ^ _7;


I think we can get where we want to go by realizing that if we flip the low bit
of _1 we're good for the fun_not1 test.  So something like:

  _1 = a_3(D) & 1;
  _temp = _1 ^ 1;
  _7 = _temp * 8;
  _8 = b_4(D) ^ _7;
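
As a sanity check on the transformation sketched above, here is a small C
comparison of the branchy testcase against a hand-written branchless form.
The names fun_not1_branchless and check_equivalence are hypothetical and only
for illustration; this is not the output of any GCC pass, just a sketch of
the same arithmetic:

```c
#include <assert.h>

/* Branchy form, as in the testcase. */
long fun_not1 (int a, long b)
{
    if (!(a & 1))
        b ^= 8;
    return b;
}

/* Hypothetical hand-written equivalent of the proposed GIMPLE:
   flip the low bit, scale it to the bit being toggled, then xor.  */
long fun_not1_branchless (int a, long b)
{
    long temp = (a & 1) ^ 1;    /* 1 when the low bit of a is clear */
    return b ^ (temp * 8);      /* toggles bit 3 of b only when temp is 1 */
}

/* Compare both forms over a small range of inputs. */
void check_equivalence (void)
{
    for (int a = -4; a <= 4; a++)
        for (long b = -16; b <= 16; b++)
            assert (fun_not1 (a, b) == fun_not1_branchless (a, b));
}
```

The multiply by 8 is what later becomes the shift by 3, which is the part that
is cheap on risc-v but expensive on AVR.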

Note this might cause AVR to regress, so we need to check that carefully.
But the form above should be better than the branchy sequence we're
currently getting on risc-v:

fun_not1:
        andi    a5,a0,1 # 9     [c=4 l=4]  *anddi3/1
        mv      a0,a1   # 3     [c=4 l=4]  *movdi_64bit/0
        bne     a5,zero,.L2     # 10    [c=16 l=4]  *branchdi
        xori    a0,a1,8 # 12    [c=4 l=4]  *xordi3/1
.L2:
        ret             # 44    [c=0 l=4]  simple_return

I think the optimized code will look something like:

        andi    a0,a0,1 # 9     [c=4 l=4]  *anddi3/1
        xori    a0,a0,1
        slli    a0,a0,3 # 10    [c=4 l=4]  ashldi3
        xor     a0,a0,a1        # 16    [c=4 l=4]  *xordi3/0
        ret             # 25    [c=0 l=4]  simple_return


For AVR a sequence using shifts is bad; we may need to expand on Georg-Johann's
patch to convert it back to bit testing and such.
