> Regarding the accumulation point, if we perform fusion and add the bias in `int32` in the accumulator at the end, is it any different than preloading the accumulator?
For example, preloading a negative bias extends how far a signed 32-bit accumulator can accumulate in the positive direction before it overflows. The result of a post-accumulation bias_add is probably the same on most implementations, but signed integer overflow is undefined behavior in the C standard, so the order of the bias_add operations might matter. I saw the bias preload used in a paper. I'll check my notes and see if I can find it.
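To make the ordering point concrete, here is a rough sketch in Python that models a signed 32-bit accumulator with unbounded integers and flags any intermediate value that leaves the `int32` range. The helper names (`accumulate_checked`, `wrap32`) and the sample values are made up for illustration; this is not TVM code or any particular kernel.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def fits_int32(x):
    """True if x is representable in a signed 32-bit integer."""
    return INT32_MIN <= x <= INT32_MAX


def wrap32(x):
    """Two's-complement wraparound, as most hardware does (not guaranteed by C)."""
    return ((x + 2**31) % 2**32) - 2**31


def accumulate_checked(products, bias, preload):
    """Sum products plus bias exactly; report whether any step left int32 range.

    preload=True  : accumulator starts at bias (bias preload)
    preload=False : bias is added after the last product (post bias_add)
    """
    acc = bias if preload else 0
    overflowed = False
    for p in products:
        acc += p
        overflowed = overflowed or not fits_int32(acc)
    if not preload:
        acc += bias
        overflowed = overflowed or not fits_int32(acc)
    return acc, overflowed


# Hypothetical values: two large positive products and a negative bias.
products = [2**30, 2**30]
bias = -1000

pre = accumulate_checked(products, bias, preload=True)
post = accumulate_checked(products, bias, preload=False)
```

With these values both orders reach the same exact sum, but only the post bias_add order passes through an intermediate value (`2**31`) that does not fit in `int32`. On wrapping hardware the final result still matches, since `wrap32(wrap32(2**31) + bias)` equals the preloaded result, which is the "same for most implementations" case. In C that intermediate step would be undefined behavior for a signed accumulator.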