------- Comment #4 from gunnar at greyhound-data dot com 2008-06-04 09:29 -------

I want to add that this wrong behavior is partly related to the compile option "-Os".
There are two cases where GCC generates unneeded TST instructions.

A) General arithmetic

    lsr.l #1,D0
    tst.l d0
    jbne ...

This TST instruction is unneeded, as the LSR already sets the flags correctly.

B)

    subq.l #1,D1
    tst.l d1
    jbne ...

This unneeded TST is related to the compile option used. If you compile the source with "-O2", the second kind of unneeded TST instruction does not appear in the generated code.

It seems to me that an important general optimization step, which used to be part of "Os" in GCC 2.9, was removed from "Os", causing GCC to generate worse code now.

Can you please be so kind and correct this?

I believe that this issue is quite serious for the performance of the generated code.

1st: The unneeded TST instructions increase code size, which is important in embedded environments.

2nd: There are cases where the instruction that really set the condition codes in the first place is far enough away from the conditional branch, with no CC-trashing instruction in between, that the instruction fetcher can predict the branch 100% correctly and fold it away completely. The unneeded TST instruction makes branch folding impossible and requires the CPU to guess the branch instead. This causes a serious performance impact whenever the branch is mispredicted.

It should be clear that the unneeded TST instruction does not only bloat the code; under the conditions mentioned above it can also seriously degrade performance, depending on the CPU used, of course.

In light of this, wouldn't it make sense to increase the Severity of this issue?

Regards
Gunnar

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36133